Saturday 26 July 2014

Cloud, high availability, antifragility and so on


In a previous article I wrote about how (the lack of) operational maturity may be impacting the adoption of private cloud in enterprise data centres.  In truth, that's really only half the story: the other significant question is "what applications can be run in the cloud?"

The majority of "serious" enterprise applications have been around for a long time - think Oracle RDBMS, SAP CRM, and more.  Even if the full client-server or N-tier application stack of these systems has a distributed front-end, at their heart they typically run as monolithic programs that are very tightly integrated into their host computing system and its associated storage and other resources.

A great deal of modern IT design is focused on how to make these systems as resilient as possible, by deploying on robust underlying hardware and software infrastructure, providing redundancy within each host, and providing automated failover and disaster recovery systems to try to ensure there is no single point of failure. When more performance is needed, servers are upgraded with new capacity - hopefully during a seamless internal upgrade of CPU and/or memory, but often via a carefully managed and protracted "lift and shift" migration and update.

The thing is that to a great extent this concept of hardening, protecting, and updating a few known, vital systems runs counter to the "pure" cloud model, which is that cloud-based applications should be independent of the underlying platform and be able to scale up or down for performance simply by adding or removing instances. The application itself should be "antifragile", that is, it should not need careful maintenance to ensure that it is up and running (this is the "pets vs cattle" analogy).
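To make the "cattle" idea a little more concrete, here is a minimal sketch in Python. It is purely illustrative: the `Instance` and `Pool` classes and the `web-N` names are hypothetical and do not correspond to any real cloud platform's API. It also only shows the "replace rather than repair" half of the story, which is resilience rather than antifragility in Taleb's full sense.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A hypothetical, interchangeable application instance ("cattle")."""
    name: str
    healthy: bool = True


@dataclass
class Pool:
    """Keeps `desired` healthy instances running; replaces, never repairs."""
    desired: int
    instances: list = field(default_factory=list)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def reconcile(self):
        # Discard anything unhealthy rather than nursing it back to health.
        self.instances = [i for i in self.instances if i.healthy]
        # Scale up: launch fresh instances until the target is met.
        while len(self.instances) < self.desired:
            self.instances.append(Instance(f"web-{next(self._ids)}"))
        # Scale down: drop any surplus instances.
        del self.instances[self.desired:]


pool = Pool(desired=3)
pool.reconcile()                   # starts web-1, web-2, web-3
pool.instances[0].healthy = False  # web-1 fails
pool.reconcile()                   # web-1 is dropped and web-4 takes its place
pool.desired = 5                   # more performance needed: add instances
pool.reconcile()
```

Nobody logs in to fix web-1; it is simply thrown away and a fresh instance takes its place. A "pet", by contrast, would be diagnosed and nursed back to health while the service waited.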

Antifragility is a term coined by Nassim Taleb (he of the "Black Swan theory" fame) to describe something that does not merely withstand a shock but actually improves because of it.  Mr Taleb gives a great introduction to antifragility in his speech at the RSA.  The idea is catching on in the industry: PwC describes how Instagram founders Mike Krieger and Kevin Systrom made use of the concept as they faced the immense problems of scaling their new platform.

The poster-children of cloud computing (Instagram, Netflix, and others) have built their applications from the ground up by adopting the antifragile approach. So far, however, traditional software vendors have not taken on this methodology, not least because the concepts are too radical for a large portion of their (understandably conservative) customer base.  In this respect we face a chicken-and-egg situation:  without a well-established base of private cloud computing environments to target, software application vendors are unlikely to create products with the cloud in mind.  At the same time, without applications that can take advantage of cloud infrastructure, operations managers and systems designers must rely on the traditional approach for making services highly available. These days that can have the unfortunate side-effect of trying to shoehorn "pets" into an environment that's intended for "cattle", which in turn yields poor results, frustration, and abandonment of "cloud" as an operational model.

As a "chicken-and-egg" problem there is an obvious solution[1]: start building the private cloud infrastructure for those applications that can make good use of it in the short term: development systems, stateless servers, short-term but frequently needed project infrastructure, and so on. Ideally, data centre managers can re-use their existing infrastructure and virtualisation systems under Infrastructure-as-a-Service (IaaS) platform management software, so as not to face a huge and complex migration or the additional expense of a separate silo of equipment just for "cloud".  Meanwhile many enterprise software vendors are working on Software-as-a-Service (SaaS) versions of their own products, in an attempt to capture that particular part of the market. This suggests that if and when cloud computing becomes a well-established operational method for private data centres, the software vendors will already have done most of the work to "cloudify" their products.

The short version for IT managers and systems designers:  start building operational experience with private cloud now, and check with vendors from time to time about the availability of their products for "real" ("cattle-style") cloud deployment, to assess the viability of moving mission-critical loads to a true cloud environment.



[1] In the case of "chicken-and-egg", the answer is "egg" (from dinosaurs, you see...). 
