If the latest AWS outage changes anything in your approach to cloud adoption, then you’re doing it wrong.
This is not the first AWS outage (I first wrote about one in 2011, back when I had hair), nor will it be the last. Nor will AWS be the only provider to suffer an outage in the future. We’ve already seen outages from Office 365, Azure, SoftLayer, and Gmail.
Outages happen, whether your computing runs in your office, in co-location, or in ‘the cloud’, which is just shorthand for “someone else’s computer”.
To think that putting applications ‘in the cloud’ magically makes everything better is naive at best.
I’ve written about the resiliency trade-off before, so to summarize: there are only two ways to approach this. Assume robust, or assume fragile.
Writing an application that assumes all of the infrastructure it runs on is fragile and may fail at any moment is complex and difficult. So, for many years the dominant thinking in writing applications was to assume that the infrastructure was essentially perfect, which made writing the applications much simpler. This is the assume robust model.
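To make the contrast concrete, here is a minimal sketch of what “assume fragile” looks like inside the application: the code expects replicas to fail and handles it with failover and backoff. The endpoints are hypothetical stand-ins for service replicas, not any particular cloud API.

```python
import time

def call_with_failover(endpoints, request, retries=3):
    """Try each replica in turn; retry with backoff if all fail.

    'endpoints' is a list of callables standing in for service
    replicas, any of which may fail at any moment.
    """
    delay = 0.01
    last_error = None
    for attempt in range(retries):
        for endpoint in endpoints:
            try:
                return endpoint(request)
            except ConnectionError as err:
                last_error = err  # this replica is down; try the next one
        time.sleep(delay)  # back off before another full pass
        delay *= 2
    raise last_error

# Illustration: a dead primary and a healthy secondary.
def flaky_primary(req):
    raise ConnectionError("primary unavailable")

def healthy_secondary(req):
    return f"handled: {req}"

print(call_with_failover([flaky_primary, healthy_secondary], "GET /"))
# → handled: GET /
```

Every call site now has to reason about failure, timeouts, and partial success, which is exactly the complexity the assume-robust model was designed to avoid.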
The trade-off was that we had to spend a lot of time and effort and money on making the infrastructure robust. So we have RAID, and clustering, and Tandem/HP NonStop, and redundancy, and a host of other techniques that help the infrastructure to stay online even when bits of it break.