Data Control In A Multi-Cloud World


I was talking with Michael Tso, CEO at Cloudian, last week about their S3-compatible storage product and he mentioned a feature that sounded like a silly gimmick at first: a kind of “data GPS” that shows you where your data is, down to which disk it’s on, in which server, in which rack. In a cloud system, why would you care?

The more I thought about it, the more I understood why this feature is so popular: It’s fine to have storage policies that define where your data should reside—in this datacenter, but not that one; data from Germany can’t go to the US—but how do you know the policy is working? Trust, but verify.
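To make "trust, but verify" concrete, here is a minimal sketch of what such a check might look like. Everything in it is an assumption for illustration: the `Replica` record, the `find_violations` helper, the object key, and the datacenter names are all hypothetical, and this is not Cloudian's API. The idea is simply to take the locations the storage system reports and compare them against what the policy allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One reported copy of an object (hypothetical record shape)."""
    object_key: str
    datacenter: str
    country: str


def find_violations(replicas, allowed_countries):
    """Return every replica stored in a country its policy does not allow.

    allowed_countries maps an object key to the set of permitted countries.
    Objects with no policy entry are treated as unrestricted.
    """
    violations = []
    for replica in replicas:
        allowed = allowed_countries.get(replica.object_key)
        if allowed is not None and replica.country not in allowed:
            violations.append(replica)
    return violations


# Hypothetical policy: this German customer data must stay in Germany.
policy = {"customers-de.csv": {"DE"}}

# Hypothetical location report, as a "data GPS" style feature might supply.
reported = [
    Replica("customers-de.csv", "fra-1", "DE"),
    Replica("customers-de.csv", "iad-2", "US"),
]

violations = find_violations(reported, policy)
for v in violations:
    print(f"{v.object_key} found in {v.datacenter} ({v.country})")
```

The policy says where data should be; the report says where it is. The check is only as trustworthy as the location report feeding it.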

Telstra’s well-regarded Five Knows framework [PDF link] requires that you “know where your data is,” which becomes tricky when you’re dealing with cloud systems whose very design encourages you not to care about exactly where your data resides. If you care that your data is on a specific server, then that server can’t be quickly and easily replaced as a commodity part. Disks die all the time, and servers are constantly upgraded to newer, faster models. Does it really matter where in a datacenter the server is?

Probably not. But if the server is no longer in the datacenter because it’s been stolen, then yes, you do care. Likewise, you don’t really care which flash chip your data is on, unless that chip is inside the laptop you just left in a taxi. Before a server is removed from the datacenter (because it’s been replaced by a newer, faster one, rather than because it’s been stolen), you need to be sure that all data on that system has been moved to the new one. Because data isn’t physical, “moving” it actually means copying it first, then deleting the copy you no longer need. You need some level of assurance that the deleting part actually happened.
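As a rough illustration of "copy, then delete, then check", here is a small sketch using ordinary files to stand in for data on an old and a new server. The `verified_move` helper and the file names are hypothetical. One loud caveat: `os.remove` only unlinks the file, and it does not guarantee the bytes are erased from the physical disk, which is exactly why decommissioned hardware still needs care.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_move(src, dst):
    """Copy src to dst, verify the copy, then delete src and confirm it is gone.

    Returns the SHA-256 of the data. The source is only deleted after the
    copy has been verified, so a failed copy never loses data.
    """
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    if sha256_of(dst) != expected:
        os.remove(dst)
        raise IOError("copy verification failed; source left in place")
    os.remove(src)
    if os.path.exists(src):
        raise IOError("source still present after delete")
    return expected


# Demo in a throwaway directory; the file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old-server.bin")
    new = os.path.join(tmp, "new-server.bin")
    with open(old, "wb") as f:
        f.write(b"customer records")

    digest = verified_move(old, new)
    source_gone = not os.path.exists(old)
    dest_ok = sha256_of(new) == digest
    print("source gone:", source_gone, "destination verified:", dest_ok)
```

The ordering is the point: verify the new copy before deleting the old one, then confirm the delete rather than assuming it.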

This “data GPS” sounds like a silly gimmick, but it does provide a solution (well, a partial solution) to this important part of cyber-security: knowing where your data is. You want to know where all copies of your data are, at all times, because the security of that data (its secrecy and privacy) depends on it.

Which all comes down to control. Do you have control over where your data is, who can see it, and when?

That’s one appeal of on-site infrastructure: you maintain absolute control of where your information is. That control comes with costs—you have to manage all that infrastructure—but maybe it’s worth it? Every time you outsource part of the system (such as its physical location in a co-location facility) you cede some control to another party whose interests may not be fully aligned with your own. You hope that your contracts will provide some assurances, but those provide remedies after the fact. There’s nothing to prevent a provider from selling your servers on the black market for a quick buck other than the threat of consequences. You trust them not to.

And so we encrypt data at rest, and maintain control of the keys, to move the locus of control back to ourselves. Now it doesn’t matter quite so much if a bad actor at the service provider sells my servers for a quick buck. My data remains encrypted, and I can sue the provider for breach of contract, though I should probably check the fine print again to make sure.