The PivotNine Blog

Data Control In A Multi-Cloud World

I was talking with Michael Tso, CEO at Cloudian, last week about their S3 storage product and he mentioned a feature that sounded like a silly gimmick at first: a kind of “data GPS” that shows you where your data is, down to which disk it's on, in which server, in which rack. In a cloud system, why do you care?

The more I thought about it, the more I understood why this feature is so popular: It's fine to have storage policies that define where your data should reside—in this datacenter, but not that one; data from Germany can't go to the US—but how do you know the policy is working? Trust, but verify.
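One way to picture "trust, but verify" is a small audit that compares where a storage system reports each copy to be against the placement policy you wrote down. The sketch below is illustrative only: the `Replica` record, the key prefixes, and the region names are all hypothetical, and it does not use any real Cloudian or S3 API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One reported copy of an object (hypothetical record shape)."""
    object_key: str
    region: str
    datacenter: str


# Hypothetical policy: key prefix -> regions where copies are allowed to live.
ALLOWED_REGIONS = {
    "de/": {"eu-de"},             # German data must stay in Germany
    "us/": {"us-east", "eu-de"},  # US data may live in either region
}


def find_violations(replicas, allowed_regions):
    """Return every replica that sits in a region its policy does not allow."""
    violations = []
    for replica in replicas:
        for prefix, regions in allowed_regions.items():
            if replica.object_key.startswith(prefix) and replica.region not in regions:
                violations.append(replica)
    return violations


reported = [
    Replica("de/customers.csv", "eu-de", "fra-1"),
    Replica("de/customers.csv", "us-east", "nyc-2"),  # should not exist
    Replica("us/logs.txt", "us-east", "nyc-2"),
]
for bad in find_violations(reported, ALLOWED_REGIONS):
    print(f"policy violation: {bad.object_key} found in {bad.region}/{bad.datacenter}")
```

The policy states intent; the report states reality. The audit is only as trustworthy as the location report it is fed, which is why the finer-grained view matters.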

Telstra's well-regarded Five Knows framework requires that you “know where your data is”, which becomes tricky when you're dealing with cloud systems whose very design encourages you not to care about exactly where your data resides. If you care that your data is on a specific server, then that server can't be quickly and easily replaced as a commodity part. Disks die all the time, and servers are constantly upgraded to newer, faster models. Does it really matter where in a datacenter the server is?

Probably not, but if the server stops being in the datacenter because it's been stolen, yes, you do care. Just as you don't really care which chip of flash your data is on, but if that chip is inside the laptop you just left in a taxi, you care. Before a server is removed from the datacenter (because it's been replaced by a newer, faster one, rather than because it's been stolen), you need to be sure that all data on that system has been moved to the new one. Because data isn't physical, “moving” it actually requires copying it first, then deleting the copy you don't need. You need some level of assurance that the deleting part actually happened.
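The copy-then-delete nature of a "move" can be sketched in a few lines. This is a minimal illustration using local files and a SHA-256 checksum, not how any particular storage product does it; the function names are made up for the example. Note that removing a file only unlinks it from the filesystem and does not guarantee the bytes are erased from the underlying disk, which is a separate problem.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_with_verification(src, dst):
    """Copy src to dst, verify the copy, then delete src and confirm it is gone."""
    expected = sha256_of(src)
    shutil.copyfile(src, dst)

    # Step 1: only trust the new copy if it matches the original.
    if sha256_of(dst) != expected:
        os.remove(dst)
        raise OSError("copy does not match source; source left in place")

    # Step 2: delete the old copy, then check that the delete really happened.
    os.remove(src)
    if os.path.exists(src):
        raise OSError("source still present after delete")
    return expected
```

The order matters: the old copy is deleted only after the new one has been verified, and the deletion itself is checked rather than assumed.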

This “data GPS” sounds like a silly gimmick, but it does provide a solution (well, a partial solution) to this important part of cyber-security: knowing where your data is. You want to know where all copies of your data are, at all times, because the security of that data (its secrecy and privacy) depends on it.

Which all comes down to control. Do you have control over where your data is, who can see it, and when?

That's one appeal of on-site infrastructure: you maintain absolute control of where your information is. That control comes with costs—you have to manage all that infrastructure—but maybe it's worth it? Every time you outsource part of the system (such as its physical location in a co-location facility) you cede some control to another party whose interests may not be fully aligned with your own. You hope that your contracts will provide some assurances, but those provide remedies after the fact. There's nothing to prevent a provider from selling your servers on the black market for a quick buck other than the threat of consequences. You trust them not to.

And so we encrypt data at rest, and maintain control of the keys, to move the locus of control back to ourselves. Now it doesn't matter quite so much if a bad actor at the service provider sells my servers for a quick buck. My data remains encrypted, and I can sue the provider for breach of contract, though I should probably check the fine print again to make sure.

It also helps to explain why hybrid multi-cloud is so appealing to organisations: it's not appropriate to put all data in the cloud unless you are able to ensure you will maintain control over that data, both now and in the future. Will the Privacy Shield still operate for non-American data held in US data centers? Can US courts compel companies to hand over data held in other nations?

This concept of control—of being able to give my informed consent to who can access my data, when, and for how long—is missing in most discussions of information security and of privacy. Once you start thinking in these terms, it becomes more obvious how security and privacy are complementary rather than in opposition, and how they are intricately linked.

“Who controls access to this data?” is a question we should all be asking ourselves as we design the secure IT systems of the modern world.

This article first appeared on Forbes.com.