The PivotNine Blog

Perishable Data From The Internet Of Things

10 December 2015
Justin Warren

Once upon a time, if an enterprise valued a piece of data, it would get loaded into the Enterprise Data Warehouse, a mighty beast that consumed huge quantities of data, and money, in order to churn out important insights once a quarter.

Mike Flannagan, VP and General Manager of Cisco’s Data and Analytics Group, believes that with the advent of the Internet of Things “the majority of data will be processed at the edge.”

“High value data may not end up in the data warehouse,” he says, because the data being generated out at the edge is difficult to bring back to a centralised store. Consider sensor equipment on an offshore drilling platform, or at a similar remote site. The bandwidth requirements for moving that data back to a central site are so immense that it’s just not cost-effective, or even feasible, to do it.

Instead, it makes a lot more sense for industrial telemetry data, video surveillance feeds, and even retail transaction data to be processed at the edge, close to where it’s generated.

Cisco advocates customers make use of the infrastructure they already have—their networks—and augment it. “Use the network to process the data, not just move it along,” Flannagan says.

Cisco has devices to sell you to do this (of course), like their 4000 Series Integrated Services Routers, which essentially glom a UCS-style server onto a router. Compute and storage have become so powerful, and so affordable, in such a small form factor that you can now pack some serious processing power into remote office equipment.

Which is where a Pluto analogy kicks in.

Analytics is, at its core, the business of building mathematical models of the way the world works. In the words of the great statistician George E. P. Box: “Essentially, all models are wrong, but some are useful.” Testing your model against real-world data is important, because while your model might work really well on last year’s data, today the real world might be different to the way your imperfect model believes the world works.

If I have a model for how drilling equipment behaves a day or two before it breaks, I can use it to trigger preventative maintenance at the optimum time. For equipment whose downtime costs millions of dollars an hour, keeping it online can save a lot of money, which makes substantial investments worthwhile.
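As a purely illustrative sketch, the edge-side version of that idea can be quite simple: score a rolling window of sensor readings on the device itself and raise a maintenance alert before the predicted failure window closes. The sensor name, scoring rule, and threshold below are invented for the example; they are not anything Cisco or Flannagan described.

```python
# Illustrative only: a hypothetical edge-side check for drilling equipment.
# The "model", feature names, and threshold are invented for this sketch.
from collections import deque
from statistics import mean

WINDOW = 60              # keep the last 60 vibration readings (say, one per minute)
FAILURE_THRESHOLD = 0.8  # model score above which we expect failure soon

readings = deque(maxlen=WINDOW)

def failure_risk(window):
    """Toy stand-in for a trained model: rising average vibration maps to rising risk."""
    if len(window) < WINDOW:
        return 0.0
    return min(mean(window) / 10.0, 1.0)

def trigger_preventative_maintenance():
    # In practice this would raise a local work order or shut the rig down safely;
    # here it just prints, since the rest of the system is imaginary.
    print("Maintenance window requested: predicted failure approaching")

def on_sensor_reading(vibration_mm_s):
    """Called at the edge for every new reading; no round trip to a datacentre required."""
    readings.append(vibration_mm_s)
    if failure_risk(readings) > FAILURE_THRESHOLD:
        trigger_preventative_maintenance()
```

The point of the sketch is where the decision happens, not how sophisticated the model is: the check runs next to the equipment, so acting on the prediction doesn’t depend on a link back to head office.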

The thing about these models is that the more data you have, the better they work. That’s part of what all the fuss about so-called Big Data is about. Too little data gives you the equivalent of a blurry blob instead of the crisp images of icy plains we have today. Getting more data, in real time, from your remote sites is like sending a spacecraft to Pluto instead of trying to look at a distant planet from the blurry distance of Earth.

The camera is your model, developed at home, but sent out into the real world to report back. What if the data out on the remote rig says the drill will break in two hours, but it takes more than four and a half hours (that’s how long it takes light to get from Pluto to Earth) to send a signal back? Centralised command-and-control just doesn’t work.

This kind of data is perishable. You have to use it quickly, or its value drops rapidly. That is the promise of analysing data out at the edge, and being able to act upon it. Aggregate data can still be brought back into the central datacentre to look for overall trends, but data can be acted upon more quickly—and models tested for validity more often—if data processing happens closer to where it’s generated.
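A minimal sketch of that split might look like the following. Everything here is hypothetical, not a real Cisco API: urgent events are acted on where they land, and only a compact summary crosses the expensive WAN link back to the datacentre.

```python
# Illustrative sketch: act on perishable data locally, ship only aggregates upstream.
# Function names, fields, and the severity rule are assumptions made for the example.
import json
import time

def process_at_edge(events, act_locally, send_to_datacentre):
    """Handle each event where it is generated, then forward a compact summary."""
    summary = {"count": 0, "alerts": 0, "window_start": time.time()}
    for event in events:
        summary["count"] += 1
        if event.get("severity", 0) >= 3:   # perishable: act now, not after a round trip
            act_locally(event)
            summary["alerts"] += 1
    # Only the aggregate travels back to the central datacentre for trend analysis.
    send_to_datacentre(json.dumps(summary))

# Usage with stand-in callbacks:
events = [{"severity": 1}, {"severity": 4}, {"severity": 2}]
process_at_edge(events, act_locally=print, send_to_datacentre=print)
```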

And then we can all enjoy high resolution pictures of the previously unseen.

This article first appeared on Forbes.com.