Small World Big Data
Published: 17 Mar 2017
The data privacy and access discussion gets all the more complicated in the age of IoT.
Some organizations might soon suffer from data paucity: getting locked out, outbid or otherwise shut out of critical new data sources that could help optimize future business. While I believe that every data-driven organization should start planning today to avoid ending up data poor, this concern is just one of many potential data-related problems arising in our new big data, streaming, internet of things (IoT) world. In fact, issues with getting the right data will become so critical that I predict a new strategic data enablement discipline will emerge, not just to manage and protect valuable data, but to ensure access to all the necessary (and valid) data the corporation might need to remain competitive.
In addition to avoiding debilitating data paucity, data enablement will require IT to manage key issues in internet of things data security, privacy and veracity. Deep discussions about the proper use of data in this era of analytics are filling books, and much remains undetermined. But IT needs to prepare for whatever data policies emerge in the next few years.
Piracy or privacy?
Many folks explore data privacy in depth, and I certainly don't have immediate advice on how best to balance the personal, organizational or social benefits of data sharing, or where to draw a hard line between public and private data. But from the perspective of most organizations, the first privacy requirement is meeting data security demands, specifically the regulatory and compliance laws that govern control of personal data such as medical history, salary and other HR data. Many commercial organizations, however, reserve the right to access, manage, use and share anything that winds up in their systems unless it is specifically protected, including any data stored or created by or about their employees.
If you are in the shipping business, using GPS and other sensor data from packages and trucks seems like fair game. After all, truck drivers know their employers are monitoring their progress and driving habits. But what happens when organizations track our interactions with IoT devices? Privacy concerns arise, and the threat of an internet of things security breach looms.
Many people are working hard to make GPS-style positioning work inside buildings, ostensibly as a public service, using Wi-Fi equipment and other devices to help triangulate the position of handheld devices and thus locate people in real time, all the time, on detailed blueprints.
In a shopping mall, this level of tracking detail would enable targeted advertising and timely deals related to the store a shopper enters. In a business setting, such data could tell your employer who is next to whom and for how long, what you are looking at online, what calls you receive and so on. Should our casual friendships (not to mention casual flirting), bathroom breaks and vending machine selections be monitored this way? Yet a business could argue that it should be able to analyze those associations in the event of a security breach, or adjust health plan rates if you have that candy bar. And once that data exists, it can be leaked or stolen without proper internet of things data security.
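To make the indoor-positioning idea concrete, here is a minimal Python sketch of how a device's position might be estimated from its distances to three Wi-Fi access points at known spots on a floor plan. Strictly speaking this is trilateration (it uses distances rather than angles), and it assumes idealized, noise-free distance estimates; real systems derive distances from signal strength or timing, which is noisy, and typically fit many access points with least squares. The function name `trilaterate` and the coordinates are hypothetical.

```python
def trilaterate(anchors, distances):
    """Estimate (x, y) from three anchor positions and measured distances."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    # Each anchor defines a circle (x - xi)^2 + (y - yi)^2 = di^2.
    # Subtracting circle 1 from circles 2 and 3 cancels the quadratic
    # terms, leaving two linear equations a*x + b*y = c.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    # Solve the 2x2 system with Cramer's rule.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


# Hypothetical access points at three corners of a 10 m x 10 m room.
aps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
dists = [5.0, 65 ** 0.5, 45 ** 0.5]
print(trilaterate(aps, dists))  # approximately (3.0, 4.0)
```

With noisy measurements the three circles rarely meet at one point, which is why production systems use more access points and a best-fit solution rather than an exact solve.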
Data to define you
The problem is not just getting your thermostat hacked, or your toaster helping a bad guy through your home firewall, due to IoT data security lapses. The deeper issue is that machine learning algorithms, run by institutions far beyond the vendors and brands you buy from, are profiling you. Imagine having to pay 20%-50% higher insurance premiums because you haven't installed a new head on your electric toothbrush recently. You might be pegged as holding certain political views because of how you heat or cool your house. You might be targeted for high-risk loans because of some correlation with how many times a week you opt for toast versus bagels.
You might counter that some basic privacy can be ensured by aggregating such data and stripping out personally identifying information, but we already know it's difficult, if not impossible, to truly anonymize stores of big data. Accumulated masses of IoT data can easily contain deeply embedded clues that can be correlated with public data sets to restore identifying information.
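The kind of correlation described above is often called a linkage attack: records with names removed, but with quasi-identifiers (here ZIP code, birth year and sex) left in, are joined against a public data set that still carries names. The minimal Python sketch below assumes hypothetical records, field names and a hypothetical `reidentify` helper. Whenever a quasi-identifier combination is unique in the public data, the supposedly anonymous row is re-identified.

```python
from collections import defaultdict

# Hypothetical "anonymized" IoT records: names removed, quasi-identifiers kept.
anonymized = [
    {"zip": "02139", "birth_year": 1980, "sex": "F", "thermostat_setpoint": 64},
    {"zip": "02139", "birth_year": 1975, "sex": "M", "thermostat_setpoint": 72},
    {"zip": "02140", "birth_year": 1980, "sex": "F", "thermostat_setpoint": 68},
]

# Hypothetical public data set (for example, a voter roll) that includes names.
public = [
    {"name": "Alice", "zip": "02139", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip": "02139", "birth_year": 1975, "sex": "M"},
    {"name": "Carol", "zip": "02140", "birth_year": 1980, "sex": "F"},
    {"name": "Dana", "zip": "02140", "birth_year": 1980, "sex": "F"},
]

QUASI_IDS = ("zip", "birth_year", "sex")


def reidentify(anon_rows, public_rows, keys=QUASI_IDS):
    """Link anonymized rows to names when the quasi-identifier combo is unique."""
    # Index the public data by its quasi-identifier tuple.
    index = defaultdict(list)
    for row in public_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])
    matches = []
    for row in anon_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # unique match: the "anonymous" row is re-identified
            matches.append((names[0], row))
    return matches


for name, row in reidentify(anonymized, public):
    print(name, row["thermostat_setpoint"])
```

The third anonymized row survives only because two people in the public data share its quasi-identifiers; generalizing or suppressing such fields (as k-anonymity schemes do) creates that ambiguity on purpose, but rich IoT streams usually offer many more columns to link on.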
Imagine that your car reports where it's parked most nights, or that smart components within the car track when they were last serviced or upgraded. A business that makes clutches might then learn a car owner's home address (and thus their identity), travel patterns and driving habits.
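Inferring a home address from parking reports needs nothing sophisticated. The sketch below assumes hypothetical reports of timestamp, latitude and longitude, and simply counts where the car sits overnight (10 p.m. to 6 a.m. here, an arbitrary choice) after rounding coordinates to grid cells of roughly 100 meters. The most frequent cell is a strong guess at the owner's home. The `likely_home` function and its parameters are illustrative, not any vendor's method.

```python
from collections import Counter
from datetime import datetime

# Hypothetical parking reports: (ISO timestamp, latitude, longitude).
reports = [
    ("2017-03-01T23:10:00", 42.3601, -71.0589),
    ("2017-03-02T12:30:00", 42.3505, -71.0700),
    ("2017-03-02T22:45:00", 42.3601, -71.0589),
    ("2017-03-03T02:15:00", 42.3602, -71.0588),
    ("2017-03-03T13:00:00", 42.3505, -71.0700),
]


def likely_home(reports, night_start=22, night_end=6, precision=3):
    """Return the most common overnight parking spot, rounded to a grid cell."""
    counts = Counter()
    for ts, lat, lon in reports:
        hour = datetime.fromisoformat(ts).hour
        if hour >= night_start or hour < night_end:
            # Three decimal places is about 100 m of latitude, so nearby
            # GPS fixes at the same spot fall into one cell.
            counts[(round(lat, precision), round(lon, precision))] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(likely_home(reports))  # (42.36, -71.059)
```

Joining that cell against a public address or property database would then complete the identification.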
Some supply chains already push monitoring and proactive maintenance of embedded, or even merely associated, components back up the chain. Wal-Mart made a fortune offering its suppliers some transparency into sales in exchange for having those suppliers maintain their own inventory in stores. This seemed fine, since the traditional goods we bought didn't keep reporting on us once we carried them home. But the new, intelligent devices we buy and plug in now maintain a continuous connection and data flow to a third-party service. Who's got eyes on all that big data we unwittingly generate about ourselves?
The hugely powerful capabilities of big data storage and analytics, growing real-time streams of low-level IoT data, increasingly accessible AI and deep learning, persistent memory and expanded chip-embedded functionality (e.g., encryption) are already here. As IT groups are tasked with operationalizing these new capabilities, they should keep in mind that it is critical to build future-proof, scalable architectures that support fine-grained data management.
I expect organizations will find they need to create, store and use a lot more metadata than they do today. This metadata could include information about data use and access over time, chain of custody and provenance, encryption tags, source confidence, estimates about usefulness and, of course, policy tags indicating the usual retention, sensitivity, accessibility and other regulatory concerns that comprise IoT security. And remember -- metadata is itself data and has its own access, privacy and veracity requirements that will recursively require meta-metadata. Now there's a headache waiting to happen.
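One way to picture that metadata is as an envelope carried alongside each record. The Python dataclasses below are a hypothetical sketch, not any product's schema; the field names simply mirror the categories listed above (provenance, chain of custody, access history, encryption, source confidence, sensitivity and retention), and the `expired` method shows how a retention policy tag might be enforced.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Metadata:
    """Hypothetical metadata envelope; field names are illustrative only."""
    source: str                      # provenance: which device or system produced it
    created: datetime
    sensitivity: str = "internal"    # e.g., "public", "internal", "personal"
    retention: timedelta = timedelta(days=365)
    source_confidence: float = 1.0   # 0.0 to 1.0 estimate of veracity
    encrypted: bool = False
    custody: list = field(default_factory=list)     # chain-of-custody entries
    access_log: list = field(default_factory=list)  # (who, when) pairs

    def record_access(self, who: str, when: datetime) -> None:
        """Append an access event to the record's usage history."""
        self.access_log.append((who, when))

    def expired(self, now: datetime) -> bool:
        """True once the retention period has elapsed."""
        return now >= self.created + self.retention


@dataclass
class Record:
    payload: dict
    meta: Metadata


meta = Metadata(
    source="thermostat-17",
    created=datetime(2017, 1, 1, tzinfo=timezone.utc),
    sensitivity="personal",
    retention=timedelta(days=30),
)
rec = Record(payload={"setpoint": 68}, meta=meta)
rec.meta.record_access("analytics-svc", datetime(2017, 3, 17, tzinfo=timezone.utc))
print(rec.meta.expired(datetime(2017, 3, 17, tzinfo=timezone.utc)))  # True
```

The access log itself may need its own access controls, which is exactly where the meta-metadata recursion begins.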
Furthermore, I'd bet most future data management products will embrace microservices that implement data management and metadata-enforcement capabilities close to where data is stored. In a big, distributed IoT data world with device-level persistent memory and amorphous hybrid clouds, important data might live anywhere and flow in agile and fluid ways. In fact, some predict that important data will not only be generated in streams, but will exist only as streams, through both processing and persistence.
How can you manage and ensure internet of things data security if it might be in motion everywhere at all times? Well, for one, any metadata -- particularly about privacy, access and veracity -- will have to travel with the data. One interesting emerging capability is blockchain functionality (made famous by Bitcoin). Blockchain is already being used as metadata in some new applications to digitally sign and help validate the source of application data.
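The core idea borrowed from blockchain, linking each record to the hash of the one before it so that tampering anywhere breaks the chain, can be sketched with only the Python standard library. This is a deliberate simplification: there is no distributed ledger or consensus here, and HMAC with a shared secret key stands in for a true digital signature, which would use an asymmetric key pair. The function names and event fields are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same content always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(chain: list, event: dict, key: bytes) -> None:
    """Append a provenance event linked to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    h = _digest(body)
    tag = hmac.new(key, h.encode(), hashlib.sha256).hexdigest()
    chain.append({**body, "hash": h, "tag": tag})


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every hash and tag; any edit or reordering fails."""
    prev_hash = GENESIS
    for entry in chain:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        expected = hmac.new(key, entry["hash"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev_hash = entry["hash"]
    return True


key = b"demo-secret"
chain = []
append_entry(chain, {"device": "truck-42", "action": "created"}, key)
append_entry(chain, {"device": "gateway-7", "action": "forwarded"}, key)
print(verify_chain(chain, key))  # True
chain[0]["event"]["device"] = "truck-99"  # tamper with provenance
print(verify_chain(chain, key))  # False
```

Because each entry is small and self-verifying, a chain like this can travel with the data it describes rather than living in a separate system.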
Metadata management functionality will also need to remain close to the data wherever it is, or wherever it goes. Today, new storage products are emerging that support embedded lambda functionality: much like databases with their event-triggered stored procedures, the storage layer itself can now execute arbitrary (including user-defined) functions close to the stored data and metadata.
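The embedded-function pattern can be illustrated with a toy in-memory store. `TriggeredStore` and its methods are hypothetical and do not correspond to any real product's API; the point is only that functions registered with the store run automatically on every write, with direct access to the object and its metadata, much as a database trigger would. Here a registered function tags any payload mentioning location as personal data.

```python
from typing import Callable, Dict, List, Optional

PutHook = Callable[[str, bytes, dict], None]


class TriggeredStore:
    """Toy object store that runs registered functions whenever an object is written."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._metadata: Dict[str, dict] = {}
        self._on_put: List[PutHook] = []

    def register_on_put(self, fn: PutHook) -> PutHook:
        """Register fn to run on every put; also usable as a decorator."""
        self._on_put.append(fn)
        return fn

    def put(self, key: str, data: bytes, metadata: Optional[dict] = None) -> None:
        self._objects[key] = data
        self._metadata[key] = dict(metadata or {})
        # Each hook runs in-process, next to the stored data and its metadata.
        for fn in self._on_put:
            fn(key, data, self._metadata[key])

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def get_metadata(self, key: str) -> dict:
        return dict(self._metadata[key])


store = TriggeredStore()


@store.register_on_put
def classify_sensitivity(key: str, data: bytes, meta: dict) -> None:
    # Hypothetical rule: anything mentioning location is treated as personal.
    meta["sensitivity"] = "personal" if b"location" in data else "internal"


store.put("trip-1", b'{"location": [42.36, -71.06]}')
store.put("fw-1", b'{"firmware": "1.2.3"}', {"source": "ota"})
print(store.get_metadata("trip-1"))  # {'sensitivity': 'personal'}
```

In a real product the hook would run inside the storage system's own runtime rather than the client's process, which is what keeps policy enforcement close to the data.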
It may take some time to wrap your head around a new world of actively intelligent, data- and metadata-aware storage built for the era of internet of things data security. And even more new capabilities may yet be required to address these issues. For example, since all data is related in some way to all other data, the best future management view may be through a graph-based metadata database. For IT to remain relevant, we have to be ready to tackle these new challenges, modernize our data centers and look deeply into IoT data security.
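A graph view turns questions such as "what else is affected if this source turns out to be sensitive or invalid?" into a simple traversal. The sketch below is a hypothetical in-memory stand-in for a graph metadata database: each edge records that one data set was derived from another, and a breadth-first search returns everything downstream of a given source. The class and data set names are illustrative.

```python
from collections import defaultdict, deque
from typing import Dict, Set


class MetadataGraph:
    """Minimal directed graph: an edge A -> B means B was derived from A."""

    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = defaultdict(set)

    def add_derivation(self, source: str, derived: str) -> None:
        self.edges[source].add(derived)

    def downstream(self, node: str) -> Set[str]:
        """Return every data set reachable from node (breadth-first search)."""
        seen: Set[str] = set()
        queue = deque([node])
        while queue:
            for nxt in self.edges[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


g = MetadataGraph()
g.add_derivation("raw_gps", "trips")
g.add_derivation("trips", "driver_scores")
g.add_derivation("raw_gps", "parking_spots")
g.add_derivation("thermostat", "energy_report")
print(sorted(g.downstream("raw_gps")))  # ['driver_scores', 'parking_spots', 'trips']
```

If `raw_gps` were later reclassified as personal data, the same traversal would identify every derived data set whose policy tags need revisiting.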