Big data is being created everywhere we look, and we are all thinking about how to take advantage of it. I certainly want to come up with some novel big data application and become fabulously wealthy just for the idea. The thing is, most companies -- perhaps all -- can profit from big data today just by accelerating or refining some piece of their current business, provided they can identify and corral the right information at the right time and place.
There is no need to find a new earth-shattering application to get started. I believe a significant big data payback is right in front of any marketing, sales, production or customer-engagement team. One simply needs to find a way to unlock the buried big data treasure. And, of course, that's where big data concerns, from the practical to the theoretical, bubble to the surface.
A big sticking point has been finding data science expertise, especially experts who can build optimized machine learning models tailored to your exact business needs. But we are seeing some interesting efforts recently to automate and, in some ways, commoditize big data handling and complicated machine learning. These big data automation technologies enable the regular Java Joe or Josie programmer to effectively drop big data analytics into existing, day-to-day, operationally focused business applications.
Not only does this have the democratizing effect of unlocking big data value for non-data scientists, but it also highlights the trend toward a new application style. In the next three to five years, we will see most business applications that we've long categorized as transactional converge with what we've implemented separately as analytical applications. Put simply, with big data power, "business intelligence" is becoming fast enough and automated enough to deliver inside the operational business process in active business timeframes.
As these data processing worlds collide, they will reveal big data concerns for IT staff, and for those making decisions on IT infrastructure and data centers. Storage, databases and even networks will all need to adapt. Along with the rise of the internet of things (IoT), hybrid cloud architectures, persistent memory and containers, 2017 is going to be a pivotal year for challenging long-held assumptions and changing IT directions.
While I will undoubtedly focus a lot of time and energy as an industry analyst on these fast-evolving topics in the near term, there is a longer-term big data concern: Some companies might not be able to take advantage of this democratization of data simply because they can't get access to the data they need.
We've heard warnings about how hard it is to manage big data as important data. We need to think about how we can ensure it's reliable, how we can maintain privacy and regulatory compliance, how we can ensure we only implement ethical and moral big data algorithms, and so on. But before all that, you first need access to the data that is valuable to your company, assuming it exists or can be created. I call this the data paucity problem: there's too little big data in use.
As an example, I don't believe every IoT device manufacturer will end up getting unfettered access to the data streams generated by their own things, much less to the ecosystem of data surrounding their things in the field. I think it is inevitable that some will be locked out of their own data flowback.
Already, we've had conversations with folks who build a subsystem or component that they then sell down a supply chain before it ends up assembled into a larger system for actual field use.
Imagine something like a small but important electromechanical part destined to be assembled into a car. Even if those parts could create insightful data streams that would be invaluable to the part manufacturer, how far back up the supply chain will -- or can -- that data readily flow? Information about the actual deployment and customer no doubt belongs to the business-to-consumer vendor at the end of the chain. Will they share active field data with suppliers? Will their customers want or allow them to expose their data in that way? Can a system vendor share all embedded component data freely back upstream to all suppliers? This doesn't happen easily today, and it will likely be an even bigger big data concern as components are loaded with more and more IoT sensors.
Organizations that come to rely on, or expect to have access to, certain key data sources for valuable insight may find that they are locked out -- or priced out -- of that data. This could be the result of privacy concerns, complex supply models or even simple data market activity.
In fact, data paucity will be a concern even if a given data source is made available. That data might not be available in a timely manner, at an affordable price, in a suitably granular form or with enough supporting data to interpret it properly. It's easy to degrade the value of big data by aggregating it, delaying it, rolling up intervals or rounding off precision. We can imagine how data sources might lose significant value with every field redacted or important bit masked.
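To make the degradation concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the readings are made up, and the helper names (`roll_up`, `round_off`, `exceeds`) and the alert threshold are hypothetical. It shows how rolling up intervals can hide a brief spike, and how rounding off precision can hide a slow drift, that the raw stream would have revealed.

```python
from statistics import mean

# Hypothetical per-second temperature readings from a component in the field:
# a slow upward drift plus one brief spike (values invented for illustration).
raw = [70.0, 70.1, 70.1, 70.2, 95.0, 70.2, 70.3, 70.3, 70.3, 70.4, 70.4, 70.4]

ALERT_THRESHOLD = 90.0  # assumed alert level, not taken from any real system


def roll_up(readings, interval):
    """Average consecutive readings into buckets of `interval` samples."""
    return [
        mean(readings[i:i + interval])
        for i in range(0, len(readings), interval)
    ]


def round_off(readings, step):
    """Round each reading to the nearest multiple of `step`."""
    return [round(r / step) * step for r in readings]


def exceeds(readings, threshold=ALERT_THRESHOLD):
    """Return True if any reading is above the threshold."""
    return any(r > threshold for r in readings)


# What a downstream consumer might receive instead of the raw stream.
rolled = roll_up(raw, 6)     # two six-second averages
rounded = round_off(raw, 1)  # whole degrees only

print(exceeds(raw))     # True: the spike is visible in the raw data
print(exceeds(rolled))  # False: averaging has smoothed the spike away
print(sorted(set(rounded)))  # the gradual drift has collapsed into one value
```

The part supplier who only ever sees `rolled` or `rounded` has technically been given the data, but not the signal that made it worth having.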
The internet of internet of things
IoT will soon host computing within its own web. All of those devices, when effectively connected in a distributed fashion, could come to host and process distributed data internally, within IoT. While much of what we call IoT today is really structured as many isolated directed networks (most devices report up a chain of ownership and a strict hierarchy of data flow), over time, more devices will connect with each other across environments and vendors and at multiple levels. These inevitable interconnections will make IoT truly an internet of its own.
Data could live at all levels of nodes within that web: endpoint devices, intermediate collection points, cloud aggregations, deep archives. Programming in the future could involve describing what you want done with the data wherever it is, and the system will decide where and when your inherently distributed functions will run. The data itself could be driving a lot of the operational IT hosting decisions.
You could be locked out of not just a data source, but also the opportunity to participate in the IoT as an internet itself. If you are only passively downstream, getting a copy of what comes out of the archive, you will necessarily be limited to historical analysis.
While this has some value, recall that some of today's most valuable new applications are increasingly converging analytics and operational processes. If you want to play in that kind of future, you'll need to address big data concerns today, including securing data access rights and operational participation.