Modern Infrastructure



Big data revives IT operations analytics

Big data is a fitting match for IT operations management, where the endless logs and events finally add up to more efficiency and less troubleshooting.

At least in IT, the cobbler's children haven't always been last on the list.

For decades, IT operations have been helped by sophisticated tools. Still, many management challenges remained seemingly intractable. At last, though, that final frontier has been breached by a new generation of IT operations analytics tools that use far more sophisticated mechanisms to illuminate problems and posit solutions.

"Many vendors are extending their tools in the area of big data and analytics to solve operational, availability and user experience issues," said Tim Grieser, program vice president for enterprise system management software at IDC. "What's new here is incorporating that big data and analytics perspective to take into account very large volumes of data from a lot of sources."

Some of those sources are within the vendor's own tool set, while some take in wider types of data, particularly "wire data," consisting of most of the data transmitted across computer and telecommunication networks via wire and transport protocols. The goal is to have a new way to assess IT performance and availability and user experience, which improves how the operations team understands and diagnoses those issues.

"Ideally, this will go beyond past practice and enable you to anticipate and prevent problems in the future," Grieser said. IT operations management is about service availability and service experience. Today, tools gather information spanning from the end user to operational data within the IT infrastructure, as well as data about business outcomes, he explained.

Organizations have found success with the newest generation of smarter IT operations analytics tools. "The biggest use case is troubleshooting," he said. Organizations can skip the traditional "war room" approach and move from managing crises to just plain managing.

"By looking at data from all sources, they can perhaps get quickly to the right part of the infrastructure or application where there is a problem," he said. And, he noted, the deployment options range from software as a service (SaaS) to on premises. For example, Splunk, one of the pioneers in smartening up IT operations tools, now offers a cloud-based SaaS offering as well as its traditional Splunk Enterprise operational intelligence platform. Organizations that choose the SaaS option often do so to avoid the costs of processing and storing large volumes of data, he said. On the other hand, those who opt to use the on-premises tool may be driven by policies related to data privacy and data security, he noted.

"We have had applications for these IT challenges for many years and now there is extra capability of big data and machine learning from companies like ExtraHop and BMC," said Dan Conde, an analyst at Enterprise Strategy Group (ESG).

A more efficient IT operation

Unlike other business functions where big data is openly exploited, analytics are applied in IT to fairly specific use cases. IT operations analytics scenarios are tied to making IT more efficient, finding problems and getting better security and responsiveness. Network troubleshooting applications and appliances can generate huge volumes of data; big data allows you to save more information than ever before.

"With big data economics, it has given new life to the space, so we see more traditional players changing infrastructure, and engineering approaches the problems in a deeper way," said Nik Rouda, analyst at ESG.

That labor-intensive and often futile work of IT firefighters changes radically. So many systems within the IT environment generate logs and data about their activities -- and errors -- that it represents a perfect opportunity for analysis. Since much of that data is unstructured, it is also a good candidate for big data handling techniques. Data with obvious defined thresholds to monitor often is also ripe for big data tools to discover the patterns hidden within.

And it's not just the data about IT itself that can feed IT operations analytics and management -- it is all the data.

"Wire data is now seen by most organizations, including Gartner, as the most important source for IT performance and availability management; even more important than machine or host-based data sources," explained Erik Giesa, the senior vice president of marketing and business development at ExtraHop, a company with products focused on delivering visual reports and data analytics for IT intelligence and business operations.

Citing the example of a single data set produced by an ExtraHop appliance, he explained that it includes real-time preprocessing, measurement and calculation of over 250 data packets exchanged between four systems using multiple protocols and different data payloads. "Multiply this by tens, if not hundreds of thousands, of transactions per second and you get a sense of the intelligence and scale required to perform real-time stream analytics," said Giesa.
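
To put rough numbers on that multiplication, here is a back-of-the-envelope sketch in Python. It assumes that "tens, if not hundreds of thousands" means somewhere between 10,000 and 100,000 transactions per second, and it treats 250 packets per transaction as a floor, since the example cites "over 250." The function name is made up for illustration.

```python
# Back-of-the-envelope scale estimate; the transaction rates are assumptions.
PACKETS_PER_TRANSACTION = 250  # "over 250" in the example, so a lower bound


def packets_per_second(transactions_per_second: int) -> int:
    """Return the minimum packets per second an analyzer would have to process."""
    return PACKETS_PER_TRANSACTION * transactions_per_second


for tps in (10_000, 100_000):
    print(f"{tps:>7,} tx/s -> at least {packets_per_second(tps):,} packets/s")
```

Under those assumptions, the stream works out to at least 2.5 million to 25 million packets per second, each needing to be measured and correlated in real time.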

Even a dozen humans could not perform all of these measurements and calculations using packet capture and network data for a single transaction, he explained, let alone for thousands of transactions per second. Extracting and producing calculations on page load time, bandwidth used, transaction size, order ID and revenue, and determining the location while ensuring a database transaction was successful is "beyond a typical IT operations monitoring tool, whether it's an APM [application performance management], NPM [network performance management] or log aggregation product," he added. Very little of this information is typically logged by an application or machine, and the effort required to instrument an agent to acquire and analyze the data would be impractical, especially at scale.

"Now imagine trends like IoT [internet of things], SDN [software-defined networking], containerization and microservices; organizations aren't going to instrument all of those sensors, networks or microservices with agents or self-reported logs. The only way to be able to analyze all of that activity and behavior will be from the wire," Giesa said.

Big data's contribution to IT operations analytics can greatly aid security in the enterprise, Rouda said. "An occasional SQL injection might not be noticed if it is repeated infrequently, but big data analytics might quickly spot it as anomalous."
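
One simple way to picture the kind of check Rouda describes is to reduce every logged query to its "shape" and flag shapes that almost never occur. The Python sketch below is only an illustration under that assumption, not a description of how any vendor's product works; the function names, the 1% cutoff and the sample log are all hypothetical.

```python
import re
from collections import Counter


def query_shape(sql: str) -> str:
    """Collapse literals so queries that differ only in values share a shape."""
    shape = re.sub(r"'[^']*'", "?", sql)     # mask string literals
    shape = re.sub(r"\b\d+\b", "?", shape)   # mask numeric literals
    return re.sub(r"\s+", " ", shape).strip().lower()


def rare_shapes(queries, max_share=0.01):
    """Return shapes that account for no more than max_share of all queries."""
    counts = Counter(query_shape(q) for q in queries)
    total = sum(counts.values())
    return sorted(s for s, n in counts.items() if n / total <= max_share)


# Hypothetical log: 999 routine lookups and one injected query.
log = [f"SELECT name FROM users WHERE id = {i}" for i in range(999)]
log.append("SELECT name FROM users WHERE id = 1 OR 1=1")

print(rare_shapes(log))
```

The routine lookups all collapse into one shape that dominates the log, so the lone injected query stands out even though it appears only once. A real deployment would need far more than frequency counting, but the sketch shows why volume helps: the more normal traffic is on record, the easier the odd one out is to see.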

Given that background, he explained, it is no wonder there's some excitement about being able to identify issues faster and home in on slippery issues such as utilization.

It's a natural place for investment, too, because it falls within IT's own purview, area of control and operational charter. "It is using technology to improve technology -- so that is why it has been getting adopted," he said. The trend has led to the emergence of some newer IT operations management vendors, the best-known being Splunk. There are also service-level management tool vendors such as BMC, as well as other companies that came from the networking space, he noted.

How the "future" looks

Splunk's core platform, Enterprise 6.4, searches, monitors and analyzes machine data from a wide range of sources, from customer clickstreams and transactions to security events and network activity. It uses a range of search, analytics, visualizations and prepackaged use cases to help IT discover and share insights. According to the company, Splunk Enterprise can be applied to application delivery, IT operations, security and compliance, business analytics and IoT. Over 1,000 Splunk apps and add-ons can also deliver prepackaged views, dashboards and workflows.

Similarly, BMC's TrueSight Intelligence SaaS platform uses a REST API to connect with sources of IT operational and business data and then automatically learns the behavior of systems. It outputs this information to a graphical interface so users can view the health and performance of applications and key performance indicators.

CloudPhysics, a virtualization management tool provider, also gathers an enormous amount of data about its customers' data centers to provide insights about capacity opportunities, performance issues and overall data center risk and health. In addition, CloudPhysics can compare each customer to the company's global dataset -- an anonymized aggregation of all its customers' metrics -- allowing users to benchmark their environment against the average. These insights help customers decide whether to buy another server, whether resources are underutilized and even which applications would work best in the cloud.
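
Benchmarking against an aggregate can be pictured with a few lines of Python. This is a minimal sketch of the general idea, not CloudPhysics' method: the function name, the choice of CPU utilization as the metric and every number in it are invented for illustration.

```python
from statistics import mean
from typing import Sequence


def benchmark(customer_value: float, fleet_values: Sequence[float]) -> dict:
    """Compare one customer's metric with an anonymized set of fleet values."""
    fleet_average = mean(fleet_values)
    below = sum(v < customer_value for v in fleet_values)
    return {
        "fleet_average": fleet_average,
        "difference": customer_value - fleet_average,
        # Share of the fleet with a lower value than this customer.
        "percentile": 100 * below / len(fleet_values),
    }


# Hypothetical average CPU utilization (percent) across eight data centers.
fleet_cpu = [35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0]

print(benchmark(30.0, fleet_cpu))
```

A customer running well below the fleet average, as in this made-up case, might conclude that its resources are underutilized before buying another server.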

There's an app for that

Core analytics has always been important in IT operations. "We had to monitor any point where the digital services were important for the business," said Bill Berutti, president of performance and availability at BMC.

Recently, as apps became more pervasive and user experience more critical, the application performance market grew along with the need to analyze the data. Log analytics also became important and, once again, he noted, companies like BMC have had success using logs to determine whether there is a problem and at what tier of the application or infrastructure it occurs.

IT operations analytics became even more important as the move to digital services began in earnest. "The big disruptive ideas like Airbnb or Uber are examples of extreme digital that has disrupted industries, and I would bet that IT analytics have been an important part of that," Berutti said. "For those organizations, if the application doesn't work, the business doesn't exist."

But it goes beyond those all-digital businesses. "Every retailer and financial institution is increasingly in the same situation," he added.

For example, the competitive advantage in banking used to be ATMs; now it is 24-hour banking available through the app on your mobile device, which can provide account balances and handle check deposits easily, quickly and conveniently. Supporting that functionality requires strong analytics, Berutti said.

Some IT organizations have tried to build capabilities by applying big data themselves, but "ran into difficulty with the challenges of data science or machine learning, which require skills not present in most traditional IT shops," he said.

From troubleshooting to forecasting

Looking beyond analytics to "fix" what's broken, the other main emerging focus within the industry is on predictive analytics, Berutti said. There have been some aspects of predictive analytics available in the past, but machine learning algorithms that have come into vogue are better equipped to handle multivariate root cause analysis, where many problems arise, he noted.

Machine learning is well applied to IT challenges, ESG's Rouda agreed. "With machine learning, you can watch all aspects of network behavior and truly start to learn what is going on," he said. For example, a large chip manufacturer reported having 80 billion to 100 billion network events each day and employed dozens of security workers to try to understand the system dynamics. However, as Rouda noted, that's a challenge for a group of people of any size.

By contrast, with machine learning, "we can group things into clusters and a human can have a role looking in on the process and helping to refine it," he said. "You can't do it all by machine because the machine won't understand every implication, but it would certainly be very good at establishing correlations. And, in fact, that is where a lot of shifting is occurring."

Rouda speculates that the predictive market for IT operations analytics will grow first with vendors updating their existing customer base with machine learning and big data capabilities and then moving to expand the market. "The number of apps and the amount of storage per IT person has been going up, but IT budgets have been staying flat, so this innovation can make the management process far more efficient," Rouda said.

About the author:
Alan Earls is a Boston-based freelance writer focused on business and technology.
