The following is part two of a series on monitoring.
As more and more organizations adopt microservices and containers, they're looking for new monitoring tools that can help track performance. And while there are some technologies that can help, the bigger challenge is understanding relationships between individual containers -- which is where specialized container monitoring players have focused their efforts.
Docker infrastructure metrics are enough to tell whether an individual service is up, but not whether overall application performance is adequate, said Amit Agarwal, chief product officer at Datadog, a cloud-based monitoring service.
"This is the problem of pretty much any virtual environment," Agarwal said, and where the metrics provided by Docker need to be complemented with other metrics. For example, Datadog can help you see the relationship between a database table scan, Web server connections and cache hits, he explained.
To do that, a monitoring service must understand relationships between services. For instance, Datadog takes into consideration "tags" that have been created by users of AWS' Elastic Compute Cloud (EC2) Container Service (ECS), as well as attributes such as machine type.
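As a rough illustration of why tags matter, the sketch below rolls up per-container CPU readings by a shared tag so that a service can be viewed as a whole rather than container by container. It is not Datadog's API or implementation; the sample data, the tag names (`service`, `machine_type`) and the `average_by_tag` helper are all assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-container samples; names, tags and values are invented.
samples = [
    {"container": "web-1", "tags": {"service": "web", "machine_type": "m4.large"}, "cpu_pct": 40.0},
    {"container": "web-2", "tags": {"service": "web", "machine_type": "m4.large"}, "cpu_pct": 60.0},
    {"container": "db-1", "tags": {"service": "db", "machine_type": "r3.xlarge"}, "cpu_pct": 80.0},
]


def average_by_tag(samples, tag_key, metric):
    """Group samples by the value of one tag and average a metric per group."""
    groups = defaultdict(list)
    for sample in samples:
        tag_value = sample["tags"].get(tag_key)
        if tag_value is not None:  # skip containers that lack this tag
            groups[tag_value].append(sample[metric])
    return {tag_value: mean(values) for tag_value, values in groups.items()}


# Roll individual containers up to the service they belong to.
cpu_by_service = average_by_tag(samples, "service", "cpu_pct")
```

The same helper could group by `machine_type` instead, which is the kind of attribute-based view the paragraph above describes.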
The size, scope and dynamicity of many containerized environments also demand more intelligence from container monitoring tools, said Remmelt Pit, director of strategy and engineering at Container Solutions, an application design consulting firm in the Netherlands, and the engineering muscle behind Cisco's Mantl.io microservices platform.
"With monolithic applications, if a server goes down, we care about that," Pit said. "But with microservices, the pieces are so small that we don't care about the separate containers and processes anymore—they are ephemeral."
To monitor what could be a very large environment in Mantl.io, Container Solutions has evaluated a tool from Ruxit, a division of Dynatrace. The product provides automatic network discovery and automatically adjusts alerting thresholds, said Pit, adding that it uses artificial intelligence algorithms that learn usage patterns to calculate correlations between events.
In practice, that means "Ruxit understands repetitive behaviors like nightly backups that consume lots of resources, [but] do not trigger alerts, when they take place," he said.
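The idea of not alerting on expected, repetitive load can be approximated with a simple per-hour baseline. The sketch below is a plain statistical stand-in, not Ruxit's algorithm: it learns the mean and spread of a metric for each hour of the day and flags a reading only when it is far above what is normal for that hour. The function names, the threshold factor `k`, the `min_spread` floor and the sample history are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_hourly_baseline(history):
    """history: iterable of (hour_of_day, value). Returns {hour: (mean, stdev)}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: (mean(values), pstdev(values)) for hour, values in by_hour.items()}


def should_alert(baseline, hour, value, k=3.0, min_spread=1.0):
    """Alert only when value is well above the learned norm for this hour."""
    if hour not in baseline:
        return False  # assumption: stay quiet until there is history for this hour
    avg, spread = baseline[hour]
    # min_spread keeps a perfectly flat history from producing a zero-width band.
    return value > avg + k * max(spread, min_spread)


# Hypothetical CPU history: a nightly backup at 02:00 runs hot, afternoons are quiet.
history = [(2, 88.0), (2, 90.0), (2, 92.0), (14, 18.0), (14, 20.0), (14, 22.0)]
baseline = learn_hourly_baseline(history)

backup_alert = should_alert(baseline, hour=2, value=91.0)      # within the 02:00 norm
afternoon_alert = should_alert(baseline, hour=14, value=91.0)  # far above the 14:00 norm
```

With this history, the same 91 percent reading is treated as routine during the backup window and as an anomaly in the afternoon.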
Correlation equals complexity
Ultimately, the holy grail of any monitoring service isn't whether an application is up or down, or even whether the end user is having a good experience, said Leon Fayer, vice president at OmniTI, a Web development and architecture firm.
"As someone once said to me, 'I don't give a crap if our data center is on fire as long as the business is still making money,'" Fayer said. IT metrics need to be correlated not just to one another and to end user experience, but to business metrics such as those measured by business intelligence dashboards.
Further, they need to be sorted in terms of importance. "Is it important to know that a service is going to run out of space? Yes, but it doesn't warrant getting up at 3:00 a.m."
In some ways, "we're kind of going backwards with containers," he said. What you want is for your systems, application and business metrics to all be correlated, but with containers, processes are by definition divided up into ever smaller, self-contained units. And even though more apps are instrumented with monitoring by default, thanks to automation, too often those apps end up in their own little silo, he said.
Meanwhile, microservices mean more services to keep track of and archive. "You want to retain data for basically forever for better correlation and to trend and predict data," he said.
It's a lot, but on the bright side, "the problem is not new," Fayer said. "It's just accentuated by the fact that we are trying to divide up big units into smaller ones."