It's still true: Monitoring sucks. Server monitoring is a touchy subject, and if you've got a large number of servers in your arsenal, it's doubly so.
Through a combination of discovery and monitoring, Alan Robertson's Assimilation Monitoring Project hopes to ease those monitoring woes. Robertson, Linux developer and founder of the High-Availability Linux (Linux HA) project, will speak at LinuxCon North America 2012 in San Diego at the end of August. In this Q&A, we discuss what this project is and how it can help.
The description for your talk mentions "cloud-scale" sets of servers. How many servers would that be? What kind of companies will benefit most from the Assimilation Project?
Alan Robertson: There isn't an obvious limit to how many servers can be monitored by a single machine. I expect it to turn out to be significantly over 100,000 servers monitored by a single server. I did some early tests on more simulated servers than that.
The Assimilation Project has two major thrusts: discovery and monitoring. The companies that would benefit most from the scalability of this project would be companies with at least a few hundred servers. Clouds come to mind as a particularly obvious target. This particular method of monitoring isn't confused by network failures, [which is] a common complaint. It also helps you find the root cause when you have a cascading failure, significantly speeding up repair.
Discovery benefits any company whose processes are less than perfect. In my experience [these are] most companies. Because we discover servers and services and dependencies, we will bring things to your attention that you aren't monitoring and then make it easy to monitor them. Most monitoring systems take months and months to configure, and a lot of that is really figuring out what there is to monitor -- discovery -- and then telling the monitoring system how to monitor them. When you're done you don't really know if you're monitoring everything, and if somehow you do get everything, you know that it won't stay that way very long.
In Assimilation Monitoring's case, it tells you what you're running, and in many cases it will already know how to monitor the processes. This makes a revolutionary change. Discovery becomes even more valuable when reorganizing or acquiring data centers or services. I know of several outsourcers who are dying for what we're building.
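The gap Robertson describes, between what is actually running and what the monitoring system has been told about, can be pictured as a set difference. The following is a minimal sketch under stated assumptions, not the project's code: the `Service` record, the `unmonitored` helper, and the sample hosts are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """A hypothetical record of one service found on one host."""
    host: str
    name: str
    port: int


def unmonitored(discovered, monitored):
    """Return discovered services that nothing is monitoring yet.

    The result is sorted by host and port so the output is stable.
    """
    return sorted(set(discovered) - set(monitored),
                  key=lambda s: (s.host, s.port))


# Illustrative data only: what discovery found vs. what is configured.
discovered = [
    Service("web01", "nginx", 80),
    Service("web01", "sshd", 22),
    Service("db01", "postgres", 5432),
]
monitored = [Service("web01", "nginx", 80)]

gaps = unmonitored(discovered, monitored)
for s in gaps:
    print(f"{s.host}: {s.name} on port {s.port} is running but not monitored")
```

With continuous discovery, a comparison like this would be rerun whenever the discovered set changes, so the list of gaps never goes stale.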
What are some of the issues you discovered with current monitoring software that led up to the project? Did the "Monitoring Sucks" movement play into it?
Robertson: Current monitoring software is painful to configure and tends to flake out when the network glitches. Scaling current monitoring software is typically painful and awkward at best. Because configuration is difficult and monitoring often isn't a high priority, configurations are often out of date and things get missed.
On the discovery side, there is very little open source discovery software out there. Most discovery requires an intrusive batch process that in many cases isn't even permitted to run because of the security concerns it raises. If you have discovery software that you're allowed to run, it has probably been a while since you did, and it probably doesn't integrate with your monitoring software. We have continuous integrated Stealth Discovery™ which makes these issues just go away.
We're motivated by the same things as Monitoring Sucks, and I know some of the people involved. Although I started out with this as a pure monitoring project, I realized that one of the main reasons people hate their monitoring software is that it gives too little and [asks for] too much. By integrating discovery with monitoring, it now can ask less of you and give you more intelligent results.
Without giving away too much of your talk, what is the Assimilation Monitoring Project? Is it hinged on a particular Linux distribution, and is the project itself open source?
Robertson: The Assimilation project is about tightly integrating monitoring and discovery to create a [system that is] much easier to set up, use and maintain, [and] that can be more helpful and more accurate when you diagnose and repair problems. System administrators are really the heroes of IT shops.
There are two parts to the project: the collective management authority and the agents, or nanoprobes. The nanoprobes are intended to eventually run everywhere, even Windows. The management function should be able to run on any Linux distribution. All these pieces are open source. Those who know my Twitter handle (@OSSAlanR) might already have guessed that. The collective management function is written in Python and uses the Java-based Neo4j graph database.
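A graph is a natural shape for discovered servers, services and dependencies, and it also suggests how a cascading failure might be traced back to its source. The sketch below is an assumption-laden illustration in plain Python, not Neo4j queries and not the project's actual method: the `root_causes` helper, the `depends_on` map, and the node names are all hypothetical.

```python
def root_causes(depends_on, failed):
    """Return failed nodes that have no failed dependency of their own.

    depends_on maps a node to the nodes it needs in order to work.
    A failed node whose dependencies are all healthy (or that has none)
    is a likely origin; the other failures are probably fallout.
    """
    return sorted(
        node for node in failed
        if not any(dep in failed for dep in depends_on.get(node, ()))
    )


# Hypothetical dependency map, as discovery might record it.
depends_on = {
    "webapp": ["appserver"],
    "appserver": ["database", "cache"],
    "database": ["switch-a"],
    "cache": ["switch-a"],
}

# Everything is alarming at once because one switch went down.
failed = {"webapp", "appserver", "database", "cache", "switch-a"}

print(root_causes(depends_on, failed))  # ['switch-a']
```

Five alarms collapse into one likely culprit, which is the kind of shortcut that speeds up repair during a cascading failure.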
Tell us a bit about your background. Are you still with IBM? How did you come to work on this project?
Robertson: I've developed for some form of UNIX my whole career -- 21 years with Bell Labs, a year at SuSE and 11 years at IBM, where I still work today. I've done development nearly my entire career and managed computer systems for about 10 years at Bell Labs. I founded the Linux-HA project in 1998 and led it for about 10 years. The software we wrote there provides industry-leading capabilities and reliability. Every Linux distribution provides it. I've spoken at [about] 25 conferences all over the world, and I've been very active in the Linux community. I started this project as a personal-time-only open source project in early 2011.
I speak every year at the LISA conference for system administrators. I spent 10 years managing systems for Bell Labs and 10 years creating and leading the Linux-HA project, so I have a natural affinity for sysadmins, DevOps and system management issues. My last two years or so have been working with some incredibly large one-of-a-kind supercomputers, which is what led me to thinking about scaling in a new way. When I realized I had a great solution to scalability in monitoring, I decided I had to build it since no one really believed that it would work the way it does. After I got started, I realized I needed to do some discovery of switches for the scalability to work like I want it to.
That got me thinking about what else could be discovered without putting my users in "security jail" for running my code. Over time, it became apparent that tightly integrating discovery made the project much more useful [for] a far larger audience. I know it's a cliché, but when you integrate monitoring and continuous discovery, the whole is more than the sum of its parts.
A while back I had a conversation with an architect of a major commercial monitoring and discovery package. He thought, [in a dismissive way, that] what I was talking about was "magic." But it does work exactly like we say it does, and although it seems like magic, it isn't. In my LinkedIn profile I wrote, "If someone thinks it's impossible, then I'm probably interested." That comment really speaks to what a great thing [it is that] we're doing. I really love doing hard stuff and making it reliable and easy to use. I can't tell you how much encouragement I've gotten from the system administrators I've talked to and how much that means to me.
What do you hope people will take away from your talk?
Robertson: First and foremost, I want them to walk away inspired by a greater vision of what monitoring can be. It can not only completely avoid "suckage," but it can be an incredible help. My foremost goal for the project is to make heroes out of our users and their managers.
Beyond the inspiration, I want them to come away with:
- A crystal-clear understanding of exactly what the project is (and why it's so cool);
- A clear understanding of what discovery is and how it can benefit them;
- An idea of what role this software should take in their organization; and
- A sense of what their role in the project should be.