This column is the second in a series on bringing enterprise-grade, open source network and systems management to your IT operations using the OpenNMS platform.
In the first installment, we configured OpenNMS for service assurance. You might have found that in addition to the notifications you configured, OpenNMS produces resource graphs from historical data for the response-time latency of certain monitored services (Figure 1). This information is helpful for spotting trends in services' responsiveness, but it's a fairly symptomatic approach to diagnosing a “disease.”
OpenNMS also provides a performance management subsystem that lets you look more deeply into system statistics to find the cause of performance degradation, sometimes even helping to spot developing issues before they affect services. Some kind of management “agent” technology is necessary to gain access to performance data. OpenNMS supports several protocols for performance management. This column focuses on Simple Network Management Protocol (SNMP), which is supported by almost every server operating system and managed network device.
Getting started with SNMP in OpenNMS
SNMP comes in three protocol versions: SNMPv1, SNMPv2c and SNMPv3. Some older and embedded devices support only SNMPv1, and some high-security environments require SNMPv3 for its enhanced security model, but SNMPv2c strikes a good balance between capabilities and complexity.
To get started with performance management, you'll need to install and configure the SNMP agent and tools on your servers' operating system. For the sake of brevity, we'll continue limiting our managed nodes to servers running Red Hat Enterprise Linux 5 or CentOS 5, which include the Net-SNMP agent and tools in the default repositories. Use the YUM package manager to install these packages:
yum install net-snmp net-snmp-utils
You'll need to edit the file /etc/snmp/snmpd.conf to set up a read community string (like a case-sensitive password for requesting data via SNMP), to open up the default view of the management information base (MIB) and to enable monitoring of filesystem capacity. To set the community string, find the following lines near the top of the snmpd.conf file:
# sec.name source community
com2sec notConfigUser default public
And change the default “public” to a value that only you know. Community strings cannot contain spaces, and it's best to avoid the @ symbol. Remember your choice, as you will need it again soon.
Next, open up the default view exposed by Net-SNMP to include the whole MIB tree. Find the following two lines in the snmpd.conf file:
view systemview included .1.3.6.1.2.1.1
view systemview included .1.3.6.1.2.1.25.1.1
And add a third line like this below them:
view systemview included .1.3
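For repeatable setups, the two edits made so far could be scripted instead of typed by hand. What follows is a minimal sketch rather than a tested deployment script: it works on a scratch copy seeded with lines resembling the stock file, and both the community string s3cretRO and the temporary file are placeholders chosen for illustration. It assumes GNU sed, as shipped with Red Hat Enterprise Linux and CentOS.

```shell
#!/bin/sh
# Work on a scratch copy so this sketch never touches /etc/snmp/snmpd.conf.
CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
#       sec.name       source        community
com2sec notConfigUser  default       public
view    systemview    included   .1.3.6.1.2.1.1
view    systemview    included   .1.3.6.1.2.1.25.1.1
EOF

COMMUNITY="s3cretRO"   # placeholder; choose your own value

# 1. On the com2sec line, replace the trailing default community "public".
sed -i "/^com2sec/ s/public\$/${COMMUNITY}/" "$CONF"

# 2. After the second stock view line, append a view covering the .1.3 subtree.
sed -i '/^view[[:space:]]*systemview.*25\.1\.1$/a view    systemview    included   .1.3' "$CONF"

# Show the result for review before copying anything into place.
cat "$CONF"
```

Running the edits against a copy first makes it easy to compare the result with the live file before replacing it.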
Finally, tell the Net-SNMP agent to monitor and expose the capacity of the filesystems you care about. For each filesystem, add a line to the bottom of the snmpd.conf file in the format “disk [mount-point]”. For example, if the root (/), /var and /home filesystems are mounted on separate block devices, you might add the following three lines:
disk /
disk /var
disk /home
Save your changes and restart the Net-SNMP daemon service by typing:
service snmpd restart
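Before moving on, it may be worth confirming that the agent answers with the new community string. The sketch below uses the snmpget and snmpwalk tools from the net-snmp-utils package installed earlier. The helper name check_snmp, the host localhost and the community s3cretRO are placeholders for illustration, not values from your configuration.

```shell
#!/bin/sh
# check_snmp HOST COMMUNITY: hypothetical helper; returns 0 if the agent answers.
check_snmp() {
    host="$1"
    community="$2"
    # The tools come from net-snmp-utils; give up early if they are missing.
    command -v snmpget >/dev/null 2>&1 || { echo "snmpget not found" >&2; return 2; }
    # Ask for sysUpTime (.1.3.6.1.2.1.1.3.0) with a 2-second timeout and one retry.
    snmpget -v 2c -c "$community" -t 2 -r 1 "$host" .1.3.6.1.2.1.1.3.0
}

if check_snmp localhost s3cretRO; then
    # hrStorageDescr: storage entries known to the agent, including mount points.
    snmpwalk -v 2c -c s3cretRO localhost .1.3.6.1.2.1.25.2.3.1.3
    # dskPath (UCD-SNMP-MIB): one row per "disk" directive in snmpd.conf.
    snmpwalk -v 2c -c s3cretRO localhost .1.3.6.1.4.1.2021.9.1.2
else
    echo "No SNMP response; check the community string and that snmpd is running." >&2
fi
```

If the second walk returns nothing, the disk directives or the widened view are the first things to recheck.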
If the system firewall is enabled on your servers, you'll also need to configure it to allow traffic to UDP port 161 (the standard SNMP port) from your OpenNMS server. On Debian and Ubuntu systems, you'll probably need to tell snmpd to listen on interfaces other than local loopback (127.0.0.1). This setting has long been in /etc/default/snmpd, but in Ubuntu 10.10 it's moved into /etc/snmp/snmpd.conf.
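On the Red Hat-style servers used in this column, the firewall change might look like the following sketch. It assumes the stock iptables service and uses 192.0.2.10 as a stand-in for the address of your OpenNMS server. The script only prints the commands, since they require root and alter the live firewall.

```shell
#!/bin/sh
# Address of the OpenNMS server; 192.0.2.10 is a documentation placeholder.
OPENNMS_IP="192.0.2.10"

# Rule: accept UDP traffic to port 161 (SNMP) from the OpenNMS server only.
RULE="-p udp -s ${OPENNMS_IP} --dport 161 -j ACCEPT"

# Print the commands instead of running them; review, then run them as root.
echo "iptables -I INPUT ${RULE}"
echo "service iptables save"
```

Restricting the source address keeps the agent from answering SNMP queries from arbitrary hosts, which matters because SNMPv2c community strings travel in clear text.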
The cruel joke of Simple Network Management Protocol (SNMP)
Some users say the “simple” in SNMP's name is a cruel joke, as it takes considerable effort to understand its workings. The seemingly misleading term actually refers to the protocol's simplicity compared with the competing (and now largely extinct) Common Management Information Protocol, which is truly complex.
Now you'll need to tell OpenNMS which community string you picked for SNMP access to your servers. In the OpenNMS Web user interface, go to Admin → Configure SNMP Community Names by IP. In the resulting form, enter the starting and ending range of IP addresses for which the community string is valid and the community string itself. Choose v2c for the protocol version, and leave the timeout and retry values alone unless your network has many slow links. Submit the form and then tell OpenNMS to rescan one of your newly SNMP-enabled servers. From the node's detail page (accessible via the Node List link in the main navigation strip), click the Rescan link and confirm that you want to rescan the node. After a few minutes, the rescan will complete and you should see a new SNMP Attributes box (Figure 2) in the node's detail page.
If you see this box and got the above MIB view changes correct, you've completed the hard work. OpenNMS automatically decides which SNMP data to collect from which kinds of nodes. Within five minutes, you will be able to visualize the collected SNMP performance data (Figures 3 and 4) by clicking on the Resource Graphs link in the node's detail page. Select Node-level Performance Data plus the entries for the network interface names and filesystem mount points that interest you and click Graph Selected Resources.
You can easily arrange these resource graphs into Key SNMP Custom (KSC) reports to present data about a number of disparate resources in a single view. OpenNMS can also do Top-N and Bottom-N statistical reporting on any single performance attribute across any or all nodes in its database for a specified time period (Figure 5).
OpenNMS performance management benefits
While resource graphs and statistical reports use historical data stored in disk files, the thresholding subsystem of OpenNMS compares the value of a performance attribute (or expression involving several attributes) in real time against a user-defined threshold. When a threshold is exceeded, OpenNMS creates an event that can trigger a notification. With notifications, you’ll immediately know when a dodgy nightly batch job has run amok and is about to fill your server's disk again. You’ll be able to free some space before the job fails and you’ll have the eternal gratitude of the application administrator.
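The comparison at the heart of a high threshold can be illustrated with a few lines of shell. This is only a local analogy built on df, not how OpenNMS implements thresholding, which is configured within OpenNMS and evaluated against collected SNMP data. The function names and the 90 percent limit are invented for illustration.

```shell
#!/bin/sh
# exceeds_threshold VALUE LIMIT: succeeds when VALUE is above LIMIT (integers).
exceeds_threshold() {
    [ "$1" -gt "$2" ]
}

# used_percent MOUNT: print the used-capacity percentage of a filesystem.
used_percent() {
    # df -P prints one POSIX-format line per filesystem; column 5 is the use percentage.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

LIMIT=90                      # hypothetical threshold, in percent
USED="$(used_percent /)"

if exceeds_threshold "$USED" "$LIMIT"; then
    echo "ALERT: / is ${USED}% full (limit ${LIMIT}%)"
else
    echo "OK: / is ${USED}% full (limit ${LIMIT}%)"
fi
```

A monitoring platform runs this kind of check on every collection cycle and turns a crossing into an event, which is what makes the notification arrive before the disk actually fills.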
OpenNMS performance data also proves useful in the new executive reporting feature. This exciting feature allows performance data, as well as latency data from service monitoring, to be presented with event, alarm and outage data in PDF reports built using the JasperReports stack in OpenNMS.
OpenNMS's performance management capabilities are a powerful complement to the service assurance features covered in the previous tip. In the next installment, we'll shift our focus and build a fully free and open source log aggregation server using the syslog-ng package, but we'll later return to OpenNMS and its event management subsystem in this series.