IT monitoring

Contributor(s): Alex Gillis

IT monitoring is the process of gathering metrics about the operation of an IT environment's hardware and software to ensure that everything functions as expected to support applications and services.

Basic monitoring is performed through device operation checks, while more advanced monitoring provides granular views of operational status, including average response times, number of application instances, error and request rates, CPU usage and application availability.

How IT monitoring works

IT monitoring covers three layers: foundation, software and interpretation.

Foundation. The infrastructure is the lowest layer of a software stack and includes physical or virtual devices, such as servers, CPUs and VMs.

Software. Sometimes referred to as the monitoring layer, this layer analyzes activity on the devices in the foundation, including CPU usage, load, memory and the number of running VMs.

Interpretation. Gathered metrics are presented through graphs or data charts, often on a GUI dashboard.

IT monitoring can be agent-based or agentless. Agents are independent programs installed on the monitored device that collect hardware or software performance data and report it to a management server. Agentless monitoring uses existing communication protocols to emulate an agent and provides many of the same functions.

For example, to monitor server usage, an IT admin installs an agent on the server. A management server receives data from the agent and displays it to the user through the IT monitoring software interface, often as a graph of performance over time. If the server stops working as intended, the tool alerts the administrator, who can repair, update or replace the server until it meets operational standards.
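As a rough illustration of the agent-to-management-server flow described above, the Python sketch below splits the work into an agent side that samples a few host metrics and serializes them as JSON, and a management-server side that compares the values with alert thresholds. The metric names, threshold values and payload format are hypothetical choices for this example; a real agent would send the payload over the network rather than pass it to a local function.

```python
import json
import os
import shutil
import socket
import time

# Hypothetical alert thresholds; real monitoring tools let admins configure these.
THRESHOLDS = {"load_1m": 4.0, "disk_used_pct": 90.0}


def collect_metrics(path: str = "/") -> dict:
    """Agent side: sample a few host metrics using only the standard library."""
    # os.getloadavg() exists only on Unix-like systems; report 0.0 elsewhere.
    load_1m = os.getloadavg()[0] if hasattr(os, "getloadavg") else 0.0
    usage = shutil.disk_usage(path)
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "load_1m": load_1m,
        "disk_used_pct": 100.0 * usage.used / usage.total,
    }


def to_payload(report: dict) -> str:
    """Agent side: serialize the report for delivery to a management server."""
    return json.dumps(report)


def evaluate(payload: str, thresholds: dict = THRESHOLDS) -> list[str]:
    """Management-server side: return one alert message per breached threshold."""
    report = json.loads(payload)
    alerts = []
    for metric, limit in thresholds.items():
        value = report.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{report['host']}: {metric}={value:.1f} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    payload = to_payload(collect_metrics())
    for alert in evaluate(payload) or ["all metrics within thresholds"]:
        print(alert)
```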

Real-time vs. trends monitoring

Real-time monitoring is a technique IT teams use to determine the active and ongoing status of an IT environment through constant data collection and access. Measurements from real-time monitoring software depict data from the current IT environment, as well as the recent past, which enables IT managers to react quickly to current events in the IT ecosystem.
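To show what "the current IT environment, as well as the recent past" can mean in practice, the sketch below keeps only the most recent samples in a fixed-size window, so the dashboard value and its short-term average always reflect what just happened. The class name, the three-sample window and the CPU readings are hypothetical.

```python
from collections import deque
from statistics import mean


class RealTimeWindow:
    """Hold only the newest samples so the view covers the present and recent past."""

    def __init__(self, size: int = 5):
        # A deque with maxlen silently drops the oldest sample when full.
        self.samples = deque(maxlen=size)

    def add(self, value: float) -> None:
        self.samples.append(value)

    def current(self) -> float:
        return self.samples[-1]

    def recent_average(self) -> float:
        return mean(self.samples)


window = RealTimeWindow(size=3)
for cpu_pct in [20.0, 35.0, 90.0, 95.0]:  # hypothetical CPU readings, one per poll
    window.add(cpu_pct)
print(window.current(), round(window.recent_average(), 1))
```

After four polls, the window holds only the last three readings (35, 90 and 95), so the spike dominates the recent average and an operator can react to it immediately.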

Historical monitoring data enables IT managers to improve the environment or identify potential issues before they occur, because it reveals patterns or trends in data collected over a period of operation. Trend analysis takes a long-term view of an IT ecosystem to determine system uptime and service-level agreement (SLA) adherence and to support capacity planning.
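Trend analysis for capacity planning can be as simple as fitting a straight line to historical usage and projecting when it will cross a limit. The sketch below uses an ordinary least-squares fit; the function names, the daily disk-usage figures and the 90% limit are hypothetical, and real workloads often need more robust models than a straight line.

```python
def fit_trend(days: list[float], usage: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line: usage ~ slope * day + intercept."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, usage))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(days: list[float], usage: list[float], limit: float):
    """Project how many days after the last sample the trend line reaches `limit`."""
    slope, intercept = fit_trend(days, usage)
    if slope <= 0:
        return None  # flat or shrinking usage never reaches the limit
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - days[-1])


history_days = [0, 1, 2, 3, 4]
disk_used_pct = [50.0, 52.0, 54.0, 56.0, 58.0]  # hypothetical daily readings
print(days_until_limit(history_days, disk_used_pct, limit=90.0))  # prints 16.0
```

Usage grows 2 percentage points per day from 50%, so the line reaches 90% on day 20, which is 16 days after the last reading.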

IT infrastructure monitoring

IT infrastructure monitoring is a foundation-level process that collects and reviews metrics on the IT environment's hardware and low-level software. Infrastructure monitoring provides a benchmark for ideal physical system operation, which makes it easier to fine-tune systems and reduce downtime, and it enables IT teams to detect problems such as an overheated server.
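One common way to turn such a benchmark into an alert is to record readings during normal operation, summarize them as a mean and standard deviation, and flag new readings that stray too far from that baseline. In the sketch below, the function names, the temperature readings and the three-standard-deviation cutoff (k = 3) are assumptions for illustration, not values prescribed by any particular tool.

```python
from statistics import mean, stdev


def build_baseline(history: list[float]) -> tuple[float, float]:
    """Summarize normal operation as the mean and sample standard deviation."""
    return mean(history), stdev(history)


def deviates(value: float, baseline: tuple[float, float], k: float = 3.0) -> bool:
    """Flag a reading more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma


# Hypothetical CPU temperatures (deg C) recorded while the server ran normally.
normal_temps = [45.0, 47.0, 46.0, 48.0, 44.0, 46.0]
baseline = build_baseline(normal_temps)
for reading in [47.0, 78.0]:
    print(f"{reading} deg C anomalous: {deviates(reading, baseline)}")
```

Here the baseline is 46 degrees with a standard deviation of about 1.4, so 47 degrees is normal while 78 degrees is flagged as a likely overheating server.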

Server and system monitoring tools review and analyze metrics, such as server uptime, operations, performance and security.
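Uptime is usually expressed as a percentage. One simple way to estimate it, assuming a tool that probes the server at fixed intervals, is to divide successful checks by total checks and compare the result with an SLA target. The function names, the one-minute check interval and the 99.9% target below are hypothetical values chosen for illustration.

```python
def uptime_pct(checks: list[bool]) -> float:
    """Percentage of periodic health checks that succeeded."""
    if not checks:
        raise ValueError("no checks recorded")
    return 100.0 * sum(checks) / len(checks)


def meets_sla(checks: list[bool], target_pct: float = 99.9) -> bool:
    """True if sampled uptime reaches the (hypothetical) SLA target."""
    return uptime_pct(checks) >= target_pct


# One simulated day of one-minute checks with two failed probes.
day = [True] * 1438 + [False] * 2
print(f"{uptime_pct(day):.3f}% uptime, SLA met: {meets_sla(day)}")
```

With one-minute checks, a 99.9% target allows roughly 1.4 minutes of downtime per day, so two failed checks are enough to miss it.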

Network metrics are included in IT infrastructure monitoring. Network monitoring seeks out issues caused by slow or failing network components, or security breaches. Metrics include response time, uptime, status request failures and HTTP/HTTPS/SMTP checks.
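An HTTP check, one of the network metrics listed above, can be sketched with Python's standard library: request a URL, record the status code and time the round trip. The function name, the five-second timeout and the rule that any status below 400 counts as up are assumptions for this example; commercial tools layer retries, checks from multiple locations and alerting on top of this basic idea.

```python
import time
import urllib.error
import urllib.request


def http_check(url: str, timeout: float = 5.0) -> dict:
    """Agentless check: request a URL and record its status code and response time."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 503.
        status = err.code
    except OSError:
        # No usable answer: DNS failure, refused connection, timeout and so on.
        status = None
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {
        "url": url,
        "status": status,
        "up": status is not None and status < 400,
        "response_ms": elapsed_ms,
    }


if __name__ == "__main__":
    print(http_check("https://example.com/"))
```

The HTTPError branch must come first because HTTPError is a subclass of URLError, which in turn is a subclass of OSError.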

Application performance monitoring

Application performance monitoring (APM) gathers metrics on software application performance based on both end-user experience and computational resource consumption. Examples of APM-provided metrics include load and response times, such as average response time under peak load, as well as performance bottleneck data.
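APM metrics like these are typically aggregated from per-request records. The sketch below computes request rate, error rate, average response time and a 95th-percentile response time, a common way to expose slow outliers that an average hides. The record format, the choice to count only 5xx statuses as errors and the nearest-rank percentile method are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    duration_ms: float  # server-side response time
    status: int         # HTTP status code returned


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def apm_summary(requests: list[Request], window_s: float) -> dict:
    """Aggregate request records observed over a window of window_s seconds."""
    if not requests:
        raise ValueError("no requests in window")
    durations = [r.duration_ms for r in requests]
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "request_rate_per_s": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "avg_response_ms": mean(durations),
        "p95_response_ms": percentile(durations, 95),
    }


durations = [100.0, 120.0, 110.0, 130.0, 90.0, 105.0, 115.0, 125.0, 400.0, 95.0]
sample = [Request(d, 500 if d == 400.0 else 200) for d in durations]
print(apm_summary(sample, window_s=5.0))
```

In this sample, the single 400 ms request barely moves the average (139 ms) but sets the 95th percentile, which is why percentiles often help locate performance bottlenecks.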

Available IT monitoring tool options

Vendors differ in terms of systems monitored, monitoring type supported, agent support versus agentless capability and metrics presentation. Some APM vendors also offer IT infrastructure monitoring capabilities, and vice versa, while other tools are designed to watch a specific area, such as the network or CPU performance.

The following incomplete list shows examples of various monitoring tool types:

Microsoft System Center Operations Manager (SCOM) can monitor both infrastructure and application performance in real time. SCOM supports both agent-based and agentless management, integrates with Windows operating systems (OSes) and monitors server hardware, OS performance, hypervisors and applications.

Datadog is a service that monitors both applications and infrastructure in real time. Datadog uses agent-based monitoring. It automatically collects and analyzes logs, error rates and latency, and it alerts users to abnormalities through communication channels such as email, Slack or PagerDuty.

The open source software Nagios monitors both infrastructure and software. Nagios users can gather metrics on applications, networks and server resources with or without the use of agents.
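Nagios runs checks through small plugin programs that report status through their exit code (0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN) and a line of text, optionally followed by performance data after a pipe (|) character. The Python sketch below follows that convention for a disk-usage check; the function names, the 80% and 90% thresholds and the metric label are hypothetical, and it is not one of the official Nagios plugins.

```python
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_disk(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> tuple[int, str]:
    """Map a disk-usage percentage to a plugin exit code and status line."""
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    # Text before "|" is for humans; the part after it is performance data for graphs.
    message = (
        f"DISK {LABELS[code]} - {used_pct:.0f}% used | "
        f"disk_used={used_pct:.0f}%;{warn:.0f};{crit:.0f};0;100"
    )
    return code, message


def main(path: str = "/") -> int:
    """Check one filesystem, print the status line and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        print(f"DISK UNKNOWN - {err}")
        return UNKNOWN
    code, message = check_disk(100.0 * usage.used / usage.total)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main())
```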

This was last updated in May 2019
