Get started with threshold monitoring

IT monitoring doesn't have to be difficult to set up and use. Learn how to set thresholds and dashboards, know when and how to escalate responses, and keep IT systems humming along.

Adam Bertram

Published: 27 May 2020

Monitoring metrics empower an IT team to both optimize and secure services. Thankfully, monitoring platforms offer plenty of customization to fit your organization's needs.

With that in mind, there are some crucial considerations to weigh when you design threshold monitoring in your IT monitoring tool.

Active monitoring has its value, but a great platform makes your involvement much more passive. Thresholds are what make that automation possible. Once you've established your system's steady state -- or baseline operating conditions -- you can configure thresholds for core system performance indicators. These can include the following:

process activity
user activity
CPU load
memory consumption
file disk usage
inactivity
errors and error logging
login activity
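
As a rough illustration of turning a baseline into a threshold, the Python sketch below derives an upper limit from steady-state samples and checks readings against it. The function names, the sample CPU values and the three-standard-deviation margin are assumptions chosen for illustration, not settings from any particular monitoring product.

```python
from statistics import mean, stdev

def baseline_threshold(samples, sigmas=3.0):
    """Return an upper threshold: baseline mean plus `sigmas` standard deviations."""
    return mean(samples) + sigmas * stdev(samples)

def crosses_threshold(value, threshold):
    """True when a reading is above the threshold."""
    return value > threshold

# Hypothetical CPU load percentages collected while the system was in steady state.
cpu_baseline = [40, 42, 38, 41, 39, 40, 43, 37]
cpu_threshold = baseline_threshold(cpu_baseline)

print(f"CPU threshold: {cpu_threshold:.1f}%")
print(crosses_threshold(95, cpu_threshold))  # a spike well above the baseline
print(crosses_threshold(41, cpu_threshold))  # a reading within the normal range
```

The same pattern could apply to memory consumption or disk usage; only the baseline samples and the margin would change.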

As we can see, threshold monitoring isn't only about performance. It's also about access. You want to know exactly how well your services perform and who's using them. The rule sets you'll write will relate to the above categories. Thankfully, defining your threshold sets is straightforward:

Define whether you want your rules to be driven by thresholds (resources) or events (errors and logins). Your monitoring tool will trigger notifications if these thresholds are crossed.
Group these written rules into rule sets, based on categorizations and device-specific deployments.
Assign sets to various devices as needed.
Update your agents to ensure these rule sets are active.
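
The first three steps above can be sketched as a small data model. This is a minimal Python sketch under stated assumptions: the `Rule` and `RuleSet` classes, the metric names and the device names are all hypothetical, and the final step (updating agents) is specific to each monitoring tool, so it isn't modeled here.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rule:
    name: str
    kind: str      # "threshold" (resources) or "event" (errors and logins)
    metric: str    # hypothetical metric name reported by an agent
    limit: float   # notify when the reading rises above this value

@dataclass
class RuleSet:
    name: str
    rules: list = field(default_factory=list)

def evaluate(rule_set, readings):
    """Return the names of rules whose metric reading exceeds its limit."""
    return [r.name for r in rule_set.rules
            if readings.get(r.metric, 0) > r.limit]

# Steps 1 and 2: define the rules and group them into a rule set.
web_rules = RuleSet("web-servers", [
    Rule("high-cpu", "threshold", "cpu_percent", 85),
    Rule("disk-nearly-full", "threshold", "disk_used_percent", 90),
    Rule("failed-logins", "event", "failed_logins_per_min", 5),
])

# Step 3: assign the rule set to devices.
assignments = {"web-01": web_rules, "web-02": web_rules}

# Hypothetical readings from one device.
readings = {"cpu_percent": 92, "disk_used_percent": 70, "failed_logins_per_min": 9}
triggered = evaluate(assignments["web-01"], readings)
print(triggered)
```

Keeping rules separate from device assignments means one rule set can be reused across many similar devices.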

Once you establish these notifications, how are they delivered? The two most popular ways are via email and Simple Network Management Protocol (SNMP) traps. The latter method, which you configure with a network management tool, has impacted devices send unsolicited emergency messages to the management system so that your system -- and you -- are in the know.
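
For the email route, a notification can be assembled with Python's standard library. The sketch below only builds the message; the addresses, device name and metric values are placeholders, and sending it would require an SMTP server of your own. SNMP traps need a third-party library or a network management tool, so they aren't shown.

```python
from email.message import EmailMessage

def build_alert_email(device, metric, value, limit, recipient):
    """Build (but do not send) an email describing a crossed threshold."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {device}: {metric} at {value} (limit {limit})"
    msg["From"] = "monitoring@example.com"   # placeholder sender address
    msg["To"] = recipient
    msg.set_content(
        f"Device {device} reported {metric} = {value}, "
        f"which is above the configured limit of {limit}."
    )
    return msg

alert = build_alert_email("web-01", "cpu_percent", 92, 85, "oncall@example.com")
print(alert["Subject"])

# To deliver it, something like the following would be needed
# (the host name is an assumption):
#   import smtplib
#   with smtplib.SMTP("smtp.example.com") as server:
#       server.send_message(alert)
```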

The power of alerts and dashboards

Monitoring thresholds are useful for identifying abnormal spikes or dips in activity, while alerts tell us when our systems can't automatically recover. Two things are key: human intervention and relevance. For example, configure your alerts so that they reach actual members of your team, and make sure those notified are able to tackle the problem at hand. Also, system issues must be severe enough to warrant these notifications; otherwise, you'll spam your DevOps team.
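
One common way to avoid spamming the team is to alert only when a threshold stays crossed, which suggests the system has not recovered on its own. The Python sketch below is one possible approach; the function name and the choice of three consecutive readings are assumptions for illustration.

```python
def should_alert(readings, limit, sustained=3):
    """Alert only if the most recent `sustained` readings all exceed the limit."""
    if len(readings) < sustained:
        return False
    return all(r > limit for r in readings[-sustained:])

# A brief spike that recovered on its own: no alert.
print(should_alert([70, 95, 72, 71], limit=85))

# A breach that persisted across three readings: alert a human.
print(should_alert([70, 91, 93, 96], limit=85))
```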

Comparatively, thresholds grant us greater overall status monitoring -- that's their intention. While thresholds tell us something problematic might happen, alerts confirm this beyond a shadow of a doubt. That said, severity does vary. We often assign priority levels to alerts. External alerts for user-facing failures typically take precedence over internal hiccups. When a system outage grows in scale, you might need more hands on deck for remediation. Widespread problems like those, which require formidable levels of intervention, will carry higher priority than their smaller counterparts.
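
The priority logic described above can be sketched in a few lines of Python. The `Alert` fields, the four priority levels and the cutoff of 10 affected hosts for a "widespread" problem are all hypothetical; a real team would tune them to its own environment.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    name: str
    user_facing: bool      # external, user-facing failure vs. internal hiccup
    affected_hosts: int    # rough measure of how widespread the problem is

WIDESPREAD = 10  # assumed cutoff, chosen only for this example

def priority(alert):
    """Return a priority level; 1 is the most urgent."""
    widespread = alert.affected_hosts >= WIDESPREAD
    if alert.user_facing and widespread:
        return 1
    if alert.user_facing:
        return 2
    if widespread:
        return 3
    return 4

alerts = [
    Alert("nightly-batch-slow", False, 1),
    Alert("checkout-errors", True, 25),
    Alert("internal-dns-flap", False, 40),
    Alert("login-page-latency", True, 2),
]

# Handle the most urgent alerts first.
ordered = sorted(alerts, key=priority)
for a in ordered:
    print(f"P{priority(a)}: {a.name}")
```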

Dashboards are the lifeblood of your environment. They're the simplest way to visualize performance, and they keep tabs on your prioritized metrics, such as traffic, resource consumption and usage patterns.

You might want to break down your monitoring at the application level. On the other hand, chances are high that you've deployed your microservices atop Kubernetes. You can configure different dashboards to examine both levels of your environment -- observing both the forest and the trees within it.

Start small with application monitoring

Application monitoring is the most granular approach, since you can rapidly apply features, bug fixes and security updates. This is where CI/CD comes into play. The idea is that your applications are always changing, and these dashboards can help steer that change for the better.

Popular tools such as Jaeger and OpenCensus provide numerous functionalities to further those goals. These types of monitoring platforms enable the following:

tracing and response time analysis
integration with service meshes and cloud networking resources
host data inspection
exporting
interfacing with other third-party platforms

This is what we'd consider the micro level, where you can really dig in and unearth the smallest of optimization gremlins.

Oversee an entire DevOps environment

When you're managing multiple containers or networking systems simultaneously, a rich dashboard experience is vital. It's not easy to oversee all aspects of your ecosystem effectively, especially at scale, to keep services running smoothly. With platform oversight, there's a lot to process, and it can be difficult to manually organize all that data cohesively.

Before you can analyze any data, you must ingest it. The Prometheus product scrapes pertinent information from your nodes and containerized applications, pulls these metrics into the dashboard and displays them in legible blocks. Prometheus is especially useful at the node level to diagnose failures.
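
To give a feel for what a scrape returns, the Python sketch below parses a small sample in the plain-text format that Prometheus targets expose. The sample values are made up, and the parser is deliberately simplified: it assumes each sample line ends with its value and carries no trailing timestamp. It is not a replacement for Prometheus itself or its client libraries.

```python
# Hypothetical scrape output; the values are invented for illustration.
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
node_cpu_seconds_total{cpu="0",mode="user"} 789.1
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 2147483648
"""

def parse_metrics(text):
    """Map each series (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes the value is the last whitespace-separated token.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples

metrics = parse_metrics(SAMPLE)
for series, value in metrics.items():
    print(series, value)
```

Once metrics are in a structure like this, they can be compared against thresholds or handed to a dashboard.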

Microservices run on distributed systems. Resources are provisioned, often across multiple servers, and clients (users) then contact those servers. Every request uses some degree of system resources, so it's useful to pick a tool that scales readily with activity. Grafana, for example, offers a panel-driven observation environment. DevOps teams can share these panels and capture systemic performance at certain moments in time (snapshots). This is helpful to track response times, volumes and network traffic. Here are some other dashboard visualizations you should employ:

line charts
heat maps
gauges
flame graphs

Consider open source tools, which play nicely with other tools and are add-on-friendly. That could be crucial as your services grow and priorities shift.

Don't be fooled into thinking this is a one-size-fits-all approach. These tools are highly customizable, yet they take much of the guesswork out of configuration. They're also scalable, so you won't have to constantly seek new tools as your system evolves.

Remember, monitoring doesn't have to be difficult.
