It's no small task to transform an IT organization to integrate development, operations and quality assurance teams. A DevOps methodology requires team and process changes and then, once everything is in place, the onus is on IT to create DevOps metrics and measure the outcomes.
The key to a productive DevOps program is effective and comprehensive measurement and monitoring. Create a detailed game plan to understand how processes and projects work and how to improve them over time.
DevOps metrics are complicated by the fact that this methodology has no formal framework, so the definition of progress can vary between organizations. However, there are common metrics and key performance indicators that should keep a DevOps implementation on track.
Deployment frequency
Innovation begets rapid technology changes. Track code deployment frequency for a clear picture of how quickly new features and capabilities roll out. This metric should remain stable or trend upward over time. A decrease or stutter could indicate a bottleneck somewhere within the DevOps team structure.
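As a rough illustration, deployment frequency can be tallied per week from a list of deployment dates. The sketch below assumes those dates can be exported from a CI/CD tool; the function name and sample data are hypothetical.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count deployments per ISO week, keyed by (ISO year, ISO week)."""
    counts = Counter()
    for d in deploy_dates:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return dict(sorted(counts.items()))


# Hypothetical deployment dates; real ones would come from pipeline logs.
deploys = [
    date(2024, 1, 2),
    date(2024, 1, 4),
    date(2024, 1, 9),
    date(2024, 1, 10),
    date(2024, 1, 11),
]
weekly = deployments_per_week(deploys)
print(weekly)
```

Weeks with no deployments do not appear in the result, so a gap in the keys is itself a sign of the stutter described above.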
Change lead time
Determine how long it takes a change such as a bug fix or new feature to go from inception to production. Long lead times could suggest inefficient processes that inhibit change implementations. This DevOps metric should be a starting point to discover patterns that signal complexity that is causing a bottleneck in a given area.
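One way to make lead time concrete is to record when each change started and when it reached production, then summarize the differences. This is a minimal sketch: treating the first timestamp as inception is an assumption, and teams may instead start the clock at ticket creation or first commit. The median is used here so that a single slow change does not dominate the figure.

```python
from datetime import datetime
from statistics import median


def lead_times_hours(changes):
    """Return the lead time in hours for each (started, deployed) pair."""
    return [
        (deployed - started).total_seconds() / 3600
        for started, deployed in changes
    ]


# Hypothetical changes: (inception time, production deployment time).
changes = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 17, 0)),
    (datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 4, 9, 0)),
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 6, 10, 0)),
]
median_hours = median(lead_times_hours(changes))
print(median_hours)
```

Grouping the same calculation by team, service or change type is a simple way to look for the patterns that point to a bottleneck in one area.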
Change success or failure rate
Changes must roll out smoothly, not just frequently. Keep the failure rate for changes deployed into production applications as low as possible. Develop a system to track the success and failure of changes. A high rate of change failure affects the application's end users, and requires that admins invest additional time to troubleshoot issues and fix bugs rather than accomplish high-value initiatives.
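A tracking system for change outcomes can start very simply. The sketch below assumes each deployed change is recorded as a boolean that is true when the change failed in production; what counts as a failure (rollback, hotfix, incident) is a definition each team has to set for itself.

```python
def change_failure_rate(outcomes):
    """Return the fraction of changes that failed.

    outcomes: iterable of booleans, True when a change failed in production.
    Returns 0.0 when no changes have been recorded.
    """
    outcomes = list(outcomes)
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# Hypothetical record of four production changes, one of which failed.
rate = change_failure_rate([False, False, True, False])
print(f"{rate:.0%}")
```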
Mean time to detection
Don't measure DevOps metrics in isolation. A low change failure rate isn't good enough if it takes too long to detect a problem. For example, if the MTTD is 30 days, it could take a full month to detect an issue that causes failure rates to rise. MTTD should decrease over time as DevOps processes mature. If MTTD is high or trends upward, expect the bottlenecks causing these existing delays to introduce additional congestion into the DevOps workflow down the road. Fast detection is a boon to security as well, as it should minimize an attack's reach.
Mean time to recovery
MTTR is another DevOps metric that admins should keep as low as possible. Eliminate issues once you become aware of them. DevOps organizations follow the principle that frequent, incremental changes are easier to deploy, and easier to fix when something goes wrong. If a release includes a high degree of change and complexity, it becomes more difficult to identify and resolve issues within it.
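MTTD and MTTR can both be derived from the same incident records. The following sketch assumes each incident carries three timestamps, and it measures recovery from the moment of detection; some teams measure it from the start of the fault instead. The `Incident` class and sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime    # when the fault began
    detected: datetime   # when monitoring or a person noticed it
    recovered: datetime  # when service was restored


def mean_delta(deltas):
    """Average a non-empty collection of timedelta values."""
    deltas = list(deltas)
    if not deltas:
        raise ValueError("at least one incident is required")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean time from fault start to detection."""
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean time from detection to recovery (an assumed definition)."""
    return mean_delta(i.recovered - i.detected for i in incidents)


incidents = [
    Incident(datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 30),
             datetime(2024, 5, 1, 12, 30)),
    Incident(datetime(2024, 5, 10, 8, 0), datetime(2024, 5, 10, 9, 30),
             datetime(2024, 5, 10, 10, 30)),
]
print(mttd(incidents), mttr(incidents))
```

Computing both figures over a rolling window, such as the last 90 days, makes it easier to see whether they are trending down as processes mature.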
Feature usage
How often do end users take advantage of the product's or service's features? If your DevOps team spends the time and effort to build new code and deploy it to production, make sure that those features are valuable to the target population. BizDevOps reinforces the value of iteration to meet user demand as quickly as possible. If feature usage is low or declining, reevaluate the prioritization factors with management to determine a better feedback loop from users.
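Feature usage can be approximated as the share of active users who touched each feature in a given period. The sketch below assumes usage events are available as (user ID, feature name) pairs, for example from application analytics; the names and data are hypothetical.

```python
def feature_adoption(events, active_users):
    """Return the share of active users who used each feature at least once.

    events: iterable of (user_id, feature) pairs.
    active_users: total number of active users in the same period.
    """
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    users_by_feature = {}
    for user_id, feature in events:
        users_by_feature.setdefault(feature, set()).add(user_id)
    return {
        feature: len(users) / active_users
        for feature, users in users_by_feature.items()
    }


# Hypothetical usage events; repeat use by one user counts once.
events = [
    ("u1", "export"), ("u2", "export"), ("u1", "export"),
    ("u3", "search"), ("u1", "search"), ("u2", "search"), ("u4", "search"),
]
adoption = feature_adoption(events, active_users=4)
print(adoption)
```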
Rate of security test passes
Vulnerability scanning is not a common test case within a release pipeline. Determine the test's viability and implement it whenever possible. A failed security test during the build phase could prevent some releases from going into production. If code releases consistently fail vulnerability tests, this DevOps metric suggests that teams are ignoring secure practices upstream and must ultimately correct those issues.
Application uptime
Application uptime is an important metric for every IT organization. Service-level agreements require that the infrastructure, services and supporting applications meet a high goal of availability. Services should be online as much as possible, which creates uptime goals as high as 99.999%.
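To see what an uptime goal means in practice, it helps to convert the percentage into a downtime budget. This short sketch assumes a 365-day year and a hypothetical function name; it is arithmetic, not a monitoring tool.

```python
def allowed_downtime_minutes(availability_pct, period_days=365):
    """Return the downtime budget in minutes for an availability target."""
    minutes_in_period = period_days * 24 * 60
    return (1 - availability_pct / 100) * minutes_in_period


# A 99.999% goal leaves roughly 5.26 minutes of downtime per year.
five_nines = allowed_downtime_minutes(99.999)
# A 99.9% goal leaves roughly 525.6 minutes, or under nine hours, per year.
three_nines = allowed_downtime_minutes(99.9)
print(round(five_nines, 2), round(three_nines, 1))
```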
Application performance
An unexpected influx of end users can create performance issues at the infrastructure layer -- turns out you can get too much of a good thing. Storage bottlenecks, CPU spikes, high memory consumption and network latency are all common side effects of a surge in application use. Closely monitor these standard performance aspects of the servers that support an application. Increasing volumes of end users can require additional infrastructure to be built in. However, performance decreases without additional end-user requests could indicate that bugs or inefficient changes from development and release are bogging down the app. Verify and correct these as they occur to enable high availability and a good end-user experience.
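The distinction between load-driven slowdowns and regressions can be expressed as a simple comparison of two measurement windows. The heuristic below is only a sketch: the choice of requests per minute and 95th-percentile latency as inputs, the 10% tolerance and the labels are all assumptions, and a real setup would rely on a monitoring platform's own alerting.

```python
def classify_slowdown(before_rpm, after_rpm, before_p95_ms, after_p95_ms,
                      tolerance=0.10):
    """Label a latency change as ok, load-driven or a possible regression.

    A rise only counts when it exceeds the tolerance (10% by default).
    """
    latency_up = after_p95_ms > before_p95_ms * (1 + tolerance)
    load_up = after_rpm > before_rpm * (1 + tolerance)
    if not latency_up:
        return "ok"
    # Latency rose: blame load if traffic also rose, otherwise suspect code.
    return "load-driven" if load_up else "possible regression"


# Hypothetical windows: traffic is flat but latency rose from 200 to 300 ms.
verdict = classify_slowdown(1000, 1000, 200, 300)
print(verdict)
```

A "possible regression" result is a prompt to review recent releases, not proof of a bug; infrastructure faults can produce the same pattern.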