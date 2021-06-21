The work around an IT platform can be separated into two types: work that adds value to the business and work that keeps the platform running. The aim of an IT operations team should be to maximize the amount of the former while minimizing the time and cost spent on the latter.

Work that keeps the IT platform running goes by many names, "keeping the lights on" among them, but one term, toil, is gaining acceptance. An effective way to reduce toil is to adopt site reliability engineering (SRE).

Toil covers tasks such as patching, applying updates, firefighting incidents and replacing broken parts. For most IT workers, toil is mind-numbing work with little technical appeal, and it is unlikely ever to earn plaudits from IT management, let alone business management.

Google, where SRE originated, advises setting up a dedicated SRE team to minimize toil. For many organizations, this might not be attractive or possible. Instead, they can embed the core concepts of toil minimization into their existing development and operations teams by adopting SRE practices there.

The aim should not be to eliminate toil completely; there will always be tasks that cannot be magicked away. Certain work, such as making better use of systems management tools or adopting a more efficient approach to platform monitoring, adds no direct value to the business but is not toil either. It lays the foundation for better toil management and brings discernible value to the business over time, as improved engineering makes the platform more reliable.

Frighteningly, as recently as a decade ago, toil could account for up to 80% of an IT operations group's budget. More reliable equipment, improved systems management tools and greater use of automation have since reduced that share, but the case for cutting toil further remains strong.

Even if an IT group has reached a 50/50 split between toil and value-added work, moving to 40/60 raises the budget allocated to value-add by 20% (from 50% to 60% of the total) without any change in overall IT funding. At an 80/20 split, a shift of 10 percentage points, to 70/30, raises value-add spend from 20% to 30% of the budget: an increase of one half.
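The arithmetic behind these figures can be checked with a few lines of code. This is a minimal sketch, assuming the total IT budget stays fixed and splits into only two buckets, toil and value-add; the function name `value_add_increase` is hypothetical and chosen for illustration.

```python
def value_add_increase(old_toil_pct: float, new_toil_pct: float) -> float:
    """Relative change in value-add spend when the toil share of a fixed budget changes.

    Assumes (hypothetically) that the budget has only two buckets, so
    value-add share = 100 - toil share, and that the total budget is constant.
    """
    for pct in (old_toil_pct, new_toil_pct):
        if not 0 <= pct <= 100:
            raise ValueError("toil shares must be percentages between 0 and 100")
    if old_toil_pct == 100:
        raise ValueError("old toil share of 100% leaves no value-add spend to compare against")
    old_value_add = 100 - old_toil_pct
    new_value_add = 100 - new_toil_pct
    # Relative increase of the value-add bucket, e.g. 0.2 means 20% more value-add spend.
    return (new_value_add - old_value_add) / old_value_add


# The two scenarios from the text: 50/50 -> 40/60 and 80/20 -> 70/30.
for old, new in [(50, 40), (80, 70)]:
    pct = value_add_increase(old, new) * 100
    print(f"{old}/{100 - old} -> {new}/{100 - new}: value-add spend up {pct:.0f}%")
```

Running it prints a 20% increase for the first scenario and a 50% increase for the second. The lower the starting value-add share, the larger the relative gain from the same 10-point shift, which is why cutting toil pays off most for the groups carrying the most of it.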