5 prerequisites for applying AI to an ops environment

For organizations that want to apply AI to their ops environments, IT teams must start with an understanding of what AI can do -- and what they want from it.

Clive Longbottom

Published: 22 Feb 2021

Artificial intelligence is touted as the latest solution to all our problems. While some measure of follow-through on this promise is possible, a strong degree of actual intelligence is still needed to ensure that an AI tool delivers viable insight.

When it comes to applying AI to the operational environment, follow these five steps to ensure meaningful results.

Know what you want

AI is not magic: It can do some clever things, but it needs guidance. IT teams must understand what they want from AI so they aren't overwhelmed by its data output. For example, is the goal to improve performance or to raise availability? Do they want to ensure that apps and services are kept at current patch levels, or are they more interested in tracking memory leaks and rogue network issues? AI can help achieve all this -- but not all at once. IT organizations must prioritize their needs and wants, then work through them one at a time until their environment carries them out automatically.

Create a baseline

To make an AI tool work in an operational environment, IT teams need to know what is already there. Use discovery tools to create databases of which applications and services run on which physical and virtual environments. Provide granular data around the versions and patch levels of OSes, applications and services. Also, create data on areas such as device driver versions and patch levels. This data provides the context needed to assess what actions the AI is advising and what those changes might mean to the environment.
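As a rough illustration of how such a baseline might be used, the sketch below indexes discovered components and describes an advised change against what is recorded. The `Component` fields, host names and version strings are hypothetical; a real discovery tool will have its own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    host: str          # physical or virtual environment (hypothetical name)
    name: str          # OS, application, service or device driver
    version: str
    patch_level: str


def build_baseline(components):
    """Index discovered components by (host, name) for quick lookup."""
    return {(c.host, c.name): c for c in components}


def describe_change(baseline, host, name, new_version, new_patch_level):
    """Explain an advised change in terms of the recorded baseline."""
    current = baseline.get((host, name))
    if current is None:
        return f"{name} on {host}: not in baseline, investigate before acting"
    return (f"{name} on {host}: {current.version} ({current.patch_level}) "
            f"-> {new_version} ({new_patch_level})")


# Made-up inventory data, standing in for discovery tool output.
baseline = build_baseline([
    Component("vm-app-01", "nginx", "1.18.0", "p2"),
    Component("vm-app-01", "nic-driver", "5.4", "p0"),
])
print(describe_change(baseline, "vm-app-01", "nginx", "1.18.0", "p3"))
```

An advised change that touches something missing from the baseline is flagged, which is a sign that discovery needs to be rerun before anyone acts on the advice.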

Allow the tool to run and learn over time

AI cannot provide direct value immediately. It must learn its environment and understand what is happening across the platform before it can apply any intelligence. A platform will likely have multiple cyclical workloads -- daily login floods; daily, weekly and other regular report runs; various regular maintenance jobs; and end-of-month and quarterly financial reconciliations. Allow the AI tool to learn across at least two full cycles of major workloads when possible. However, do not take this too far: Running the tool in this "learning" mode for two years just to cover two financial year-end closes is impractical, as the data volume is too large.

Use human intelligence to determine how long the AI learns: For most organizations, a couple of months should cover most workloads. However, where quarterly workloads might affect the platform, consider throttling back the AI's actions during the initial runs of these longer-cycle workloads. This ensures that the AI does not see these workloads as problems that require fixes, such as workload resource throttling.
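The two-cycle rule can be sketched as a small planning helper. This is only an illustration: The 60-day cap stands in for "a couple of months," and the workload names and cycle lengths are assumptions, not measurements.

```python
from datetime import timedelta


def plan_learning(cycles, max_learning=timedelta(days=60)):
    """Return (learning_period, workloads_needing_caution).

    cycles maps a workload name to its cycle length. A workload counts
    as covered if two full cycles fit inside max_learning; anything
    longer is returned so the AI's actions can be throttled back the
    first few times that workload runs.
    """
    covered = {n: c for n, c in cycles.items() if 2 * c <= max_learning}
    cautious = sorted(n for n in cycles if n not in covered)
    longest = max(covered.values(), default=timedelta(0))
    return 2 * longest, cautious


# Assumed cycle lengths for illustration.
cycles = {
    "daily logins": timedelta(days=1),
    "weekly reports": timedelta(days=7),
    "month-end reconciliation": timedelta(days=30),
    "quarterly reconciliation": timedelta(days=91),
}
period, cautious = plan_learning(cycles)
print(period.days, cautious)
```

With these assumed numbers, the learning period comes out at 60 days, and the quarterly reconciliation is listed as a workload to treat with caution rather than one to wait for.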

Don't fully automate straightaway

AI's siren song is that it removes human input from the workflows, and thus speeds up processes and removes errors. Simple automation can also do this; however, AI applies more complex actions. To start, ensure that any AI tool runs in an advisory mode, wherein it reports what actions it thinks would work best -- but does not take any of those actions. Check -- and double-check -- those suggestions and initiate any changes manually. Install break points along the process so that any adverse effects can be stopped as soon as possible. This is crucial -- at least in the early stages -- as the tool will not yet know what changes are normal, and might confuse itself as it tries to make changes to its own environment.
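One way to picture advisory mode is as a gate between the tool's recommendations and the platform. The sketch below is a hypothetical wrapper, not the interface of any particular AIOps product: Every recommendation is logged, and nothing runs unless a named person has approved it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Recommendation:
    description: str
    apply: Callable[[], None]   # the action the tool would like to take


@dataclass
class AdvisoryGate:
    advisory_only: bool = True
    log: List[str] = field(default_factory=list)

    def handle(self, rec: Recommendation, approved_by: Optional[str] = None) -> bool:
        """Record the recommendation; run it only if a human approved it."""
        self.log.append(f"ADVISED: {rec.description}")
        if self.advisory_only and approved_by is None:
            return False            # reported, not acted on
        rec.apply()
        approver = approved_by or "automation"
        self.log.append(f"APPLIED: {rec.description} (approved by {approver})")
        return True


applied = []
gate = AdvisoryGate()
rec = Recommendation("restart service 'reports'", lambda: applied.append("restart"))
gate.handle(rec)                            # advisory only: nothing happens
gate.handle(rec, approved_by="ops-admin")   # applied after manual review
print(gate.log)
```

Setting `advisory_only` to `False` later is the switch to full automation; keeping the log either way gives admins a record to check the tool's suggestions against.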

Create fallback positions at known points

Even when AI is ready to run as a fully automated tool, things might still go wrong. AI tools should maintain rollback positions in the event of a failure. However, do not rely on just these recovery points. Create full backups with snapshots so admins can revert to a known position manually.
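The idea of a known position can be sketched with a snapshot taken before each change. In this minimal example, an in-memory dictionary stands in for platform configuration, which is an assumption for illustration; real snapshots would be taken at the storage, VM or configuration management level.

```python
import copy
from contextlib import contextmanager


@contextmanager
def known_position(state):
    """Snapshot state on entry and restore it if the change fails."""
    snapshot = copy.deepcopy(state)
    try:
        yield snapshot              # also available for a manual revert
    except Exception:
        state.clear()
        state.update(snapshot)      # roll back to the known position
        raise


# Hypothetical configuration values.
config = {"workers": 4, "cache_mb": 512}
try:
    with known_position(config):
        config["workers"] = 64
        raise RuntimeError("health check failed after change")
except RuntimeError:
    pass
print(config)
```

The automatic restore plays the role of the tool's own rollback point, while the yielded snapshot is the separate copy an admin could use to revert by hand.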

What next?

Applying AI tools in an operations environment can minimize human intervention and fully automate many actions to optimize said environment. However, AI can also completely trash a platform if it lacks the knowledge and capacity to understand the interdependencies between the platform's various aspects, because such a tool cannot deal with the outcomes of its own actions.

Beware the AI marketing hype. AI is still an emerging capability: In human terms, many AI tools are as smart as a toddler learning to talk, and the majority are similar to a teenager mumbling "whatever" every now and again. The tools will continue to evolve -- rapidly. For now, apply real human intelligence and use AI tools more as augmented intelligence than artificial intelligence.
