CIOs often lament the number of people and the portion of their budget devoted to keeping the lights on. These are the tasks of IT operations -- necessary, but not visible.
CIOs would prefer to take charge of innovative projects that bring high value to their organization, but they're judged by the unglamorous and unexciting uptime and performance stats of underlying computer systems, especially systems tied to revenue generation. Keeping the lights on is quite important to people who don't want to stare at a blank computer screen. But there's more than one way to do it.
The emergence of AIOps
AIOps tools use AI to monitor and manage environments under the direction of the operations team. AIOps upends cloud and IT operations by changing the entire process to make it more proactive, predictive, prescriptive and personalized.
Proactive. Humans can monitor systems and anticipate problems, but there simply aren't enough skilled people available to cover an enterprise's entire environment all the time. And DevOps or IT admins likely won't be experts in all the apps and systems necessary to determine the root cause of a given problem. A human-driven approach is manual and reactive.
Cloud and IT are fertile ground for AI and machine learning algorithms. Every user, physical or virtual device, and application in the IT environment generates data in the form of logs, events, metrics and alerts. AIOps tools collect this data, which reflects the health status of systems: an application's performance slows; a database runs smoothly; a printer overheats; all WAN connections are up; a user is locked out; and countless other minute details generated 24 hours a day, every day of the year. AI and machine learning systems learn the IT environment from this data and then use what they have learned, over time, to drive AIOps activity proactively with little to no human intervention.
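To make the idea of learning what "normal" looks like more concrete, here is a minimal sketch in Python. It assumes a single metric (application response time) and a simple rolling baseline; the function name `find_anomalies`, the window size, the threshold and the sample data are all hypothetical, and commercial AIOps tools use far more sophisticated models.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate sharply from the preceding window.

    Hypothetical helper: the baseline is the mean and standard deviation
    of the previous `window` samples.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Illustrative response times in milliseconds (made-up data).
latencies_ms = [120, 118, 122, 119, 121, 120, 123, 410, 122, 119]
print(find_anomalies(latencies_ms))  # flags the index of the 410 ms spike
```

The baseline here is recomputed from recent history rather than set by hand, which is the small-scale version of a tool learning its environment instead of relying on static thresholds.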
AI and machine learning can augment human effort on mundane tasks, which frees up admins to do more significant, high-value work that requires their intelligence.
Predictive. Predictive AIOps tools detect a potential oncoming failure and suggest a corrective course to fix it and avoid downtime -- for example, reboot a server or patch an application. By contrast, conventional monitoring systems catch a failure only after it occurs, then alert IT ops and support the subsequent diagnosis and resolution.
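As a rough illustration of prediction versus after-the-fact alerting, the sketch below fits a straight line to recent disk usage readings and estimates how long until a threshold is crossed. It is an assumption-laden example, not how any particular product works: the function `hours_until_threshold`, the 90% threshold and the readings are hypothetical, and a linear trend is only one of many possible forecasting methods.

```python
def hours_until_threshold(readings, threshold):
    """Estimate hours from the last reading until `threshold` is reached.

    `readings` is a list of (hour, percent_used) pairs. Returns None when
    usage is not trending upward. Uses an ordinary least-squares line.
    """
    n = len(readings)
    xs = [hour for hour, _ in readings]
    ys = [used for _, used in readings]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        return None  # all readings share one timestamp; no trend to fit
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom
    if slope <= 0:
        return None  # flat or shrinking usage; nothing to predict
    intercept = y_bar - slope * x_bar
    crossing_hour = (threshold - intercept) / slope
    return max(0.0, crossing_hour - xs[-1])


# Made-up readings: disk usage grows two points per hour.
disk_usage = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
remaining = hours_until_threshold(disk_usage, threshold=90.0)
print(f"Estimated hours until 90% full: {remaining:.1f}")
```

A threshold-only monitor would stay silent until the disk actually reached 90%. The trend estimate gives the operations team lead time to act before users notice anything.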
Prescriptive. AI-driven cloud and IT operations solve or avoid a problem through root cause diagnosis and resolution suggestions. For example, an AIOps tool could send an event alert about an unstable wireless router to a systems administrator's or a network engineer's dashboard with data on the potential problem and particular recommended actions. If the problem is left unresolved, users will lose network connectivity. The AIOps tool predicts this outage and recommends a restart of the wireless router. The admin verifies the situation and restarts the router. With the aid of the AIOps tool, users experience minutes of downtime, instead of days or longer under the old, reactive process.
An AIOps tool's prescriptive suggestions for issue resolutions can, of course, end up off-base. Stories of AI gone horribly wrong abound, after all. Humans help train the system through feedback on the prescribed fix's accuracy and efficiency. This important feedback loop between admins and software that can learn from them helps improve the system's accuracy for the future. An admin who disapproves of a prescribed action can tell the tool what they did instead to resolve the problem. The more information an admin provides about the root cause of a given problem, the more accurately the AIOps tool works. The next time this issue arises, the system is better prepared to offer a helpful suggestion.
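The feedback loop described above can be sketched in a few lines of Python. This is a deliberately simplified model under stated assumptions: the class `SuggestionFeedback`, the issue and action labels, and the smoothed success-rate scoring are all hypothetical stand-ins for whatever learning mechanism a real AIOps tool uses.

```python
from collections import defaultdict


class SuggestionFeedback:
    """Track which fixes admins report as working for each issue type."""

    def __init__(self):
        # issue -> action -> [times_worked, times_tried]
        self._stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, issue, action, worked):
        """Store one piece of admin feedback about an attempted action."""
        entry = self._stats[issue][action]
        entry[1] += 1
        if worked:
            entry[0] += 1

    def best_action(self, issue):
        """Return the action with the highest smoothed success rate, or None."""
        actions = self._stats.get(issue)
        if not actions:
            return None
        # Add-one smoothing keeps a single lucky result from dominating.
        return max(
            actions,
            key=lambda a: (actions[a][0] + 1) / (actions[a][1] + 2),
        )


feedback = SuggestionFeedback()
# The tool prescribed a restart, but the admin reports it did not help...
feedback.record("router_unstable", "restart_router", worked=False)
# ...and tells the tool what actually resolved the problem.
feedback.record("router_unstable", "update_firmware", worked=True)
print(feedback.best_action("router_unstable"))
```

After this exchange, the next unstable-router alert would lead with the firmware update rather than the restart, which is the behavior the feedback loop is meant to produce.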
Personalized. Every company has a unique IT environment. One enterprise uses a major public cloud provider, such as AWS, Microsoft Azure or Google Cloud, and runs Cisco routers and Dell servers; another has Juniper network gear, IBM and Hewlett Packard Enterprise servers, and so on. An AIOps tool must learn the environment in which it operates, and it does this by absorbing the full environment's data: logs, events, metrics and alerts.
AIOps should drive down a key metric that every service desk lives by -- mean time to repair -- by reducing how long it takes to identify and fix problems, thereby increasing customer satisfaction and service uptime.
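Mean time to repair is simply total repair time divided by the number of incidents. The short Python sketch below shows the arithmetic, assuming each incident is recorded as a detection time and a resolution time; the function name and the three incidents are invented for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average the detected-to-resolved duration over all incidents."""
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incidents: (detected, resolved).
incidents = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 30)),     # 30 minutes
    (datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 15, 30)),   # 90 minutes
    (datetime(2024, 1, 10, 22, 0), datetime(2024, 1, 10, 22, 15)), # 15 minutes
]
print(mean_time_to_repair(incidents))  # 0:45:00
```

Because the clock starts at detection, anything that shortens diagnosis, such as a root cause suggestion delivered with the alert, lowers the figure directly.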
AIOps can supplant -- or at least complement -- IT staff members who spend too much time on mundane tasks, such as systems monitoring, alert response, problem diagnosis and course of action determination. If technology can do those things for humans, staff hours can be devoted to higher-value work, and lower-level IT operations jobs can be cut. This helps solve the problems of skilled IT worker shortages and high turnover in entry-level, less stimulating positions.