IT incident management is a telling measure of how well the IT team functions, but it generally goes without praise or recognition. Users often don't notice good work, but poor incident responses create outrage, angry email chains and a lot of questions for IT operations.
Most organizations have room for improvement in their IT incident management process. Focus on three stages: troubleshooting sessions, IT incident reports and the postmortem of substantial issues.
Many of these ideas are covered in the IT Infrastructure Library (ITIL), and each organization should adopt the parts that work for it.
What went wrong?
This is where it all starts -- a help desk ticket, phone call or email alert. Review all your methods of incoming communication in a timely manner. Regardless of how many times you tell people to call if it's urgent, they'll still use email instead. A ticketing system that automatically logs incoming email as an incident -- and that responds with a comment to call for urgent issues -- can improve communication.
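As a rough illustration of that email-to-ticket idea, the sketch below parses an incoming message with Python's standard `email` package, records it as an incident and drafts an automatic reply that tells the sender to call for urgent issues. The `Ticket` class, the `log_email_as_incident` function and the phone extension are hypothetical names chosen for this example, not part of any particular ticketing product, and the sketch assumes the message has a From header and a plain-text body.

```python
from dataclasses import dataclass
from email import message_from_string, policy
from email.message import EmailMessage
from itertools import count

URGENT_PHONE = "ext. 1234"  # hypothetical help desk number
_ticket_ids = count(1)      # simple in-memory ticket counter for the sketch


@dataclass
class Ticket:
    ticket_id: int
    requester: str
    subject: str
    body: str


def log_email_as_incident(raw_email: str) -> tuple[Ticket, EmailMessage]:
    """Turn a raw email into a ticket and draft an auto-reply."""
    msg = message_from_string(raw_email, policy=policy.default)

    # Prefer the plain-text part; fall back to an empty body if none exists.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content().strip() if part is not None else ""

    ticket = Ticket(
        ticket_id=next(_ticket_ids),
        requester=str(msg.get("From", "")),
        subject=str(msg.get("Subject", "(no subject)")),
        body=body,
    )

    # The reply confirms receipt and restates the "call if urgent" rule.
    reply = EmailMessage()
    reply["To"] = ticket.requester
    reply["Subject"] = f"[Ticket #{ticket.ticket_id}] {ticket.subject}"
    reply.set_content(
        f"Your request has been logged as ticket #{ticket.ticket_id}.\n"
        f"If this is urgent, please call the help desk at {URGENT_PHONE}."
    )
    return ticket, reply
```

A real deployment would hand the reply to a mail server and store the ticket in the service desk system; both steps are left out here because they depend on the tools in use.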
Set and communicate expectations as a fundamental part of the initial IT incident response.
Incident management should also take the requestor into account. Try to accommodate their communication preferences, whether they demand a rundown of what's going on or simply expect to hear when an issue is fixed. Diagnose an IT incident with as much research and remote management as possible, so long as it doesn't affect users' work.
Some admins are better at the art of troubleshooting root causes than others. Regardless, everyone should follow an IT incident management process to eliminate possible causes more efficiently.
Envision a game in which players guess a number between 1 and 100. With each guess, they're told to aim higher or lower. This information narrows the focus of the next guess. Troubleshooting works the same way: each test should rule out a large share of the remaining possible causes. For example, if a computer can't see network drives, check if it has any network connectivity at all. The answer can help you home in on an understanding of the issue and how to resolve it.
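The guessing game translates fairly directly into code. The sketch below is one way to apply it, assuming the diagnostic checks can be ordered from the lowest layer to the highest so that once one check fails, every check after it fails too. Under that assumption, a binary search finds the first failing check without running all of them. The function name and the idea of representing each check as a named callable are illustrative choices for this example.

```python
from typing import Callable, Optional, Sequence

# A check is a label plus a probe that returns True when that layer works.
Check = tuple[str, Callable[[], bool]]


def first_failing_check(checks: Sequence[Check]) -> Optional[str]:
    """Return the name of the first failing check, or None if all pass.

    Assumes checks are ordered so that everything before the first
    failure passes and everything from it onward fails.
    """
    lo, hi = 0, len(checks)
    while lo < hi:
        mid = (lo + hi) // 2
        _name, probe = checks[mid]
        if probe():
            lo = mid + 1  # this layer works, so the fault is higher up
        else:
            hi = mid      # the fault is at this layer or lower down
    return checks[lo][0] if lo < len(checks) else None
```

For the network drive example, the ordered checks might be link up, IP address assigned, gateway reachable, DNS resolving, file server reachable and share mounted. With six checks, the search needs at most three probes to locate the failing layer.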
What have we learned?
An IT staff empowered to learn and collaborate with each other inherently develops a better IT incident management process.
The more you know about IT, the more aware you are of what you don't know. A team that shares knowledge and helps each other out is better equipped to solve IT issues more quickly. Make a knowledge base part of your IT incident management toolkit: Wikis are common, and many teams rely on a collaborative chat tool, such as Slack or Microsoft Teams, but even basic email conversations can spread the word and make it easier to fix an issue next time.
Analyze any incidents that have a significant effect on the business. Postmortem analysis isn't a witch hunt looking to place blame; it's an investigation to see what failed and why, and to discuss what approaches may have worked better. Report findings to the business side, including the measurable effect of the IT incident, how it was remediated and how it should be avoided in the future.
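One way to keep those reports consistent is to capture the same fields for every significant incident. The sketch below is a minimal example, assuming the measurable effect is expressed as downtime and user-minutes lost; the `IncidentReport` class and its field names are hypothetical, and a given business may prefer other measures, such as revenue or orders affected.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentReport:
    title: str
    started: datetime
    resolved: datetime
    affected_users: int
    remediation: str  # how the incident was fixed
    prevention: str   # how to avoid it in the future

    @property
    def downtime_minutes(self) -> float:
        return (self.resolved - self.started).total_seconds() / 60

    @property
    def user_minutes_lost(self) -> float:
        # Rough impact figure: every affected user loses the full downtime.
        return self.downtime_minutes * self.affected_users

    def summary(self) -> str:
        return (
            f"Incident: {self.title}\n"
            f"Downtime: {self.downtime_minutes:.0f} minutes\n"
            f"Users affected: {self.affected_users}\n"
            f"User-minutes lost: {self.user_minutes_lost:.0f}\n"
            f"Remediation: {self.remediation}\n"
            f"Prevention: {self.prevention}"
        )
```

The user-minutes figure overstates the effect when some users could keep working, so treat it as a starting point for the conversation with the business side.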
Systems and staff are ever-changing, so IT incident analysis must be an ongoing process. Think of it as operational maintenance for the service desk.
How can we handle that better?
An ideal IT incident management process includes a way to analyze how you track issues and where to improve response processes. But a system with all the bells and whistles often isn't feasible because of costs and maintenance overhead.
Regular meetings are a good opportunity to discuss people's thoughts and experiences with incidents. Meet internally with the IT team, and see if users are willing to participate as well.
At the very least, occasional surveys can gauge the general impression your user base holds of IT responses. Surveys enable users to provide constructive criticism of the IT incident response process.
Feed issues back to other IT areas, and other departments, to improve IT operations overall. If the engineering or development teams aren't aware of bugs, technical issues and end-user experience problems, they can't be expected to do things differently next time.
Fixing the problem is a great start. IT shops attain higher maturity when they push change back into engineering processes as a result of incidents in production. This feedback loop is a core tenet of DevOps culture.