A modern IT monitoring strategy can bring together developers and IT ops pros for DevOps incident response, but tools can't substitute for a disciplined team approach to problems.
Dev and ops teams at Nasdaq Corporate Solutions LLC adopted a common language for troubleshooting with AppDynamics' App iQ platform. But effective DevOps incident response also demanded focus on the fundamentals of team building and a systematic process for following up on incidents to ensure they don't recur.
"We had some notion of incident management, but there was no real disciplined way for following up," said Heather Abbott, senior vice president of corporate solutions technology, who joined the New York-based subsidiary of Nasdaq Inc. in 2014. "AppDynamics has [affected] how teams work together to resolve incidents ... but we've had other housekeeping to do."
Shared tools in an IT monitoring strategy renew focus on incident resolution
Nasdaq Corporate Solutions manages SaaS offerings for customers as they shift from private to public operations. Its products include public relations, investor relations, and board and leadership software managed with a combination of Amazon Web Services and on-premises data center infrastructure, though the on-premises infrastructure will soon be phased out.
In the past, Nasdaq's dev and ops teams used separate monitoring tools, and teams dedicated to different parts of the infrastructure also had individualized dashboard views. The company's shift to cross-functional teams, focused on products and user experience as part of a DevOps transformation, required a unified view into system performance. Now, all stakeholders share the AppDynamics App iQ interface when they respond to an incident.
With a single source of information about infrastructure performance, there's less finger-pointing among team members during DevOps incident response, which speeds up problem resolution.
"You can't argue with the data, and people have a better ongoing understanding of the system," Abbott said. "So, you're not going in and hunting and pecking every time there's a complaint or we're trying to improve something."
DevOps incident response requires team vigilance
Since Abbott joined Nasdaq, incidents are down more than 35%. She cited the IT monitoring tool in part, but also pointed to changes the company made to the DevOps incident response process. The company moved from an ad hoc process of incident response divided among different departments to a companywide, systematic cycle of regular incident review meetings. Her team conducts weekly incident review meetings and tracks action items from previous incident reviews to prevent incidents from recurring. Higher levels of the organization have a monthly incident review call to review quality issues, and some of these incidents are further reviewed by Nasdaq's board of directors.
And there's still room to improve the DevOps incident response process, Abbott said.
"We always need to focus on blocking and tackling," she said. "We don't have the scale within my line of business of Amazon or Netflix, but as we move toward more complex microservices-based architectures, we'll be building things into the platform like Chaos Monkey."
Like many companies, Nasdaq plans to tie DevOps teams with business leaders, so the whole organization can work together to improve customer experiences. In the past, Nasdaq has generated application log reports with homegrown tools. But this year, it will roll out AppDynamics' Business iQ software, with its investor-relations SaaS products, to make that data more accessible to business leaders, Abbott said.
AppDynamics App iQ will also expand to monitor releases through test, development and production deployment phases. Abbott said Nasdaq has worked with AppDynamics to create intelligent release dashboards to provide better automation and performance trends. "That will make it easy to see how system performance is trending over time, as we change," she said.
While Nasdaq mainly uses AppDynamics App iQ, the exchange also uses Datadog, because it offers event correlation and automated root cause analysis. AppDynamics has previewed automated root cause analysis based on machine learning techniques. Abbott said she looks forward to the addition of that feature, perhaps this year.