Slack outage spotlights potential SaaS risk

Relying on a third party for software as a service has its advantages, but also comes with the risk of dealing with outages like the one suffered by ChatOps tool Slack late last week.

Software as a service is all fun and games until someone loses an API.

Slack users found that out the hard way late last week as the ChatOps tool suffered its ninth outage in the last twelve months. According to the Slack Status page, the 90-minute outage began around 10 a.m. Pacific Time on June 10, when web servers became overwhelmed with traffic, causing API failures. The fix for that issue then caused problems with file uploads and downloads on the service, which were resolved just before noon Pacific.

Not all users of the service were impacted, and some who plan to deploy the service for ChatOps said the downtime would be an inconvenience at worst. While some companies use Slack and its integration with chat bots to issue commands for automated tasks, automation is typically performed by other tools, and in the event of a Slack outage, there are still other ways to communicate with those tools, such as command-line interfaces.

Still, the Slack outage did give some users pause about putting too many eggs in the ChatOps basket.

"[It's] not a mission critical app for us, and an outage is an annoyance," said Theo Kim, head of DevOps engineering for GoPro, a digital video camera maker in San Francisco. "I certainly wouldn’t move mission-critical workloads over to Slack in light of the number of outages."

Companies that rely too much on integrations with Slack will experience an outage "like driving on a highway, going really fast, but your windshield is obscured by a black tarp, [and] you can only see through the part that is not covered," said TJ Saotome, vice president of information technology and portfolio management for Dartmouth Research & Consulting in Boston.

"How much you can see depends on how much integration you have with external systems," Saotome explained. "The more you have, the less you see when the API fails."

Sparking dialogue about SaaS risk

The Slack outage should make users think about ways to mitigate the situation if their favorite software as a service (SaaS) tools become unavailable, according to Jason Hand, DevOps evangelist with VictorOps, based in Boulder, Colo., and author of the book ChatOps for Dummies.

SaaS outages are a fact of life these days, Hand said. For example, Salesforce.com suffered a high-profile outage just last month. "It's going to fail, so the goal is to recover fast."

Services such as Slack are still generally reliable -- the company's own status page described last week's outage as "terrible," but its overall availability for the month still sits at 99.96%, a hair better than the service level agreement offered by Amazon EC2.
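To put those percentages in perspective, the following minimal sketch converts an availability figure into a monthly downtime budget. The 30-day month and the 99.95% figure used for the Amazon EC2 service level agreement are assumptions made for illustration, not numbers taken from either provider's documentation.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Return the minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


# 99.96% is the monthly availability quoted above for Slack.
slack_budget = allowed_downtime_minutes(99.96)
# 99.95% is an assumed SLA figure, used only for comparison.
ec2_budget = allowed_downtime_minutes(99.95)

print(f"99.96% over 30 days allows about {slack_budget:.1f} minutes of downtime")
print(f"99.95% over 30 days allows about {ec2_budget:.1f} minutes of downtime")
```

Under these assumptions the two budgets come to roughly 17.3 and 21.6 minutes a month. A published availability figure may also weight an incident by the share of users affected, so it will not necessarily match a simple wall-clock calculation like this one.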

However, for the rare times when third-party services become unavailable, users should make sure those services haven't become a single point of failure in their environment.

"Understand how to interact without relying on ChatOps or Slack," Hand said. Another possible mitigation is to have a second chat service running on standby to take over in an emergency; HipChat accounts can be had for free, for example.
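As a rough illustration of the standby idea, the sketch below tries a primary chat channel first and falls back to a second one if the first raises an error. The function `send_with_fallback` and the two stand-in senders are hypothetical names invented for this example; real code would replace them with calls to whichever chat services a team actually uses.

```python
from typing import Callable, List, Sequence, Tuple

Sender = Callable[[str], None]


def send_with_fallback(message: str, senders: Sequence[Tuple[str, Sender]]) -> str:
    """Try each channel in order and return the name of the first that succeeds."""
    errors: List[str] = []
    for name, send in senders:
        try:
            send(message)
            return name
        except Exception as exc:  # any failure means "try the next channel"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all channels failed: " + "; ".join(errors))


# Hypothetical stand-ins: the primary simulates an API outage,
# the standby simply records what it was asked to deliver.
def primary_chat(message: str) -> None:
    raise ConnectionError("primary chat API unavailable")


delivered: List[str] = []


def standby_chat(message: str) -> None:
    delivered.append(message)


used = send_with_fallback(
    "deploy finished",
    [("primary", primary_chat), ("standby", standby_chat)],
)
print(f"message delivered via {used}")
```

Keeping the channel list ordered means the standby is only used when the primary fails, which mirrors the "running on standby" arrangement described above.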

IT pros are using disaster recovery (DR) tests to prepare for the event that ChatOps tools become unavailable, according to Elliot Murphy, CEO of Kindly Ops LLC, a managed DevOps service based in Portland, Maine.

DR tests with Murphy's clients typically include a scenario where the chat system is unavailable, and "it's surprising how long it takes people to pick up the phone and call one another," he said. "After an exercise like that I usually see people going to update their phone contact lists with all the important numbers they should have had."

Some people also find that ChatOps tools such as Slack are a good way to create audit trails for regulatory compliance purposes, and losing that record during an outage is an inconvenience -- but all is not lost, Hand said.

"Wherever you're interacting with your systems, it will be logged and retrievable somewhere," Hand said. Manual interaction with systems in large organizations operating at scale is a pain, but it can be done with operational priorities set ahead of a SaaS failure.

The benefits of ChatOps are still worth taking on the SaaS risk, Hand argued -- most of the time, services like Slack speed things up, and outages are temporary.

Outages like Slack's "create a conversation that needs to be had," Hand said. "What are you really doing to yourself if you're wholly reliant on any single service?"

Slack did not respond to requests for comment as of press time.

Beth Pariseau is senior news writer for TechTarget's Data Center and Virtualization Media Group. Write to her at bpariseau@techtarget.com or follow @PariseauTT on Twitter.

Join the conversation

4 comments

How do you prepare for SaaS failure within your environment?
http://www.thecloudcast.net/2016/05/the-cloudcast-251-it-tricks-that-saas.html
One option to mitigate the risk is to bridge Slack rooms to other apps so that you don't even lose history. That's why Matrix (http://matrix.org) exists: it's an open standard aiming to break the fragmentation between messaging apps, and it provides clients equivalent to Slack (like ours, Vector - https://vector.im). You can even run your own Matrix server to keep ownership of your data.

On top of the baseline messaging and file sharing capabilities that other collaboration tools provide, Vector (which is open source) offers public and private chat rooms and is not locked by team: all conversations are accessible without switching accounts. Every message is indexed and has a permalink, so it is easy to share information, especially since people can access rooms as guests (if the room allows) without creating an account. Web and Android support voice and video conferencing in beta. End-to-end encryption is about to land, as is a proper UI to provision integrations into a room.

By building Vector on top of Matrix we benefit from bridges to Slack and integrations with GitHub, Jira and many more to come as the overall community contributes bridges and integrations. We expect bridges to Mattermost, RocketChat and Skype soon, as well as integrations with Jenkins, Slack Webhooks, Trello and Basecamp. Every app should be built on top of Matrix :)
I don't know that I've ever seen any data published about internal IT availability rates, but I highly doubt they are better than 99.9% or 99.99%, especially for Tier-2 or Tier-3 applications. In most cases, the SaaS benefits outweigh an occasional outage (and outages are getting more and more rare).