I’m sure we can all agree that when an IT organization experiences a system disruption or outage, the sooner the right people know about it, the sooner the issue will be resolved, and the better the outcome for the business. So what does the organization need in order to alert those people promptly, and to give itself the best shot at a quick resolution?
I had the opportunity to discuss this topic in a recent email interview with Vincent Geffray, senior director of product marketing at Everbridge, a Glendale, Calif.-based provider of unified communications systems for incident alerting and management. Geffray explained what’s needed, and why it’s needed, this way:
IT issues very often translate instantaneously into business issues, such as revenue loss, drop in employee productivity, deterioration of brand image, etc. Therefore, it is more important than ever before that IT issues are resolved as quickly as possible to limit the negative impact on the business. IT departments consistently have to deal with incidents of all severity levels, from minor service disruptions to total outages. In these critical situations where every minute counts, the IT department needs to be able to reach out to the on-call IT specialists, whether they are on site or remote, so they can resolve and restore the service quickly. Automated IT communications, alerting and escalation [systems] play an important role in restoring services faster by connecting the right on-call people with the right information. Having the right critical communications [system] in place also enables IT departments to immediately alert the “need to know” people, including the CIO and sometimes the customers, via text, phone, email, etc., and ensure the necessary next steps are taken quickly and efficiently to mitigate the issues.
Geffray also provided some reasons why an IT incident may not be resolved quickly enough:
Obviously, if the IT department does not have any kind of monitoring in place, it’s most likely that they will hear about the IT issue when it’s already too late, that is, when users start calling the service desk to complain. In these circumstances, you see the IT teams playing catch-up, inviting all kinds of people to unorganized war rooms, always in a firefighting mode. In some cases, IT departments have all the tools they need, but because they are on disparate systems and not integrated, this can generate inefficient workflows. Another reason can be found with alert fatigue, when everyone on the IT teams receives all the alerts generated by all the monitoring tools with the same sense of urgency. Again, this does not contribute to resolving issues quickly. Now, from a communications standpoint, we’ve seen many instances when there is an overlap in communication methods and a lack of collaboration mechanisms. Inefficient processes like these can result in long wait times in trying to find and connect with the right person for the job. Things can get even more complicated for a global company that needs to reach its stakeholders located in many countries and across time zones. Inefficient and obsolete processes and systems cause significant delays, and ultimately inhibit the ability to reach employees, vendors, and/or customers.
According to Geffray, three key constituencies need to be alerted when an incident occurs:
First, you want to quickly identify and communicate with your on-call IT specialists across the different IT teams—network, applications, middleware, server, databases, etc.—as they need to be involved in the incident resolution process as soon as possible. We call them the ‘resolvers.’ Second, if the ecommerce application is down, for example, you may want to inform the different ‘stakeholders,’ such as the application owner, the CIO, the VP of sales, and the marketing team so they are ready to address any bad publicity they’ll find on social networks. And third are the ‘end users,’ whether they are your internal employees who can’t use email because it’s down, partners who can’t access your portal, or your online customers who can’t buy services or products from you any longer. You want to make sure you communicate to these different groups with the appropriate information, and in a timely manner.
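For readers who think in code, the three-audience idea Geffray describes might be sketched roughly as follows. This is only an illustration, not how Everbridge or any particular product works: the `Incident` class, the `AUDIENCES` contact lists, the `plan_notifications` function, and the severity scale (1 for a total outage, 3 for a minor disruption) are all hypothetical names and assumptions made up for this example. It also assumes that severity decides which groups are told, which is one possible policy rather than something stated in the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    service: str
    severity: int  # assumed scale: 1 = total outage, 3 = minor disruption
    summary: str


# Hypothetical contact lists for the three groups named in the interview.
AUDIENCES = {
    "resolvers": ["oncall-network", "oncall-database"],
    "stakeholders": ["cio", "vp-sales", "marketing"],
    "end_users": ["status-page"],
}


def plan_notifications(incident):
    """Return a list of (audience, recipient, message) tuples."""
    plan = []

    # Resolvers are always alerted, with the technical details they need.
    for recipient in AUDIENCES["resolvers"]:
        message = (
            f"[SEV{incident.severity}] {incident.service}: "
            f"{incident.summary}. Please join the incident."
        )
        plan.append(("resolvers", recipient, message))

    # Assumed policy: stakeholders hear about severity 1 and 2 only,
    # which avoids sending every minor alert to everyone.
    if incident.severity <= 2:
        for recipient in AUDIENCES["stakeholders"]:
            message = f"{incident.service} is affected: {incident.summary}."
            plan.append(("stakeholders", recipient, message))

    # Assumed policy: end users are told only about total outages.
    if incident.severity == 1:
        for recipient in AUDIENCES["end_users"]:
            message = (
                f"We are aware of a problem with {incident.service} "
                "and are working on it."
            )
            plan.append(("end_users", recipient, message))

    return plan
```

Calling `plan_notifications(Incident("ecommerce", 1, "checkout is down"))` would produce entries for all three groups, each with wording suited to that audience, while a severity 3 incident would reach only the on-call resolvers.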
A contributing writer on IT management and career topics with IT Business Edge since 2009, Don Tennant began his technology journalism career in 1990 in Hong Kong, where he served as editor of the Hong Kong edition of Computerworld. After returning to the U.S. in 2000, he became Editor in Chief of the U.S. edition of Computerworld, and later assumed the editorial directorship of Computerworld and InfoWorld. Don was presented with the 2007 Timothy White Award for Editorial Integrity by American Business Media, and he is a recipient of the Jesse H. Neal National Business Journalism Award for editorial excellence in news coverage. Follow him on Twitter @dontennant