Why Simplifying Incident Management Processes with Cloud-Sourcing is like Managing the Contents of a Black Box

Managing incidents in the cloud simplifies ITIL workflow processes like the one shown in the figure below, right? In our experience, the number of cloud tools for managing incidents has multiplied tenfold, which means we’re doomed to go through the same boom-and-bust cycle as with any new technological wave: first a period of rapid growth and expansion as everyone hops on the bandwagon, then a period of market consolidation in which a few leaders emerge. So making incident management simple may not, in fact, be so simple; it is more like managing the contents of a black box.

[Figure: incident management escalation workflow]

As the diagram above illustrates, when an incident is reported to the service desk, the agent attempts to resolve it at the first level by consulting the Known Error Database and the CMDB. If this doesn’t work, the agent escalates it to the second-line support team, and the process continues until the incident is resolved. As per its charter, Incident Management aims to resolve the incident quickly so that service degradation or downtime is minimized. The problem is that the more levels an incident is escalated through, the longer the time to resolution and the greater the probability that the Service Level Agreement will be breached.
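The escalation path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the support levels, the handling time at each level, the incident categories, and the 120-minute SLA target are all hypothetical values, chosen to show how each escalation adds to the time to resolution.

```python
from dataclasses import dataclass


@dataclass
class SupportLevel:
    name: str
    handling_minutes: int   # hypothetical time spent at this level
    can_resolve: set        # incident categories this level can fix


def resolve(category, levels, sla_minutes):
    """Walk the escalation chain until a level can resolve the incident.

    Returns (resolving level name or None, elapsed minutes, SLA met?).
    """
    elapsed = 0
    for level in levels:
        elapsed += level.handling_minutes   # every level adds time
        if category in level.can_resolve:
            return level.name, elapsed, elapsed <= sla_minutes
    return None, elapsed, False             # no level could resolve it


# Hypothetical three-level support chain.
LEVELS = [
    SupportLevel("service desk", 15, {"password reset"}),
    SupportLevel("second line", 60, {"mail outage"}),
    SupportLevel("third line", 240, {"database corruption"}),
]

if __name__ == "__main__":
    for category in ("password reset", "mail outage", "database corruption"):
        print(category, resolve(category, LEVELS, sla_minutes=120))
```

With these made-up numbers, an incident that reaches the third line has already used up more than the whole SLA window before anyone who can fix it has finished looking at it.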

Why is Incident Management Difficult to Manage?

In our experience, several contributing factors make Incident Management a black box and one of the most difficult and expensive of all the ITIL processes:

  • Complex System Architecture
  • Poorly Architected or Missing Processes
  • Silo Effect Caused by Specialization among IT Professionals
  • Incomplete Monitoring of Processes and Systems
  • Failure to Conduct Root Cause Analysis
  • Lessons Learned Are Not Shared
  • Missing or Unclear Context Surrounding Exception Handling
  • Mismanaged Resource Spending

There are many other reasons why incident management remains difficult. One result is a tendency to throw resources at incidents when the underlying cause of the service delivery failure is poorly architected software, infrastructure, or business processes. In addition, too little attention is paid to training IT professionals in troubleshooting, which remains an art form. Finally, trained professionals are becoming more expensive to hire at a time of shrinking IT budgets.

The cloud promises simplicity, so why is the level of management difficulty increasing?

A good starting point for avoiding these problems is to outsource services to specialized providers instead of doing everything onsite. Thousands of Infrastructure, Platform, and Software as a Service offerings are available in the cloud, and the number is growing rapidly. A CIO magazine article, 11 Outsourcing Trends to Watch in 2011, pointed out that 2011 is the year of progressive outsourcing:

“The year will be marked by the inking of smaller IT services deals, many of them by first-time buyers who sat on the sidelines in 2010”

For you, this means that outsourced services reduce internal operating costs and shorten time to market on the one hand, but on the other, the more you outsource, the more complex support and management become. Seen through this lens, managing outsourced services can be as difficult as managing the contents of a black box.

[Figure]

How can I support Cloud-Sourcing?

Managing outsourced services in the cloud requires three elements:

  • Service Description
  • Service Level Agreement
  • Support Agreement

The first two elements are usually clear. But how to escalate, to whom, under which conditions, with what information, and in which format is very often unclear, which complicates service delivery.
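One way to make those escalation questions explicit is to write them down as a structured record per provider. The sketch below is a hypothetical illustration, not a standard schema: the field names, the provider, the contact address, and the required ticket fields are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class EscalationRule:
    provider: str
    contact: str            # to whom the incident is escalated
    condition: str          # under which conditions
    required_fields: tuple  # what information must be supplied
    channel: str            # in which format or channel


def missing_information(rule, ticket):
    """Return the required fields that are absent or empty in the ticket."""
    return [name for name in rule.required_fields if not ticket.get(name)]


# Hypothetical rule for a hypothetical hosting provider.
RULE = EscalationRule(
    provider="Example Hosting",
    contact="support@provider.example",
    condition="service unavailable for more than 30 minutes",
    required_fields=("incident_id", "affected_service", "start_time", "impact"),
    channel="e-mail, plain text",
)

if __name__ == "__main__":
    ticket = {"incident_id": "INC-1", "affected_service": "mail", "impact": ""}
    print(missing_information(RULE, ticket))
```

Checking a ticket against the rule before it is handed over shows the service desk what the provider still needs, instead of finding out after the escalation bounces back.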

[Figure]

For an Incident Management system to work effectively in the cloud, we feel it is crucial to consider two points:

  • Integrate all provider processes into internal processes
  • Measure and control all provider service levels

When integration is an afterthought to outsourcing a service or adopting a new cloud tool, you lose the ability to measure service levels end-to-end, and all you’re left with is a bunch of black boxes.
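A small calculation shows why measuring end-to-end matters. Assume, purely for illustration, that a service depends on three providers in series and that their failures are independent; the end-to-end availability is then the product of the individual availabilities. The provider names, availability figures, and target below are hypothetical.

```python
import math


def end_to_end_availability(provider_availabilities):
    """Availability of a chain of serial, independent dependencies."""
    return math.prod(provider_availabilities)


# Hypothetical availabilities for three outsourced layers.
PROVIDERS = {
    "infrastructure": 0.999,
    "platform": 0.995,
    "application": 0.999,
}
TARGET = 0.995  # hypothetical end-to-end target

if __name__ == "__main__":
    overall = end_to_end_availability(PROVIDERS.values())
    for name, value in PROVIDERS.items():
        print(f"{name}: {value:.3%} (meets target alone: {value >= TARGET})")
    print(f"end-to-end: {overall:.3%} (meets target: {overall >= TARGET})")
```

Under these assumptions every provider meets the target on its own, yet the combined service comes in at roughly 99.3% and misses it. If you only look at each provider's own report, that gap stays hidden.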

Learn more about how to make black boxes more transparent:

Note: The points covered here also apply to Problem Management, even though it is a separate process under ITIL. While Incident Management is responsible for the fix or workaround, it is ultimately Problem Management that performs the root cause analysis for long-lasting incidents and provides a permanent solution.

For the ITIL terms used in this entry see itSMF ITIL Overview
