According to Charles Araujo, president and managing consultant of CastlePointe, your IT organization has a problem, and you don't even know it.
He explains that you're collecting data and probably think you have everything you need. But you likely have trouble connecting that data to any action that has had a meaningful and measurable impact on the results your organization delivers. That's because what you have is data, when what you really need is information that enables action.
The primary focus of any IT service management or IT transformation effort is to improve service delivery and operational efficiency in order to deliver the appropriate level of service in the most cost-effective manner possible. To do that, you must be able to measure your performance in a way that enables you to monitor your effectiveness and take the corrective actions necessary to move you toward your goal.
The problem is that most IT organizations collect reams of technical data but have trouble converting that data into meaningful, results-driven action. The primary flaw lies in the data itself: in most cases, the way it is collected and reported makes it almost impossible to act on.
In this slideshow, Araujo highlights the benefits of building an IT Metrics Correlation Model to gain the full value of the data being collected.
If you think about how you make decisions in your day-to-day life, you almost never use a single data point. Something as simple as choosing a restaurant involves a large number of data points, including guest preferences and style, type of food, average entrée price, location, reviews from previous guests, and attire (casual, trendy, dressy, etc.).
When it comes to IT operational decisions, however, IT organizations routinely decide on the basis of a single data point or a very limited set of them. This is particularly true of process-based metrics. You may look at Mean Time to Restore Service (MTRS) and check whether it's trending up or down. And if it's moving in the "wrong direction," you may tell someone to fix it. But this almost inevitably leads to bad decisions or misdirected effort, because MTRS by itself does not provide enough context to reveal the true cause or the corrective action that is required.
The necessary context is not self-evident with most IT metrics. The mantra "you can't manage what you don't measure" has been ingrained in IT organizations for so long that you measure and measure and measure. You measure everything, and the measures tend to be collected and reported independently. But IT systems and the processes that support them are exceedingly complex and almost never operate in isolation. So for MTRS to be a useful metric, you must understand it in context. That means correlating it with the other key metrics that are related to MTRS, which will let you understand why it is trending in the direction it is and determine whether the movement is a temporary deviation or a real problem. Only then can you begin to determine what corrective action is necessary.
The challenge with correlation is that it takes work. It means that simply identifying metrics is not enough. You need to build a model which connects the metrics together in a way that explains their relationship to one another. This requires a deliberate evaluation of each metric to determine what other metrics could impact it and which it might impact.
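One way to hold such a model is as a pair of lookup tables: one recording which metrics could impact a given metric, and one recording which metrics it might impact, so both questions from that evaluation can be answered for any metric. The Python class below is a minimal sketch under that assumption; `CorrelationModel`, `link`, `review`, and the metric names in the example are hypothetical, and in practice the model may well live in your reporting or dashboarding tool instead.

```python
from collections import defaultdict


class CorrelationModel:
    """Minimal, hypothetical container for a metrics correlation model."""

    def __init__(self):
        self.drivers = defaultdict(set)  # metric -> metrics that could impact it
        self.impacts = defaultdict(set)  # metric -> metrics it might impact

    def link(self, driver, metric):
        """Record that movement in `driver` is expected to move `metric`."""
        self.drivers[metric].add(driver)
        self.impacts[driver].add(metric)

    def review(self, metric):
        """Answer both evaluation questions for one metric."""
        return {
            "impacted_by": sorted(self.drivers.get(metric, ())),
            "impacts": sorted(self.impacts.get(metric, ())),
        }


# Example: record the relationships you believe in, then review any metric
model = CorrelationModel()
model.link("Response Target Breaches", "MTRS")
model.link("Avg Incidents per Day", "MTRS")
print(model.review("MTRS"))
```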
One of the most powerful elements of a correlation model is that it is self-correcting. Because you are using the correlation model to answer the “whys” when you observe an unfavorable trend, a deficiency in the model becomes readily apparent.
Let’s use MTRS as our example. You might specify that MTRS is driven by two correlated indicator metrics:
- The number of Incidents in which the initial response target was breached
- The average number of Incidents per day
The logic is that MTRS will move in correlation with these two indicators. If your teams are not responding within the agreed-upon response times, there’s a strong likelihood that it will take them longer to restore service. And if there are simply too many incidents occurring, they are likely to stress your organization’s capacity to respond and thus increase the average restoration time.
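Before you can correlate these metrics, you have to compute them consistently for each reporting period. The sketch below assumes a simple, hypothetical incident record with open, first-response, and restore timestamps plus a response target; your service management tool's export will almost certainly use different field names. It derives MTRS (in hours) and the two example indicators for one period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; real tool exports will use other field names."""
    opened: datetime
    responded: datetime  # time of first response
    restored: datetime  # time service was restored
    response_target: timedelta  # agreed initial-response target


def period_metrics(incidents, days_in_period):
    """Compute MTRS (hours) and the two example indicators for one reporting period."""
    restore_hours = [(i.restored - i.opened).total_seconds() / 3600 for i in incidents]
    return {
        # Mean Time to Restore Service; None when there were no incidents to average
        "mtrs_hours": sum(restore_hours) / len(restore_hours) if incidents else None,
        # Incidents whose first response came later than the agreed target
        "response_breaches": sum(
            1 for i in incidents if i.responded - i.opened > i.response_target
        ),
        "avg_incidents_per_day": len(incidents) / days_in_period,
    }
```

Running this once per week or month gives you the aligned per-period series that the correlation checks need.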
But what if MTRS is rising, yet response targets are being met and the average number of incidents is low? In this case, the model has shown you that MTRS is correlated with something you have not yet identified. That means you are not yet measuring or managing everything necessary to manage MTRS effectively.
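The self-correcting check can be expressed as a simple rule: if the target metric got worse, at least one of its modeled drivers should have moved in the same direction. The sketch below compares two consecutive periods; the function name, the "rose by more than 5%" tolerance, and the status labels are assumptions chosen for illustration, and you would tune the tolerance to the normal noise in your own data.

```python
def explain_change(previous, current, target, indicators, tolerance=0.05):
    """Check whether an unfavorable rise in `target` is matched by a modeled indicator.

    previous, current: metric name -> value for two consecutive periods.
    tolerance: hypothetical relative change treated as noise (5% by default).
    Returns (status, rising_indicators).
    """
    def rose(name):
        before, after = previous[name], current[name]
        if before == 0:
            return after > 0
        return (after - before) / abs(before) > tolerance

    if not rose(target):
        return "stable", []
    rising = [name for name in indicators if rose(name)]
    if rising:
        return "explained", rising
    # The target moved but none of its modeled drivers did: the model has a gap
    return "model gap", []


# Example mirroring the scenario above: MTRS up, both indicators flat or down
prev = {"MTRS": 4.0, "Response Target Breaches": 3, "Avg Incidents per Day": 10}
curr = {"MTRS": 5.0, "Response Target Breaches": 3, "Avg Incidents per Day": 9}
print(explain_change(prev, curr, "MTRS", ["Response Target Breaches", "Avg Incidents per Day"]))
# -> ('model gap', [])
```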
Ideally, you should start by defining the key objectives and the impact you want, and then map the correlation model from the top down. In practice, however, you are often starting with a set of metrics and KPIs that you already measure, and you will need to fit them together like a puzzle. That’s OK, but you need to keep asking yourself, “So what?” to understand why each metric matters in context, until the answer leads you to an objective that is in line with your overall IT strategy and will be intuitively understood by the business. You may find it easiest to tackle this from both ends: fit some of the pieces together, and once they begin to make sense, take a fresh look at what the key business-driven objectives and desired impact should be. From there, you should be able to identify your top-level outcome metrics more easily.
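Asking “So what?” repeatedly amounts to following a metric upward through the model until you reach a business-facing objective. The sketch below does that with a breadth-first search over a hypothetical `feeds` mapping (metric to the metrics or objectives it contributes to) and returns the shortest chain it finds, or `None` for a metric that leads nowhere. The objective names in the example are invented; a `None` result is the cue to either connect that metric to an objective or question why you are tracking it.

```python
from collections import deque


def so_what_chain(metric, feeds, objectives):
    """Follow "So what?" links upward from `metric` until an objective is reached.

    feeds: hypothetical mapping of metric -> metrics or objectives it contributes to.
    objectives: set of top-level, business-facing objective names.
    Returns the shortest chain found, or None if the metric leads to no objective.
    """
    queue = deque([[metric]])
    seen = {metric}
    while queue:
        path = queue.popleft()
        if path[-1] in objectives:
            return path
        for parent in feeds.get(path[-1], ()):
            if parent not in seen:  # guards against cycles in the model
                seen.add(parent)
                queue.append(path + [parent])
    return None


feeds = {
    "Response Target Breaches": ["MTRS"],
    "MTRS": ["Service Availability"],
    "Service Availability": ["Customer Satisfaction"],
}
print(so_what_chain("Response Target Breaches", feeds, {"Customer Satisfaction"}))
print(so_what_chain("Tickets Closed", feeds, {"Customer Satisfaction"}))  # None
```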
Your objective is to create a model that allows you to dig iteratively deeper until you understand the corrective action that is required. Following this line of thinking, some of your metrics will be indicator metrics (such as Response Target Breach Rate). They indicate a potential cause and are really signposts. You will normally be able to do little to influence these indicator metrics directly, so they should always be correlated to diagnostic metrics.
Diagnostic metrics are those that enable you to take direct action. Too many incidents are being categorized incorrectly? You can simplify the categorization schema, introduce new training, add an audit step, and so on. Whatever the fix, by taking that corrective and proactive action, you will be able to impact all of the upstream, correlated metrics.
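The indicator/diagnostic split lends itself to two mechanical checks: every indicator should have at least one diagnostic metric among its drivers, and for any diagnostic metric you should be able to list the upstream metrics that corrective action on it is expected to move. The sketch below performs both against a hypothetical `drivers` mapping. It checks only direct drivers in the first test, which is a simplifying assumption, and the link from miscategorized incidents to response-target breaches in the example is invented for illustration rather than asserted by the article.

```python
from collections import deque


def upstream_of(diagnostic, drivers):
    """Return every metric that corrective action on `diagnostic` is expected to move.

    drivers: hypothetical mapping of metric -> set of metrics that drive it.
    """
    # Invert the model so we can walk from a driver to the metrics it influences
    influences = {}
    for metric, sources in drivers.items():
        for source in sources:
            influences.setdefault(source, set()).add(metric)
    reached, queue = set(), deque([diagnostic])
    while queue:
        for metric in influences.get(queue.popleft(), ()):
            if metric not in reached:
                reached.add(metric)
                queue.append(metric)
    return reached


def unanchored_indicators(indicators, diagnostics, drivers):
    """Indicators with no diagnostic metric among their direct drivers."""
    return sorted(i for i in indicators if not (drivers.get(i, set()) & diagnostics))


drivers = {
    "MTRS": {"Response Target Breaches", "Avg Incidents per Day"},
    "Response Target Breaches": {"Miscategorized Incidents"},  # invented link
}
print(upstream_of("Miscategorized Incidents", drivers))
print(unanchored_indicators({"Response Target Breaches", "Avg Incidents per Day"},
                            {"Miscategorized Incidents"}, drivers))
```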
A Metrics Correlation Model should never be viewed as a replacement for the transactional monitoring and management tools typically found in service management products, although it may be built using their reporting and dashboarding features. Those real-time dashboards and tools are designed to help you manage in-flight incidents, changes, and so on as they progress through the process. It’s important to do that well, and many IT performance issues can be tackled during the transactional process.
The purpose of the Metrics Correlation Model is to enable you to see all of your points of measure in context and in relation to your overall objectives. It is meant to be a retrospective analysis tool. The model will enable you to seek out and identify trends that manifest themselves across multiple metrics and over long periods of time, to see the things that you miss at a transactional level.
Building a correlation model will enable you to overcome the single greatest challenge that most IT organizations face in their metrics efforts: how to go from data to action that has an impact. It is very easy to fall into the trap of measuring metrics for their own sake, without clear context for how any given metric can be moved or how it affects the goals of the IT organization.