Nowadays, there is a more collaborative mindset between IT and the lines-of-business, and a greater communal understanding of relevant reporting metrics. This collaboration is extremely important, as virtualization and now cloud computing have made understanding and monitoring applications far more complex. Not only does IT have to worry about monitoring applications that exist on internal physical infrastructure, but the applications can now be spread over physical, virtual and cloud-based assets. Add in the extra complication that the infrastructure may dynamically change, and it's "Advil time" for IT.
Applications and their successful delivery have historically been defined by performance and availability: not only must applications be functional, they must also be usable and perform well. E-commerce vendors such as Amazon, Zappos and Neiman Marcus are acutely aware of this; any degradation in performance means that customers go spend their money somewhere else. Their SLAs would encompass things like the availability of websites, network infrastructure, back-end databases and transaction gateways, as well as the performance of specific actions that an end user would perform on a website, for example checking out, looking up a price, listing items in the shopping cart or comparing items. A very easy way to capture a general sense of how the complete infrastructure is performing would be to monitor the dollars spent per minute by customers. If this number drops below the expected range of spending, then something is wrong. No amount of IT service up-ness or down-ness is going to help you get this holistic overview.
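To make the dollars-per-minute idea concrete, here is a minimal sketch in Python. It assumes you already collect a per-minute spending figure for comparable periods; the function name `revenue_alert`, the three-standard-deviation band and the sample numbers are all illustrative assumptions, not a recommendation for any particular monitoring product.

```python
from statistics import mean, stdev

def revenue_alert(history, current, sigmas=3.0):
    """Return True when the current dollars-per-minute figure falls
    below the expected band derived from historical samples.

    history -- hypothetical dollars-per-minute samples from comparable periods
    current -- the latest dollars-per-minute reading
    sigmas  -- width of the tolerated band (an assumed default)
    """
    baseline = mean(history)
    spread = stdev(history)
    return current < baseline - sigmas * spread

# Illustrative numbers only.
history = [1000, 1040, 980, 1010, 995, 1025, 990, 1005]
print(revenue_alert(history, 1002))  # reading inside the expected band
print(revenue_alert(history, 600))   # reading far below the expected band
```

A real deployment would likely compare against the same hour and weekday rather than a flat average, since spending follows daily and seasonal patterns.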
An additional dimension around SLAs that will emerge over the next few years is cost. As applications are deployed across the cloud, it will be possible to dynamically move them onto more cost-effective platforms as cloud pricing changes. This concept of economic computing will become very interesting and will be explored in future blog postings.
Now, getting down to the details, what are the things that we need to do to create SLAs quickly and effectively?
Knowing that we need to keep users' expectations in check, where do you start? People seldom realize that five-nines availability (99.999 percent) is ridiculously hard to achieve and comes at a prohibitive cost, which most business users will not pay. Additionally, not all applications run 24/7; some run on reduced work schedules, and you'll need to account for this in the reporting. A common problem when going down the SLA path is simply figuring out where to begin. For IT, it's important to be able to back-test an SLA against historical performance and availability data. Sometimes 'good enough' is, well, good enough.
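A quick bit of arithmetic shows why five nines is so demanding, and why the service schedule matters. The sketch below converts an availability target into a downtime allowance; the function name and the 10-hours-a-day, 5-days-a-week schedule are assumptions chosen for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def allowed_downtime_minutes(availability_pct, service_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted per year at a given availability target."""
    return service_minutes * (1 - availability_pct / 100)

# A 24/7 service at five nines gets roughly 5.26 minutes per year.
print(round(allowed_downtime_minutes(99.999), 2))

# A 24/7 service at three nines gets roughly 525.6 minutes (under 9 hours).
print(round(allowed_downtime_minutes(99.9), 1))

# Assumed reduced schedule: 10 hours a day, 5 days a week, 52 weeks.
business_minutes = 10 * 60 * 5 * 52  # 156,000 minutes
print(round(allowed_downtime_minutes(99.9, business_minutes), 1))
```

Measuring only against scheduled service hours means that an outage at 3 a.m. on a Sunday does not count against an application nobody is using at that time.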
Defining Metrics - Before even creating an SLA document/report, we'll need to make sure we've defined which metrics indicate the availability and performance of the service we're providing. We also have to make sure our monitoring solution can actively monitor those metrics.
Baselining Current Service Level - Once monitoring of key business applications has been defined, we need to get a baseline of how we're currently performing before we commit to providing a level of service that we cannot possibly achieve. Being able to back-test SLAs for a given set of performance and availability metrics is key. For example, if end-user response time is a key component of an SLA but varies significantly over time, back-testing shows how often the SLA would have been violated at a given performance threshold. This allows us to negotiate with an application owner for values that 'make sense.'
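As a rough illustration of back-testing, the Python sketch below replays a set of historical end-user response times against several candidate thresholds and reports what fraction of samples would have breached each one. The function name `backtest`, the sample values and the thresholds are hypothetical.

```python
def backtest(response_times_s, threshold_s):
    """Fraction of historical samples that exceed a candidate threshold."""
    violations = sum(1 for t in response_times_s if t > threshold_s)
    return violations / len(response_times_s)

# Hypothetical response times in seconds, one per measurement interval.
samples = [1.2, 1.4, 0.9, 2.8, 1.1, 3.5, 1.3, 1.0, 2.1, 1.2]

# Try a few candidate thresholds before committing to one.
for threshold in (1.5, 2.0, 3.0):
    rate = backtest(samples, threshold)
    print(f"threshold {threshold}s: {rate:.0%} of samples in violation")
```

Running this kind of comparison over a few months of real data makes the negotiation with the application owner far less speculative.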
Proactive SLA Management - Once an SLA is created with objectives (SLOs) that define the availability and performance of an application, it's useful to get instant visibility on an SLA dashboard. It's also possible to set up SLA alerting, so that alerts are generated when an issue occurs and starts to affect SLA performance. This is especially important when an SLA is trending toward violation, as it gives IT a chance to rectify an operational situation before getting penalized.
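One simple way to detect that an SLA is trending toward violation is to project the downtime accumulated so far across the rest of the reporting period and compare it with the downtime budget. The sketch below assumes a linear projection and a 30-day period; the function name `sla_status` and its status strings are made up for illustration and do not correspond to any specific tool.

```python
def sla_status(downtime_so_far_min, elapsed_days, period_days, target_pct):
    """Classify an availability SLA as 'ok', 'at risk' or 'violated'.

    Uses a naive linear projection of downtime over the reporting period.
    """
    # Total downtime the target allows over the whole period, in minutes.
    budget = period_days * 24 * 60 * (1 - target_pct / 100)
    # Downtime expected by period end if the current rate continues.
    projected = downtime_so_far_min / elapsed_days * period_days
    if downtime_so_far_min > budget:
        return "violated"
    if projected > budget:
        return "at risk"
    return "ok"

# A 99.9 percent target over 30 days allows about 43.2 minutes of downtime.
print(sla_status(10, elapsed_days=10, period_days=30, target_pct=99.9))
print(sla_status(20, elapsed_days=10, period_days=30, target_pct=99.9))
print(sla_status(50, elapsed_days=10, period_days=30, target_pct=99.9))
```

The "at risk" state is the one worth alerting on, since it is the window in which IT can still act before a penalty applies.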
In summary, SLAs have become much easier to implement, manage and enjoy than in the past. With appropriate tooling and a collaborative mindset, IT can quickly demonstrate value around keeping applications available and performing well. Application environments are only going to get more complicated, so make sure you're staying on top.