The first page of your document is simple yet important. Typically, IT organizations use a percentage, such as 99.999% availability, to do this. ITIL change management focuses on processes, not people.That is, ITIL belongs to IT services management (ITSM). Do not be content to just report on availability, duration, and frequency. Your email address will not be published. Value is created when every customer transaction is quickly processed, thereby avoiding lengthy queue at cash register, and therefore prevents money walking out the door. a well-defined unit. It is actually one of the lacks of ITIL. Quick Content Limited68 Church Street, Market Deeping,Peterborough. Small variations in availability percentages go a long way. For example, let’s consider an IT organization that has agreed a 24×7 service and an availability of 99%. How will you document and report your findings? Here are some ways to create availability metrics that matter. Be sure you can break down and look at how long each individual outage was (duration) and how often an outage occurs (frequency). In this e-book, we’ll look at four areas where metrics are vital to enterprise IT. Availability metrics also estimate how well a service will perform in the future. IT Support: Improve Where It Matters the Most, Three Mindsets for Improving the IT-Business Relationship, Emergency Changes – How Preparation Helps to Reduce Risk, Look Beyond Self-Service. The challenge though is how to automate this kind of measuring and outage reporting in the era of microservices and API economy. The really great IT departments work with their customers to optimize their investment and deliver levels of availability that delight. Weekly availability = 100% x (168 – 8) / 168 = 95.2%. You can use this data to identify the duration of incidents and the number of users impacted. This is probably even more important, but it’s a topic for another blog. Stuart This has certainly worked because it is from an actual SLA done about 10 years ago. Availability Management ensures the availability of the IT Services in a way that all components of the IT Services: IT infrastructure. Almost every IT organization that I’ve worked with measures and reports the availability of their services. And since ITSM focuses on IT service deliveries to customers, ITIL strictly focuses on changes to those services.“Change management” as a whole, however, includes other types of organizational and business changes.These can include structural, organizational, cultural, and hierarchical changes.In ITIL, though, the … You may find it helpful to draw up a table that indicates the business impact that follows losing each of these functions relative to one another. support the achievement of agreed availability goals. Be sure to use availability statistics that make sense to everyone and that measure availability over the required timeframe. Calculate the percentage availability using a very similar formula to the one we saw earlier. Table 1 is an example: NB: Figures are not intended to add up to 100%. So, it’s important to understand that you need to specify the time period over which calculation and reporting take place, as this can have a dramatic effect on the numbers that you’ll be reporting. There are many different ways that you could collect data about IT service availability. Usually, the targets will form part of a service level agreement (SLA) between the IT organization and the customer – but be careful that meeting numbers in an SLA does not become your goal. After the rollout of an IT service, ITIL processes shift the focus to maintaining the … Thank you. CSF describes what has to be achieved (if we want to say that something is successful) and KPI measures it (i.e. It’s all about what you are trying to achieve – your critical success factors (CSFs). He was an author for ITIL® Practitioner Guidance, the lead author of RESILIA™: Cyber Resilience Best Practice, and the author of ITIL Service Transition, 2011 edition. Document change history, including last reviewed date and next scheduled review 3. IT staff often work very hard to see that the agreed target is met, and provide figures proving it has been met when reporting to customers. If you forget to factor in planned downtime when you’re working out how to report availability, you could end up reporting availability figures that don’t fairly reflect your service provision. The two people who were impacted where the CEO and one of his reports. © Copyright Quick Content Limited. For example, a billing run that takes two days to complete and must be restarted after any outage will be seriously impacted by every short outage, but one outage that lasts a long time may be less significant. Investigate why your outages happened. Please let us know by emailing blogs@bmc.com. Your metric should be clearly understood and related to the critical business processes being measured. To do this: In this example, you would calculate the availability as: You can use this same technique to calculate the impact of lost availability of IP telephony in a call center in terms of PotentialAgentPhoneMinutes and LostAgentPhoneMinutes; and for applications that deal with transactions or manufacturing you can use a similar approach to quantify the business impact of an incident. I’ve said that percentage availability is not very useful for talking to customers about the frequency and duration of downtime. The table below shows how much downtime we can expect at different availability percentages. Metrics are a system of parameters or ways of quantitative assessment of a process that is to be measured. Some processes directly reference speed in their objective such as Incident Management which aims to “restore normal service operation as quickly as possible” making performance measures especially important (Source ITIL Service … He writes blogs and white papers for many organizations, including his own website. However, you don’t need much imagination to understand how customers would feel if a service is being reported as available while many people just can’t access it. First-touch resolution rate is the percentage of incidents resolved the … After you’ve agreed and documented your availability targets, you need to think about practical aspects of how you can measure and report availability. Some of the more common ways that availability data can be collected include: Service availability is a simple idea, but the difficulty is in the details. Does anyone agree measuring Availability for a transaction processing service on the basis of failed v successful transactions is a valid approach? In addition to his day job, he is also an ITSM.tools Associate Consultant. You reached your availability targets, but your customers are unhappy. One is to have the planned downtime happen during a specific window that’s not included in availability calculations. Your email address will not be published. … In my opinion, SLAs for the most part, have managed to create a wholly negative culture between IT organizations and service providers. This can actually measure end-to-end service availability, provided that the requirement is included early in the application design. Some organizations include code in their applications to report end-to-end availability. The biggest challenge in calculating availability is in gathering all the necessary service time values. Which of your business functions are so critical that protecting them from downtime is a priority? And stealing from my earlier service desk blog, I consider the following a good example to model your organization’s help desk/service desk KPI and metric … If the business goal is to enter and process orders while the business is open, it will dilute your measurements to factor in uptime during off-hours, weekends, and holidays. Joe Hertvik works in the tech industry as a business owner and an IT Director, specializing in Data Center infrastructure management and IBM i management. Joe also provides consulting services for IBM i shops, Data Centers, and Help Desks. I like table 2 but I do not necessarily agree with the number of users impacted story. Most incidents don’t cause complete loss of service for all users. Information Availability Management is dealing with the implementation and monitoring of a predefined Availability level for the IT environment. It’s the same with IT services being consumed by business customers. If AST is 100 hours and downtime is 2 hours then the availability would be: The trouble with this is that, while this calculation is easy enough to perform, and collecting the data to do it seems straightforward, it’s really not at all clear what the number you end up with is actually telling you. Look at how you can parse your availability numbers to understand and fix your issues. Unfortunately, this sometimes means that IT service organizations focus on the percentage measure and lose sight of their true goal – providing value for customers. The ITIL Availability Management process works jointly with Capacity Management, Service Level Management, and IT … The process overview of ITIL Availability Management (.JPG)shows the key information flows (see Fig. In fact, I often argue that the only purpose of including users impacted (an imprecise metric) into the calculation is to dilute the SLA percentage to something less severe. Service metrics comprehensively measure a service, or as stated in the ITIL CSI … It is calculated by using this equation: Availability is measured as the percentage of time your service or configuration item is available. For example, if CSF says that Service Desk efficiency has to be increased as part of a customer service improvement program, KPI would be t… I suspect that this customer would be better off with one more register, so that failure of a single register doesn’t cost them so much, but this depends on the cost/benefit tradeoff so is up to them. This tells the IT organization where to focus their efforts when designing and managing the email service. Some users may be unaffected, while others have no service at all. Don’t get me wrong, we should still measure availability, but we do need more focus on service performance measurements that have better relevance to the consumers of these services. Be aware—this assumption can lead to the “watermelon effect”, where a service provider is meeting the goal of the measurement, while failing to support the customer’s preferred outcomes. As an example, very recently, we had an outage of our VOIP telephone service for a few minutes. High-level Metric Relationship examples: • As the change success rate goes down, unplanned work goes up, project backlog increases, service availability decreases, service performance decreases, … I find that really hard to buy. On the other hand, a web-based shopping site may not be impacted by a one-minute outage, but after two hours the loss of customers could be significant. Firstly, you need to understand the context. Service Restoration time is kind of irrelevant as long as the service is available the next day to process transactions before the next cut off. The reality is services provide a means of getting things done. Business Capacity Management ... Metrics, income and Costs. Availability management is balancing the insurance level for the disaster case with the resources, requirements and costs. All rights reserved, Service Management and Security Management Consultant. Interested in your comments about transactions. Availability should be measured against the time the service is required or its required service level. You compare the number of transactions that would have been expected without downtime to the number of actual transactions, or the amount of production that you would have predicted to the actual production. A compliance metric is often expressed in terms of a Boolean pass/fail or yes/no result. Benjamin, it certainly is a challenge. An excellent starting point is the impact of downtime. Ashok, That can certainly work if the customer is happy with it. PE6 8AL,United Kingdom. These aspects of availability management are discussed in most online ITIL trainings and may be … Availability Management ensures the operative services meet all agreed availability goals. A service level agreement, or SLA, is a common term for formal service commitments that are made to customers by service providers.The following are illustrative examples of commitments that are commonly included in service … IT and IT service management (ITSM)have always been highly influenced by SLAs, influencing behaviours, prioritizations of resources and steerage of relationships. There are real consequences in keeping service availability under control. It’s clear from this table that the service has no value at all if it cannot send and receive emails, and that the value of the service is reduced to half its normal level if public folders cannot be read. Availability Management is part of Service Design . One way you can quantify impact is to calculate the percentage of user minutes that were lost. Only measure availability against its required agreed function. The purpose of Service Level Management (SLM) is to ensure that the service targets are created, negotiated, agreed, documented, monitored, reviewed and reported to the customer.SLM acts like a liaison between the customer and the service … It reports on the past and estimates the future of a service. This means that subtle failures may be missed, for example if a change means that clients running a particular web browser no longer work correctly, but the dummy clients use a different browser. • It is cheaper to design the right level of service availability into a service … Tools that support this data collection often report service performance, as well as availability, and this can be a useful addition. I’ll go into a bit more detail about this later in the blog. Nevertheless, it does have its uses and it continues to be widely used. This can be very effective, but may miss subtle failures, for example a minor database corruption could result in some users being unable to submit particular types of transaction. For example, some organizations may not count downtime that has been scheduled a month in advance. However, it can lead to disputes about the accuracy of the availability data. If you do not think availability tracking is important, ask your executives if they would like to have their online store unavailable for 3.65 days each year. Version details 2. And use availability information to improve the process. Earlier in this blog, I talked about the limitations of percentage availability as a useful measure. ITIL key performance indicators (KPIs) are a measure of performance that enables organizations to obtain information about many relevant factors such as the effectiveness and efficiency of their … Some organizations use dummy clients to submit known transactions from particular points on the network to check whether the service is functioning. Figure 1: The 23 most commonly used IT support metrics. 1). You may want to focus on just one approach, or you may need to combine some of these to generate your reports. of outage due to incidents (unplanned unavailability) Percentage of outage (unavailability) due to … You agree the amount of time that the service should be available over the reporting period. Service Metrics. Sometimes it is difficult to pin-point the number of users affected. The problem here is that we are approaching the measurement of services to often just using availability as the primary indicator of performance. Try to find a measure of business impact and apply it to all outages. At the opposite extreme, you might decide to say that a service is available so long as someone can still access it. No percentages to play around with. If you take a look at ITIL books, you will quite often find something that is called Key Performance Indicators (KPI) or Critical Success Factors (CSF). Availability analysis and business requirements as starting point, based on Service definitions, SLAs and costs to define the customer requirements on Availability level. Sadly, many IT organizations focus on the numbers in an SLA, and completely fail to meet their customers’ needs – even if they deliver the agreed numbers. This is the agreed service time (AST). Worse still, from the customer’s point of view, you may be reporting that you have met agreed goals, while leaving the customer totally unsatisfied. Service availability: the amount of time the service is available for use. If we report availability every week then the AST (Agreed Service Time) is 24 x 7 hours = 168 hours, Measured monthly the AST is (24 x 365) / 12 = 730 hours, Measured quarterly the AST is (24 x 365) / 4 = 2190 hours. One aspect of availability measurement and reporting that’s often overlooked is planned downtime. Stuart, thanks for the article. For example, the new service has undergone operational acceptance testing or measurement of tasks against a burn down chart. Fact 1 = During peak hours, if one of 2 registers are down, queues are longer at remaining 2 and many customers walk out without buying because of time constraints Suggested definition of service availability = During peak hours if any of the three cash registers is down, we will consider the entire retail service is down, SLA breach is incurred and downtime is accumulated. The relative impact of a single long incident or many shorter incidents will be different, depending on the nature of the business and the business processes involved. The consumers of services want things like transaction throughput and responsiveness/speed at the times they need to use the service. Document approvals Last Review: MM/DD/YYYY Next Scheduled Review: MM/DD/YYYY You take the downtime away from the agreed service time, and turn this into a percentage. This is why vendors sell products with five nines availability, and customers want SLAs where their services are guaranteed 99.999% uptime. That 98% tells me more than the 98.96% that is reported when you include the number of users impacted. Use availability information for your continuous improvement cycle. Use appropriate tools and instrumentation to help you measure and report availability. The first calculation that you stated provides no valuable information is, in fact, the undisputed metric of availability for the service in question during the reporting period. It is calculated by using this equation: Agreed service time is the expected time the service will be in operation. According to ITIL®, availability refers to the ability of a configuration item or IT service to perform its agreed function when required. Whatever you decide to do, it’s important that your SLA clearly defines how planned downtime will be reported. Joe can be reached via email at joe@joehertvik.com, or on his web site at joehertvik.com. Please comment below. If only one cash register is down retail service is considered up. Answers to questions like these can have a big effect on your perceived availability and help you to avoid the watermelon effect. This method can also miss the impact of shared components, for example one of my customers had regular downtime for their email service due to unreliable DHCP servers in their HQ, but the IT organization did not register this as email downtime. If your service level specifies that users must have access to an ERP system from 6:00 AM to Midnight on workdays, your agreed service … Quarterly availability = 100% x (2190 – 8) / 2190 = 99.6%. Metrics … At one extreme, there may be a single user with a faulty PC who cannot access any services. This will allow you to agree realistic goals that consider technology, budgetary, and staffing constraints. During non peak hours if any two of the cash registers are down, we will consider the entire retail service is down, SLA breach is incurred and downtime is accumulated. Suppose there’s an eight-hour outage: Putting these numbers into the availability equation gives: Each of these is a valid figure for the availability of the service, but only one of them shows that the target was met. This e-book introduces metrics in enterprise IT. Availability should not purely react to service and component failure. Sample Metrics For ITIL Processes Pink Elephant’s consultants are often asked for a laundry list of sample metrics for IT processes. Needless to say, they were not very happy and our reporting a stellar performance metric did not do anything to improve their confidence in the numbers being reported. IT resources. Organizations of all shapes and sizes can use any number of metrics. You need to talk to your customers and reach agreement about the importance of the various functions to their business. Use tools and methods that get the information you need. If you want to measure, document, and report availability in ways that will be helpful to your organization and your customers you need to do two things. For example… Support costs can be dramatically reduced … For example, an ATM may support cash dispensing and statement printing. IT organizations are often seen by the business as underperforming, discon… This approach involves instrumenting all the components required to deliver the service and calculating service availability based on understanding how each component contributes to the end-to-end service. Go beyond simple availability to report on the frequency and duration of your downtime. For example: It’s essential to measure and report availability in terms that can be compared to targets that have been agreed with customers and that are based on a shared understanding of what the customer’s availability needs actually are. Here’s an example of how you could measure and document availability to reflect the fact that the impact of downtime varies: If you use a table like this when you’re discussing the frequency and duration of downtime with your customers, the numbers are likely to be much more useful than percentage availability, and they’ll certainly be more meaningful to your customers. You are absolutely right, but it is not just availability and performance that matter. Buenísimo el articulo, un buen checklist de puntos a tener en cuenta. In other words, you need to talk to your customers to ensure you understand what they need and, if necessary, to help them understand that “I want it to be available all the time” is probably going to cost more than it’s ever going to be worth. 1.3. When a service that should be available for 100 hours has 98% availability that means there were two hours downtime. Required fields are marked *. The numbers in the SLA are simply agreed ways of measuring, the real goal is to deliver services that meet your customers’ needs. ©Copyright 2005-2020 BMC Software, Inc. Here is my example of defining service availability in business terms; Business – say 500 retails shops across country, each shop has 3 cash registers to service shoppers. And metrics serve for the quantitative assessment of a process to be measured.For instance, the percentage of the incidents resolved by the first level support can be metrics for the incident management and problem management process.The number of missed calls to the service … A Service Caatalog should show the services provided to the Business, focused on value for it, not SERVICE REQUEST. To do this you’ll need to talk to your customers. The targets need to make sense to the customer, and to ensure that the IT organization’s efforts are focused on providing support for the customer’s business needs. Here are five questions you should consider asking: Most IT services support several business processes, some of these are critical and others are less important.
2020 service availability metric is an example of in itil