In ITIL terminology, an ‘incident’ is defined as an unplanned interruption to an IT service, a reduction in the quality of an IT service, or a failure of a configuration item (CI) that has not yet impacted an IT service (for example, failure of one disk from a mirror set).
Incident management is the process responsible for managing the life cycle of all incidents. Incidents may be recognized by technical staff, detected and reported by event monitoring tools, communicated by users (usually via a telephone call to the service desk), or reported by third-party suppliers and partners. The purpose of incident management is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, thus ensuring that agreed levels of service quality are maintained.
‘Normal service operation’ is defined as an operational state where services and CIs are performing within their agreed service and operational levels. Incident management includes any event that disrupts, or could disrupt, a service. This includes events communicated directly by users, either through the service desk or through an interface from event management to incident management tools.
Incidents can also be reported and/or logged by technical staff (if, for example, they notice something untoward with a hardware or network component, they may report or log an incident and refer it to the service desk). This does not mean, however, that all events are incidents. Many classes of events are not related to disruptions at all, but are indicators of normal operation or are simply informational. Similarly, although both incidents and service requests are reported to the service desk, they are not the same thing.
Service requests do not represent a disruption to agreed service, but are a way of meeting the customer’s needs and may be addressing an agreed target in an SLA. Service requests are dealt with by the request fulfillment process.
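The distinction drawn above between incidents, service requests and purely informational events can be sketched as a small triage function. This is a minimal Python illustration, assuming each report has already been tagged with three hypothetical yes/no attributes (`disrupts_service`, `could_disrupt_service`, `is_request`); real service desk tools classify on much richer data.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    INCIDENT_MANAGEMENT = "incident management"
    REQUEST_FULFILLMENT = "request fulfillment"
    NO_ACTION = "no action (informational event)"


@dataclass
class Report:
    summary: str
    disrupts_service: bool       # hypothetical flag: service interrupted or degraded now
    could_disrupt_service: bool  # hypothetical flag: e.g. a CI failure not yet affecting service
    is_request: bool             # hypothetical flag: user asks for something; nothing is broken


def route(report: Report) -> Route:
    # Anything that disrupts, or could disrupt, a service is handled as an incident,
    # even if it arrived alongside a request.
    if report.disrupts_service or report.could_disrupt_service:
        return Route.INCIDENT_MANAGEMENT
    # Requests that involve no disruption go to request fulfillment.
    if report.is_request:
        return Route.REQUEST_FULFILLMENT
    # Remaining events indicate normal operation or are purely informational.
    return Route.NO_ACTION


print(route(Report("one disk in a mirror set failed", False, True, False)).value)
# incident management
```

Checking for disruption first reflects the point that a report arriving at the service desk is not automatically a request just because a user raised it.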
The objectives of the incident management process are to:
Ensure that standardized methods and procedures are used for efficient and prompt response, analysis, documentation, ongoing management and reporting of incidents.
Increase visibility and communication of incidents to business and IT support staff.
Enhance business perception of IT through use of a professional approach in quickly resolving and communicating incidents when they occur.
Align incident management activities and priorities with those of the business.
Maintain user satisfaction with the quality of IT services.
Value to Business
Incident management is highly visible to the business, so it is easier to demonstrate its value than that of most other areas of service operation. This is why incident management is often one of the first processes to be implemented in service management projects. An additional benefit is that incident management can be used to highlight other areas that need attention.
The value of incident management includes the following. It can reduce the unplanned labor and costs that incidents cause for both the business and IT support staff. It can detect and resolve incidents, which results in lower downtime to the business and, in turn, higher availability of the service. This means that the business is able to exploit the functionality of the service as designed.
It can align IT activity to real-time business priorities, because incident management includes the capability to identify business priorities and dynamically allocate resources as necessary.
It can identify potential improvements to services. This comes from understanding what constitutes an incident and from being in contact with the activities of business operational staff. In addition, while handling incidents, the service desk can identify additional service or training requirements in IT or the business, which is another source of value from incident management.
Many incidents are not new. They involve dealing with something that has happened before and may well happen again. For this reason, many organizations will find it helpful to pre-define ‘standard’ incident models, and apply them to appropriate incidents when they occur.
An incident model is a way of pre-defining the steps that should be taken to handle a process (in this case, a process for dealing with a particular type of incident) in an agreed way. Support tools can then be used to manage the required process. This ensures that ‘standard’ incidents are handled along a predefined path and within predefined timescales. Incidents that require specialized handling can also be treated in this way.
For example, security-related incidents can be routed to information security management, and capacity- or performance-related incidents can be routed to capacity management.
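Routing incidents that need specialized handling can be sketched as a lookup from incident category to handling process. The category names, the `handling_process` function and the fallback value below are hypothetical; each organization would agree its own mapping.

```python
# Hypothetical category names; each organization would agree its own mapping.
SPECIALIST_ROUTES = {
    "security": "information security management",
    "capacity": "capacity management",
    "performance": "capacity management",
}

# Incidents without a specialist route follow the standard incident model.
DEFAULT_ROUTE = "incident management (standard incident model)"


def handling_process(category: str) -> str:
    """Return the process that should handle an incident of this category."""
    return SPECIALIST_ROUTES.get(category.strip().lower(), DEFAULT_ROUTE)


print(handling_process("Performance"))  # capacity management
```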
The incident model should include:
The steps that should be taken to handle the incident.
The chronological order in which these steps should be taken, with any dependencies or co-processing defined.
Responsibilities: who should do what.
Precautions to be taken before resolving the incident, such as backing up data and configuration files, or steps to comply with health and safety guidelines.
Thresholds for completion of the actions.
Escalation procedures: who should be contacted and when.
The models should be input to the incident handling support tools in use, and the tools should then automate the handling, management and escalation of the process. Incident models should be stored in the service knowledge management system (SKMS).
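The elements of an incident model listed above map naturally onto a small data structure that a support tool could store and act on. The sketch below is a hypothetical Python representation, not the format of any particular tool or of the SKMS: each step records its owner, dependencies, an optional precaution and a completion threshold, and the model orders steps so that dependencies are respected (using the standard library's `graphlib.TopologicalSorter`, Python 3.9+) and flags when a threshold has been exceeded.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from graphlib import TopologicalSorter
from typing import List, Optional


@dataclass
class Step:
    name: str
    owner: str                                            # responsibility: who does this step
    depends_on: List[str] = field(default_factory=list)   # steps that must finish first
    precaution: Optional[str] = None                      # e.g. back up data before acting
    threshold: timedelta = timedelta(hours=1)             # hypothetical limit before escalation


@dataclass
class IncidentModel:
    incident_type: str
    steps: List[Step]
    escalation_contacts: List[str]                        # who to contact, in escalation order

    def ordered_steps(self) -> List[Step]:
        """Return the steps in an order that respects every dependency."""
        by_name = {s.name: s for s in self.steps}
        graph = {s.name: set(s.depends_on) for s in self.steps}
        return [by_name[n] for n in TopologicalSorter(graph).static_order()]

    def needs_escalation(self, step_name: str, elapsed: timedelta) -> bool:
        """True once a step has run longer than its completion threshold."""
        step = next(s for s in self.steps if s.name == step_name)
        return elapsed > step.threshold

    def escalation_contact(self, level: int) -> str:
        """Contact for escalation level 0, 1, ...; the last contact caps the chain."""
        return self.escalation_contacts[min(level, len(self.escalation_contacts) - 1)]


# Hypothetical model for the mirror-set disk failure mentioned earlier.
disk_model = IncidentModel(
    incident_type="failed disk in mirror set",
    steps=[
        Step("verify mirror resync", "storage team", depends_on=["replace disk"]),
        Step("replace disk", "hardware support", depends_on=["identify failed disk"],
             precaution="confirm data and configuration files are backed up"),
        Step("identify failed disk", "storage team", threshold=timedelta(minutes=30)),
    ],
    escalation_contacts=["storage team lead", "incident manager"],
)
print([s.name for s in disk_model.ordered_steps()])
# ['identify failed disk', 'replace disk', 'verify mirror resync']
```

Declaring dependencies rather than a fixed sequence leaves room for the co-processing the model may allow: steps with no dependency between them can be worked in parallel.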
A separate procedure, with shorter timescales and greater urgency, must be used for ‘major’ incidents. A definition of what constitutes a major incident must be agreed and ideally mapped onto the overall incident prioritization scheme, so that such incidents are dealt with through the separate procedure. Where necessary, the major incident procedure should include the establishment of a separate major incident team under the direct leadership of the incident manager, formed to concentrate on this incident alone so that adequate resources and focus are provided to find a swift resolution.
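One way to map an agreed major-incident definition onto a prioritization scheme is sketched below. The section does not specify a scheme, so this assumes a hypothetical one in which priority is derived from impact and urgency (each scored 1 to 3) and only priority 1 is defined as ‘major’; the scales, target times and function names are illustrative only.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical target resolution times per priority (1 = most urgent).
TARGETS = {
    1: timedelta(hours=1),
    2: timedelta(hours=4),
    3: timedelta(hours=8),
    4: timedelta(days=1),
    5: timedelta(days=3),
}


@dataclass
class Procedure:
    name: str
    target: timedelta
    dedicated_team: bool  # major incidents get their own team


def priority(impact: int, urgency: int) -> int:
    """Hypothetical scheme: impact and urgency each run from 1 (high) to 3 (low)."""
    if not (1 <= impact <= 3 and 1 <= urgency <= 3):
        raise ValueError("impact and urgency must be between 1 and 3")
    return impact + urgency - 1  # yields a priority from 1 to 5


def select_procedure(impact: int, urgency: int) -> Procedure:
    p = priority(impact, urgency)
    # Agreed definition (assumed here): only priority 1 counts as a major incident,
    # which gets the shortest timescale and a dedicated team.
    if p == 1:
        return Procedure("major incident", TARGETS[1], dedicated_team=True)
    return Procedure("standard incident", TARGETS[p], dedicated_team=False)


print(select_procedure(1, 1).name)  # major incident
```

Deriving ‘major’ from the same priority calculation used for every other incident keeps the definition consistent with the overall scheme, rather than relying on a separate judgment call.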
If the service desk manager is also fulfilling the role of incident manager (say, in a small organization), then a separate person may need to be designated to lead the major incident investigation team, so as to avoid conflicts of time or priorities; this person should ultimately report back to the incident manager. If the cause of the incident needs to be investigated at the same time, the problem manager would be involved as well, but the incident manager must ensure that service restoration and investigation of the underlying cause are kept separate.
Throughout, the service desk would ensure that all activities are recorded and that users are kept fully informed of progress.