Written on 18 October 2018, 09:58pm
An incident is an event that is not part of the standard operation of a service and that causes an interruption or a reduction of service.
Put simply, an incident is an unplanned interruption of service.
Contents of a post-incident report
(A post-incident report is also called an incident report or a postmortem report.)
- Timeline: what exactly happened and at what times?
- Metrics: how well did we react? (time to detect, time to react, time to close)
- Procedures: were they adequate? were they being followed?
- Root cause analysis: is the root cause understood?
- Lessons learned: what corrective actions can we take?
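The reaction metrics can be derived directly from the timeline. The sketch below is only an illustration: the class name, the four timestamp fields, and the exact definition of each metric (for example, measuring time to close from the start of the incident rather than from detection) are assumptions, not something this post prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    # All four field names are hypothetical, chosen for this sketch.
    started: datetime    # when the service first degraded
    detected: datetime   # when monitoring or a person noticed
    responded: datetime  # when someone first acted on it
    closed: datetime     # when the service was back to normal

    def time_to_detect(self) -> timedelta:
        # Assumed definition: from start of the incident to detection.
        return self.detected - self.started

    def time_to_react(self) -> timedelta:
        # Assumed definition: from detection to the first response.
        return self.responded - self.detected

    def time_to_close(self) -> timedelta:
        # Assumed definition: from start of the incident to closure.
        return self.closed - self.started


# Made-up timestamps, for illustration only.
timeline = IncidentTimeline(
    started=datetime(2018, 10, 18, 9, 0),
    detected=datetime(2018, 10, 18, 9, 12),
    responded=datetime(2018, 10, 18, 9, 20),
    closed=datetime(2018, 10, 18, 11, 5),
)
print(timeline.time_to_detect())  # 0:12:00
print(timeline.time_to_react())   # 0:08:00
print(timeline.time_to_close())   # 2:05:00
```

Whatever definitions you pick, use the same ones in every report so that the metrics stay comparable across incidents.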
Tip: If the incident caused financial loss, attach the existing and potential security controls to the timeline. Which controls limited the loss, and which controls could be acquired in the future? It’s also a good idea to estimate what the losses would have been had the existing controls not intervened. This helps establish the overall return on security investment (ROSI).
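As a rough illustration of the tip, ROSI is commonly computed as the loss avoided thanks to a control, minus the cost of the control, divided by that cost. The function name, parameter names, and figures below are hypothetical; the post does not give a formula, so treat this as one common way to do the arithmetic.

```python
def rosi(loss_without_controls: float,
         loss_with_controls: float,
         control_cost: float) -> float:
    """Return on security investment, as a ratio (2.2 means 220%).

    Assumed formula: (loss avoided - control cost) / control cost.
    """
    if control_cost <= 0:
        raise ValueError("control_cost must be positive")
    loss_avoided = loss_without_controls - loss_with_controls
    return (loss_avoided - control_cost) / control_cost


# Made-up figures: the incident would have cost 100,000 without the
# control, actually cost 20,000, and the control itself cost 25,000.
print(rosi(100_000, 20_000, 25_000))  # 2.2
```

A negative result means the control cost more than the loss it prevented in this incident, which is worth noting in the lessons learned.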
Why a post-incident report?
- To understand and address the root causes
- To build lessons learned
- To maintain an accurate archive of past incidents
Case study: How Google is learning from failure
A postmortem is a written record of an incident, its impact, the actions taken to mitigate or resolve it, the root cause(s), and the follow-up actions to prevent the incident from recurring.
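One way to keep postmortems consistent is to store them in a fixed structure whose fields mirror the definition above. This is a minimal sketch; the class, its field names, and the `missing_sections` helper are all hypothetical and are not taken from Google's own template.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    # Field names are hypothetical; they mirror the definition above.
    incident: str
    impact: str
    mitigation_actions: list[str] = field(default_factory=list)
    root_causes: list[str] = field(default_factory=list)
    follow_up_actions: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Made-up draft, for illustration only.
draft = Postmortem(
    incident="Checkout API outage",
    impact="Orders failed for 40 minutes",
    mitigation_actions=["Rolled back the release"],
)
print(draft.missing_sections())  # ['root_causes', 'follow_up_actions']
```

A check like this makes it easy to see that a draft is not ready for review until the root causes and follow-up actions are filled in.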
When to create one? After an interruption of service, data loss, a monitoring failure, etc.
Three best practices: avoid blame, keep it constructive, collaborate and share.
For a postmortem to be truly blameless, it must focus on identifying the contributing causes of the incident without indicting any individual or team for bad or inappropriate behavior. A blamelessly written postmortem assumes that everyone involved in an incident had good intentions and did the right thing with the information they had.
The blameless culture