Incident Report Generator

Create detailed incident reports to document and communicate issues, their impact, and resolution steps.

Example

Input prompt
Write an incident report about a power outage in our data center that took some servers offline.
Result from enhanced prompt

Incident Report: Data Center Power Outage

Incident Details

  • Incident Type: Data center power outage
  • Date/Time: July 8, 2023 at 2:34am PT
  • Location: Las Vegas data center, Rack 4A

Description

A power surge caused an electrical fault and fire in Rack 4A of the Las Vegas data center, which took down 3 production database servers and 6 application servers. The halon fire suppression system activated and extinguished the fire.

Impact

  • 8 customer-facing applications were offline for 2 hours and 43 minutes
  • Approximately 1.2 TB of customer data may have been lost to database corruption caused by the hard shutdown of the database servers
  • No customer data was exfiltrated or compromised

Resolution

  1. The affected servers were isolated from the network and thoroughly inspected by the infrastructure team
  2. Corrupted hard drives were replaced
  3. Data was restored from the last full backup from 1:00am PT
  4. Failover to the secondary data center was initiated for disaster recovery
  5. Root cause was determined to be a faulty power distribution unit in Rack 4A
  6. The incident was mitigated and all systems were fully restored by 5:17am PT

Root Cause Analysis

The root cause of the incident was traced to a malfunctioning power distribution unit (PDU) in Rack 4A of the Las Vegas data center. The faulty PDU caused a power surge that led to an electrical fire in the rack, which damaged servers and triggered the fire suppression system. Key contributing factors:

  • Lack of redundant power distribution to Rack 4A
  • Inadequate real-time monitoring and alerting for electrical systems
  • PDU was past recommended replacement cycle

Preventative Measures

To avoid similar incidents in the future, the following preventative measures are recommended:

  1. Upgrade all data center PDUs and implement redundant power distribution to all critical racks
  2. Set up comprehensive electrical monitoring and alerting to identify anomalies in real time
  3. Review and enhance data center maintenance procedures, replacing equipment proactively based on recommended lifecycle
  4. Conduct regular failover drills to secondary data centers to minimize downtime
  5. Investigate options for real-time database replication to the secondary data center to reduce the recovery point objective (RPO)
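
To make the RPO item concrete: the data-loss window in this incident is the gap between the last full backup (1:00am PT) and the power surge (2:34am PT). The following is a minimal Python sketch of that arithmetic. It assumes both timestamps fall on July 8, 2023, and that "PT" means Pacific Daylight Time (UTC-7); the variable names are illustrative and do not come from any tooling described in this report.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "PT" in July is Pacific Daylight Time (UTC-7).
PT = timezone(timedelta(hours=-7), "PT")

last_full_backup = datetime(2023, 7, 8, 1, 0, tzinfo=PT)  # Resolution, step 3
incident_start = datetime(2023, 7, 8, 2, 34, tzinfo=PT)   # Incident Details

# Writes made after the backup and before the outage are not in the restore.
data_loss_window = incident_start - last_full_backup
data_loss_minutes = int(data_loss_window.total_seconds() // 60)

print(f"Data-loss window: {data_loss_minutes} minutes")
```

With periodic full backups alone, the worst-case window is the full interval between backups. Real-time replication would be expected to shrink it to roughly the replication lag.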

Incident Timeline

  • 2:34am PT: Power surge and electrical fire occur in Rack 4A
  • 2:35am PT: Halon fire suppression activates and extinguishes the fire; outage begins affecting applications
  • 2:40am PT: Incident detected and Ops team engaged, troubleshooting begins
  • 3:15am PT: Affected servers isolated, root cause identified as PDU failure
  • 3:50am PT: Decision made to fail over to secondary data center
  • 4:30am PT: Data restore from backup initiated
  • 5:17am PT: Incident mitigated and all systems restored
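
The intervals implied by this timeline, including the 2 hours and 43 minutes listed under Impact, can be checked with a short script. The following is a minimal Python sketch, assuming every entry falls on July 8, 2023; the event labels are abbreviated from the timeline, and the helper names (`parse`, `minutes_between`) are illustrative.

```python
from datetime import datetime

# Timeline entries copied from the report; all times are July 8, 2023, PT.
TIMELINE = [
    ("2:34am", "Power surge and electrical fire"),
    ("2:35am", "Fire suppressed, outage begins"),
    ("2:40am", "Incident detected, Ops team engaged"),
    ("3:15am", "Servers isolated, root cause identified"),
    ("3:50am", "Failover decision"),
    ("4:30am", "Restore from backup initiated"),
    ("5:17am", "All systems restored"),
]


def parse(clock: str) -> datetime:
    """Parse a time such as '2:34am' onto the incident date."""
    t = datetime.strptime(clock, "%I:%M%p")
    return t.replace(year=2023, month=7, day=8)


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two clock times on the incident date."""
    return int((parse(end) - parse(start)).total_seconds() // 60)


time_to_detect = minutes_between("2:34am", "2:40am")
total_duration = minutes_between(TIMELINE[0][0], TIMELINE[-1][0])

# Print each event with the minutes elapsed since the previous one.
for (t0, _), (t1, event) in zip(TIMELINE, TIMELINE[1:]):
    print(f"{t1}: {event} (+{minutes_between(t0, t1)} min)")
print(f"Time to detect: {time_to_detect} min")
print(f"Total duration: {total_duration // 60} h {total_duration % 60} min")
```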

Author

John Smith, Senior SRE, Acme Inc.