THE IT INCIDENT COMMANDER

Applying Fireground Tactics to Digital Disasters.

🚒 💻

As a Sysadmin and Volunteer Firefighter/EMT, I live in two worlds. One manages chaos with code, the other with water and discipline. The Fire Service has spent decades refining the Incident Command System (ICS), while IT often tries to reinvent the wheel during an outage. Here are 5 battle-tested strategies to transform your War Room.

1. Span of Control

The Rule of 3-to-7

NIMS (the National Incident Management System) guidance puts a manageable span of control at 3 to 7 direct reports per supervisor. The ideal number is 5.

In IT outages, we often have one "Incident Manager" trying to listen to 20 engineers, 3 VPs, and Customer Support simultaneously. This leads to cognitive overload and missed critical information.

The Fix: Delegate. Break the incident into sectors (e.g., Database Sector, Frontend Sector). The IC talks to the Sector Leads, not every engineer.

Cognitive efficiency drops rapidly as direct reports exceed 7.
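The sector split described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the names `Sector`, `build_sectors`, and `span_violations` are hypothetical, and the sketch assumes the first person listed for each sector acts as its lead. It flags any point in the tree where someone has more than 7 direct reports.

```python
from dataclasses import dataclass, field

MAX_SPAN = 7  # upper bound of the 3-to-7 rule quoted above


@dataclass
class Sector:
    name: str
    lead: str
    members: list[str] = field(default_factory=list)


def build_sectors(responders: dict[str, list[str]]) -> list[Sector]:
    """Turn {sector name: [people]} into sectors.

    Assumption for this sketch: the first person listed leads the sector.
    Sectors with nobody assigned are skipped.
    """
    sectors = []
    for name, people in responders.items():
        if not people:
            continue
        lead, *rest = people
        sectors.append(Sector(name=name, lead=lead, members=rest))
    return sectors


def span_violations(sectors: list[Sector]) -> list[str]:
    """Describe every place where someone has more than MAX_SPAN direct reports."""
    problems = []
    # The IC's direct reports are the sector leads, not every engineer.
    if len(sectors) > MAX_SPAN:
        problems.append(f"IC has {len(sectors)} sector leads (max {MAX_SPAN})")
    for sector in sectors:
        if len(sector.members) > MAX_SPAN:
            problems.append(
                f"{sector.name} lead has {len(sector.members)} direct reports (max {MAX_SPAN})"
            )
    return problems
```

If `span_violations` returns anything, that is the cue to split a sector again or appoint another lead before the bridge gets noisy.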

2. The 360 Size-Up

Firefighters never run into a burning building without walking around it first. In IT, we often SSH into the first server that alerts without checking the "Blast Radius." Stop. Look. Think. Act.

The "Tunnel Vision" Admin focuses only on the specific error code, ignoring the wider system impact.
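One way to do a digital "walk-around" before touching the alerting server is to ask what sits downstream of it. The sketch below assumes you already keep some kind of dependency map; the `DEPENDENTS` dictionary, its service names, and the `blast_radius` function are all made up for illustration. It walks the map breadth-first, so the closest dependents are listed first.

```python
from collections import deque

# Hypothetical dependency map: service -> services that depend on it.
# In practice this would come from your own CMDB or service catalog.
DEPENDENTS = {
    "db-primary": ["orders-api", "auth-api"],
    "orders-api": ["web-frontend"],
    "auth-api": ["web-frontend", "mobile-gateway"],
    "web-frontend": [],
    "mobile-gateway": [],
}


def blast_radius(alerting: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every service downstream of the alerting one, nearest first."""
    seen = {alerting}
    queue = deque([alerting])
    impacted = []
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:  # avoid revisiting shared dependents or cycles
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted
```

With this sample map, an alert on `db-primary` reaches every other service, while an alert on `orders-api` touches only `web-frontend`. That difference is exactly what the size-up is meant to reveal before anyone opens an SSH session.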

3. Radio Discipline (CAN)

On the fireground, radio airtime is scarce. We use CAN Reports to cut through the noise. Stop the stream-of-consciousness chatter on the bridge.

  • C (Conditions): What do you see right now?
  • A (Actions): What are you doing about it?
  • N (Needs): What resources do you require?
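If your bridge runs in chat, the CAN format can be enforced with a tiny template. The following is a sketch under my own assumptions: the `CanReport` class, its `sender` field, and the one-line output format are hypothetical, not part of any fire service or chat-tool standard. It refuses reports with empty Conditions or Actions and defaults Needs to "None".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanReport:
    sender: str       # who is reporting (hypothetical field for chat use)
    conditions: str   # C: what do you see right now?
    actions: str      # A: what are you doing about it?
    needs: str = "None"  # N: what resources do you require?

    def __post_init__(self):
        # A CAN report with a blank C or A is just more noise on the bridge.
        for name in ("sender", "conditions", "actions"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")

    def render(self) -> str:
        """Format the report as a single line for the incident channel."""
        return (
            f"[{self.sender}] C: {self.conditions} | "
            f"A: {self.actions} | N: {self.needs}"
        )
```

For example, `CanReport("DB Sector", "Primary is refusing writes", "Restarting primary", "One network engineer").render()` produces one scannable line instead of a paragraph of stream-of-consciousness.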

4. The Staging Area

When a big fire happens, every volunteer drives to the scene. If they all park in front of the building, the trucks can't get in.

In IT, this is the "Too Many Helpers" problem: 30 people join the Zoom call "just to help."

Create a separate "Staging" channel. Resources wait there until the IC requests them.
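The staging idea maps neatly onto a small data structure. This is a minimal sketch, assuming helpers check in with a single skill and the IC pulls them by skill; the `Staging` class and its method names are hypothetical. Nobody enters the incident channel until `request` is called.

```python
from dataclasses import dataclass, field


@dataclass
class Staging:
    # skill -> names waiting in staging, in arrival order
    waiting: dict[str, list[str]] = field(default_factory=dict)
    # people the IC has pulled into the incident channel
    active: list[str] = field(default_factory=list)

    def check_in(self, name: str, skill: str) -> None:
        """A helper arrives and waits in staging instead of joining the call."""
        self.waiting.setdefault(skill, []).append(name)

    def request(self, skill: str, count: int = 1) -> list[str]:
        """The IC pulls up to `count` people with `skill` into the incident."""
        pool = self.waiting.get(skill, [])
        assigned, self.waiting[skill] = pool[:count], pool[count:]
        self.active.extend(assigned)
        return assigned
```

The design choice here is that only the IC moves people from `waiting` to `active`. Helpers can always check in, but they cannot self-deploy.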

[Diagram: THE HOT ZONE (Incident Channel) is separated by "The Firewall" from the STAGING AREA (Waiting Room).]

5. Transfer of Command

Incidents can last hours. Fatigue sets in. You cannot just hand the pager to the next person and leave.

A formal Face-to-Face transfer is required, following a strict protocol. If you don't do this, the new IC starts with zero situational awareness.

TRANSFER_PROTOCOL.log

  • SITUATION STATUS:
    "Current state is stable but fragile."
  • DEPLOYMENT:
    "DB Team is restarting primary. Ops is monitoring."
  • PRIORITIES:
    "1. Verify Data Integrity. 2. Open Traffic."
  • CONFIRMATION:
    "Command is transferred. Time: 14:00."
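The protocol above can also be captured as a record that refuses to exist until every field is filled in. This is a sketch only: the `CommandTransfer` class, the `outgoing_ic` and `incoming_ic` fields, and the log format are my own assumptions layered on the four items in the list.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class CommandTransfer:
    outgoing_ic: str             # hypothetical field: who is handing off
    incoming_ic: str             # hypothetical field: who is taking command
    situation_status: str
    deployment: str
    priorities: tuple[str, ...]  # ordered, most important first
    transferred_at: datetime

    def __post_init__(self):
        # An empty string or empty priorities tuple means the briefing was skipped.
        missing = [f.name for f in fields(self) if not getattr(self, f.name)]
        if missing:
            raise ValueError("incomplete handoff: " + ", ".join(missing))

    def log_lines(self) -> list[str]:
        """Render the handoff in the same order as the protocol list."""
        numbered = " ".join(f"{i}. {p}." for i, p in enumerate(self.priorities, 1))
        return [
            f"SITUATION STATUS: {self.situation_status}",
            f"DEPLOYMENT: {self.deployment}",
            f"PRIORITIES: {numbered}",
            f"CONFIRMATION: Command is transferred from {self.outgoing_ic} "
            f"to {self.incoming_ic}. Time: {self.transferred_at:%H:%M}.",
        ]
```

Because construction fails on a blank field, the outgoing IC cannot "just hand over the pager". The briefing has to happen before the record can be written.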