You can't promise your team a quiet launch. You can give them the bad night in advance. That's what a pre-mortem is.
In this drop
The point: A pre-mortem isn't pessimism; it's naming the hard moment in daylight so it's recognised, not feared, at 3am.
Why it matters: Teams that rehearsed the bad moment treat the first wobble as information; teams that assumed calm treat it as catastrophe.
Try this next week: Run a ten-minute pre-mortem before your next launch. The prompt: 'It's 3am and this broke. Tell me the story.'
The point
A launch I'll never forget, for the wrong reasons. Tested, load-tested, dashboards green. Forty minutes in it wobbled, and the team fell apart. Not because the wobble was huge. Because nobody had told them it was coming, so the first trouble read as 'it's all going wrong' instead of 'here's the hard moment'.
I heard a story this week about a dad and his ten-year-old before a cup final. Instead of 'relax, have fun', he said: you're going to have to suffer today, there'll be a moment you feel you can't win. It came. The boy played through it, because he'd been warned. The other team, winning then losing, fell apart, because nobody had told them.
You cannot promise a quiet launch. What you can do is tell the team where it's going to hurt before it hurts. That's a pre-mortem: it's 3am, two days from now, and this has gone wrong. What broke? Who's awake? What do they reach for?
Reality check
What changes everything: a team that rehearsed the bad night meets the first wobble with recognition, not panic.
One proof
Gary Klein formalised the pre-mortem in Harvard Business Review in 2007: imagine the project has failed, then work backwards to why. Field note: on one rota, after we started running a ten-minute pre-mortem before significant changes, time-to-rollback-decision on the incidents that followed dropped from roughly twenty-five minutes to under ten across the next quarter. The team wasn't braver. They'd just already met the bad night.
Where this breaks
This breaks if the pre-mortem becomes theatre, a document nobody reads on the night. The output isn't the doc; it's a team that recognises the failure shape. Keep it to ten minutes and to the people who'll actually be on call.
Try this next week
Before your next launch, get the on-call people together for ten minutes. No longer.
Ask one question and write down the answers: 'It's 3am, this change broke, tell me the story' (symptom, dashboard, who gets the call, first action, rollback, and who calls it).
Don't aim to predict the exact failure. Aim to make the shape of a bad night familiar.
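If it helps to have something to type into during the ten minutes, the six prompts above can be sketched as a small record. This is only an illustration, not a prescribed tool: the class name, the field names and the sample answers are all made up for the example, and a shared doc with the same six headings does the same job.

```python
from dataclasses import dataclass, fields


@dataclass
class PreMortemStory:
    """One answer to: it's 3am, this change broke, tell me the story.

    Field names are hypothetical; they mirror the six prompts in the text.
    """
    symptom: str = ""
    dashboard: str = ""
    who_gets_the_call: str = ""
    first_action: str = ""
    rollback: str = ""
    who_calls_it: str = ""

    def gaps(self) -> list[str]:
        """Return the prompts nobody has answered yet, in prompt order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative answers only; swap in your own launch.
story = PreMortemStory(
    symptom="checkout error rate climbs after the deploy",
    dashboard="checkout latency and errors",
    who_gets_the_call="primary on-call",
    first_action="compare error rate against the pre-deploy baseline",
)

print(story.gaps())  # → ['rollback', 'who_calls_it']
```

The useful output is the list of gaps. If nobody in the room can say how the rollback happens or who calls it, that is the part of the bad night the team hasn't met yet.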
Three links I'm watching
Gary Klein, 'Performing a Project Premortem' (HBR, 2007): the original, two pages, still the best version.
Google SRE Workbook on incident response: the rollback-decision and command structure this habit feeds.
Signal Drop, 'Progress Isn't Linear': the companion on expecting the rough patch rather than fearing it.
One question for you
When did a launch last go sideways on you, and would a ten-minute pre-mortem have changed how the night felt?
Allan
📕 The book is out. Metrics & Mayhem: A CTO's Guide to Observability That Actually Works. Kindle is live now; paperback and hardback launched on 1 June.