What you get from this one: a clean way to tell the difference between a metric that is failing and a metric that is just having a Tuesday.

In this drop

  • The point: reliability metrics move up and down every month, and most of that movement is noise, not signal.

  • Why it matters: treating every wobble as a verdict makes teams thrash. New tool, new process, every quarter, and nothing runs long enough to work.

  • Try this next week: define a noise band for your headline metric before you look at next month's number.

The point

A few years ago, I sat in a service review where MTTR had ticked up by a few minutes. By the third slide, someone had said the word 'tooling.' We were ready to rewrite a plan we'd run for a single quarter, on the strength of one month's numbers.

I've watched that reflex play out for years since. The number moves, so we change something. It moves again, so we change something else. One team I worked with cycled through three monitoring changes in four quarters, chasing a figure that never sat still. Always busy. Never running one model long enough to learn from it.

Here is the part I missed for too long. A monthly MTTR is an average of a handful of incidents. One ugly P1, root cause three vendors deep, drags the whole month sideways on its own. Your strategy did not change. One bad week landed in the sample.
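To put rough numbers on that, here is a minimal sketch in Python. The incident durations are invented for illustration, not taken from any real team, and `summarize` is a hypothetical helper. It shows how one long P1 can move a monthly average while the rest of the month looks the same.

```python
from statistics import mean, median

# Hypothetical incident durations in minutes (invented for illustration).
typical_month = [32, 41, 28, 45, 38, 36]

# The same month with one long P1 added: a six-hour outage.
month_with_p1 = typical_month + [360]


def summarize(durations):
    """Return the incident count, the mean (MTTR), and the median."""
    return {
        "incidents": len(durations),
        "mttr": round(mean(durations), 1),
        "median": median(durations),
    }


print(summarize(typical_month))   # mttr 36.7 minutes
print(summarize(month_with_p1))   # mttr 82.9 minutes
```

In this made-up sample, one incident more than doubles the monthly MTTR, while the median only moves from 37 to 38 minutes. Nothing about the team's approach differs between the two months.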

Reality check

A metric you would rewrite your strategy over every month was never measuring your strategy. It was measuring your nerves.

One proof

Source: DORA's State of DevOps research has spent years making one quiet point. You judge delivery performance on a small, stable set of metrics, read as a trend over time and benchmarked against your own past, not on any single month's reading.

Field note: The team above made three tooling changes across four quarters. By the end, nobody in the room could describe the current approach in a single sentence. The thrash had eaten the strategy.

Where this breaks

A noise band is not a blindfold. A sharp, sudden collapse, a metric that falls off a cliff inside a week, is not a wobble, and the waiting period does not apply. This habit is for slow month-to-month drift, not for a system that is visibly on fire. If you genuinely cannot tell the two apart, that is its own problem, and worth a separate conversation.

Try this next week

  • Pick the one reliability metric your leadership actually watches. Only one.

  • Before you open next month's figure, write the noise band: how much movement, sustained for how long, would genuinely change your mind. For most teams, three months in the same direction.

  • Name who owns that call. When the metric breaks the band, that person decides. Until then, the strategy runs untouched.
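If it helps to write the band down as something you can check, here is one possible sketch in Python. Everything in it is an assumption for illustration: the `NoiseBand` name, the baseline and tolerance values, the owner label, and the choice to treat higher as worse (as with MTTR). The three-month default follows the rule of thumb above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseBand:
    """A noise band written down before next month's number arrives."""

    baseline: float          # the level you currently treat as normal
    tolerance: float         # movement above baseline you will ignore
    sustain_months: int = 3  # how long the move must last to count
    owner: str = "unassigned"  # who decides once the band is broken

    def is_broken(self, monthly_values):
        """True only if every one of the last `sustain_months` readings
        sits above baseline + tolerance (assumes higher is worse)."""
        recent = monthly_values[-self.sustain_months:]
        if len(recent) < self.sustain_months:
            return False  # not enough history to call it a trend
        limit = self.baseline + self.tolerance
        return all(value > limit for value in recent)


# Hypothetical band for monthly MTTR in minutes.
band = NoiseBand(baseline=40, tolerance=10, owner="reliability lead")

one_bad_month = [38, 83, 41, 39]     # a single spike, then back to normal
sustained_drift = [42, 52, 55, 61]   # three months in a row above the band

print(band.is_broken(one_bad_month))    # False: the strategy runs untouched
print(band.is_broken(sustained_drift))  # True: the owner makes the call
```

The check only tells the owner when to look. What to change, if anything, is still that person's decision.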

  1. Listen to the episode: Signal Drop: Progress Isn't Linear
    Use it as a quick reset if your team has started to accept too much drift.

  2. DORA (dora.dev): the research home for the four key delivery metrics and why you read them as trends, not snapshots.

  3. Google SRE Workbook, 'Implementing SLOs': the cleanest practical guide to picking a metric and a window before you need them.

One question for you

What is the longest you have watched a reliability metric drift the wrong way before someone in a meeting said the word 'tooling'? Hit reply. I read every one.

Allan

PS: The episode runs about five minutes. Listen here: SPOTIFY.

The IIoT Postgres Limits No One Talks About Until Production

Most IIoT teams don't realize Postgres is at its limit until queries start failing in production.

Our new white paper, The IIoT PostgreSQL Performance Envelope, maps exactly where Postgres hits its limits with industrial sensor data and what you can do before you're forced into a split architecture. No hand-waving. Real benchmarks, real query patterns, real thresholds.

If you're building on IIoT telemetry and still deciding whether Postgres can scale with you, this is the data you need.
