
It is a little after two in the morning on a payments incident bridge. Two of the bank's largest payment applications are degraded, the kind of degradation that does not show on an uptime dashboard because the platforms are all green. Cards are clearing. Something downstream is not. There are eleven people on the call, and one of them is an AIOps tool, which has just posted a confident root cause into the channel: a database connection pool, with a tidy probability score next to it.

Half the bridge wants to act on it. The other half has been here before. The incident commander, who is accountable to a regulator for this exact business service, asks the only question that matters at two in the morning: how do we know the tool is right?

Nobody can answer cleanly. The model correlated the right signals, but correlation at two in the morning is not the same thing as causation, and the person who has to sign the post-incident report cannot put "the AI said so" in the box marked root cause. So the bridge does what good bridges do. It treats the suggestion as a lead, not a verdict, and works the problem. The tool was, as it happens, half right. The pool was a symptom. The cause was a change three layers away.¹ 

What the keynotes promise, and what the data actually says

If you only read the conference stages, you would think autonomous, self-healing operations arrived in 2024 and we are all just slow to notice. The data tells a quieter story.

Gartner surveyed 782 infrastructure and operations leaders late in 2025 and found that only 28 per cent of AI use cases in I&O fully succeed and meet their return on investment, with one in five failing outright. Of the failures, 57 per cent blamed expecting too much, too fast. That is not a technology problem. That is a sales narrative meeting an operations reality.

The most honest field account I have read this year comes from Thoughtworks, whose Zichuan Xiong wrote up what the firm learned across twenty AIOps proofs of concept in 2025, eleven of which reached production. The verdict was blunt. Autonomous remediation and auto-healing "have not scaled beyond controlled environments". The role AIOps actually plays is "cognitive augmentation, not autonomous agency". Where it worked well, level-one and level-two ticket volume fell by 35 to 40 per cent and root-cause cycles shortened from hours to minutes. Real value. Assistive value.

Even the vendors' own numbers admit the gap when you read past the headline. Riverbed's 2025 AIOps survey leads with 87 per cent of respondents reporting good return. Read on and only 12 per cent have reached full enterprise deployment, and only 46 per cent trust their own data. An industry that does not trust its own data is not ready to let that data drive a P1 unattended.

Here is the part the slides leave out. The work did not disappear. It moved. Catchpoint's SRE Report 2026 found teams still spending around a third of their time on toil, with one in six saying AI has increased it, because the new toil is the machine itself: prompt engineering, model monitoring, and explaining the AI's recommendations to everyone downstream. We automated the easy detection and inherited a new job: verifying the machine. 

Both sides are right about the wrong thing

It is tempting to read all that and pick a side. The enthusiast position writes itself: real wins, real percentages, real careers being made. The sceptic position writes itself too: a slower-than-promised reality, a new tax on the people downstream, a confident model that is sometimes wrong in expensive ways.

The enthusiasts are not wrong. The Thoughtworks numbers are real. Intercom's Fin engineering organisation doubled R&D output in nine months and tripled it over sixteen, while downtime from breaking code changes fell 35 per cent even as deployments doubled. (I have been quietly jealous of that organisation for years, and I will be quietly jealous for a few more.) Teams that lean into this discipline are pulling ahead, and waiting it out is not a strategy.

The sceptics are not wrong either. Toil is up. Trust in the data is down. The supervisory burden is real, and the person carrying it is rarely the person who signed off the deployment. Catchpoint's 2026 survey catches the split in a single pair of numbers: 60 per cent of directors say AI cut their team's toil, and only 38 per cent of the engineers underneath them agree. (And the ones doing the carrying are, in my experience, the ones least likely to be in the room when next year's slide is written.)

The trouble is not that one side has the better story. Charity Majors nailed the diagnosis in her June essay on AI enthusiasts and sceptics: the asymmetry is structural, not malicious. The wins and the costs land on different teams, on different days, in different forums. The wins get trumpeted in all-hands and conference talks. The costs surface in SRE retros and the kind of grumbling that does not make it into the OKR pack. Neither side ever hears the other cleanly, so each side starts arguing with a caricature of the other, and the missing feedback loop under the noise never gets fixed.

That is the conversation I want to have. Not "is AIOps overhyped" or "are sceptics behind the times", but: what does the honest version of this look like inside a regulated bank, where the cost of getting it wrong is measured in regulator letters, not Reddit threads?

The bit that genuinely works

 None of this means AIOps is hollow. It means the value is concentrated in a narrower band than the marketing suggests, and that band is real.

The production-proven capability is event correlation and noise reduction. Take a flood of alerts from a dozen tools, deduplicate them, cluster them into incidents, enrich them with topology and change context, and you remove most of the screaming so a human can hear the signal. I have seen what that is worth. On the TAMM platform in Abu Dhabi, across a year-long engagement serving 3.2 million citizens, we cut alert noise by 45 per cent and halved mean time to resolution, with the operations team owning the alerting policy and the platform team owning the telemetry. Those are field numbers, measured over the engagement, not a vendor benchmark.
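
The deduplicate, cluster and enrich steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular AIOps product works: the `Alert` and `Incident` types, the topology format (a dict from each service to the services it depends on), the change-record tuples, the five-minute clustering window and the one-hour change lookback are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alert:
    source: str   # monitoring tool that raised it
    service: str  # affected component, as named in the topology map
    check: str    # what fired, e.g. "latency_p99"
    ts: float     # epoch seconds


@dataclass
class Incident:
    services: set = field(default_factory=set)
    alerts: list = field(default_factory=list)
    recent_changes: list = field(default_factory=list)


def correlate(alerts, depends_on, changes, window=300.0, lookback=3600.0):
    """Deduplicate a batch of alerts, cluster them into incidents, enrich with change context.

    depends_on: dict of service -> set of services it depends on (assumed format).
    changes: list of (ts, service, description) tuples (assumed format).
    """
    # 1. Deduplicate: one alert per (service, check), however many tools saw it.
    first_seen = {}
    for a in sorted(alerts, key=lambda a: a.ts):
        first_seen.setdefault((a.service, a.check), a)
    unique = sorted(first_seen.values(), key=lambda a: a.ts)

    def adjacent(svc, group):
        # True if svc is in the group or one dependency hop away from a member.
        return any(svc == g or svc in depends_on.get(g, ()) or g in depends_on.get(svc, ())
                   for g in group)

    # 2. Cluster: join an incident that is recent and topologically adjacent, else open one.
    incidents = []
    for a in unique:
        for inc in incidents:
            if a.ts - inc.alerts[-1].ts <= window and adjacent(a.service, inc.services):
                inc.alerts.append(a)
                inc.services.add(a.service)
                break
        else:
            incidents.append(Incident(services={a.service}, alerts=[a]))

    # 3. Enrich: attach changes to involved services, or their direct dependencies,
    #    made shortly before the incident's first alert.
    for inc in incidents:
        scope = inc.services.union(*(depends_on.get(s, set()) for s in inc.services))
        start = inc.alerts[0].ts
        inc.recent_changes = [c for c in changes
                              if c[1] in scope and start - lookback <= c[0] <= start]
    return incidents
```

The design choice that matters is clustering on topology as well as time: two alerts a minute apart on unrelated systems stay separate incidents, while a latency alert and a pool-saturation alert on a service it depends on become one.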

Anomaly detection and AI-suggested root cause are the next band up, and they are genuinely useful as long as you remember the word "suggested". The reliable engines here lean on a topology map rather than statistics alone, which is why causal approaches tend to beat pattern-matching for the "what changed" question. But the honest practitioner framing holds: the model proposes, the human disposes.
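
To make "topology rather than statistics alone" concrete, here is a hedged sketch of a suggestion engine, assuming a hypothetical dependency map and a set of services known to have changed recently. It walks the dependency graph beneath each alerting service and returns ranked candidates with reasons, never a single verdict, because a list with reasons is what a human can actually dispose of. The ranking rule (recent changes first, then the deepest alerting component) is an illustrative assumption, not a vendor's algorithm.

```python
from collections import deque


def dependencies_of(service, depends_on):
    """Breadth-first walk of everything `service` depends on, with hop counts."""
    seen, queue, found = {service}, deque([(service, 0)]), []
    while queue:
        node, hops = queue.popleft()
        for dep in sorted(depends_on.get(node, ())):
            if dep not in seen:
                seen.add(dep)
                found.append((dep, hops + 1))
                queue.append((dep, hops + 1))
    return found


def suggest_root_causes(alerting, depends_on, changed):
    """Return ranked (service, reason) candidates for a human to check, not a verdict."""
    changes_below, deepest = [], []
    for svc in sorted(alerting):
        if svc in changed:
            changes_below.append((svc, "changed, and alerting itself"))
        deps = dependencies_of(svc, depends_on)
        # A recent change anywhere beneath an alerting service answers "what changed?"
        for dep, hops in deps:
            if dep in changed:
                changes_below.append((dep, f"changed, {hops} hop(s) beneath alerting {svc}"))
        # An alerting service with no alerting dependencies is the deepest visible symptom.
        if not any(dep in alerting for dep, _ in deps):
            deepest.append((svc, "deepest alerting component"))
    # Changes rank first; keep the first reason recorded for each service.
    ranked, seen = [], set()
    for svc, reason in changes_below + deepest:
        if svc not in seen:
            seen.add(svc)
            ranked.append((svc, reason))
    return ranked
```

Fed a chain shaped like the opening bridge (an alerting payments API, an alerting connection pool, and a quiet change three hops below the API), this sketch puts the change ahead of the pool, which is the ordering the bridge had to reach by hand.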

Then there is the band everyone wants and almost nobody runs at scale: closed-loop automation that fixes the problem without a human. It exists. It works. It is confined, sensibly, to small, reversible, pre-approved actions. Restart a stuck process. Bounce a hung session. Scale a pod. The Bank of England and FCA's most recent survey of AI in UK financial services found that 55 per cent of AI use cases involve some autonomous decision-making, but only 2 per cent are fully autonomous.² Two per cent. That is the real frontier, and the firms running it are not the ones shouting about it. 
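
"Small, reversible, pre-approved" can be enforced as code rather than left as policy prose. The sketch below assumes a hypothetical allowlist of action kinds, each with a per-target repeat limit, and escalates anything outside it to a human. The action names and limits are illustrative, and for brevity the counts never reset; a real loop would reset them per incident or time window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str         # e.g. "restart_process", "scale_pod"
    target: str       # the component the action touches
    reversible: bool  # can it be undone without data loss?


# Hypothetical allowlist: action kind -> how many times it may run per target
# before the loop stops trying and hands over to a human.
PRE_APPROVED = {"restart_process": 3, "bounce_session": 5, "scale_pod": 2}


class BoundedRemediator:
    def __init__(self, allowlist):
        self.allowlist = allowlist
        self.counts = {}  # (kind, target) -> executions so far

    def decide(self, action):
        """Return "execute" for a bounded, pre-approved action, else "escalate"."""
        limit = self.allowlist.get(action.kind)
        if limit is None or not action.reversible:
            return "escalate"  # not on the list, or not safely undoable
        key = (action.kind, action.target)
        if self.counts.get(key, 0) >= limit:
            return "escalate"  # repeated fixes suggest the real cause is elsewhere
        self.counts[key] = self.counts.get(key, 0) + 1
        return "execute"
```

The repeat limit is the part worth copying: a pod that needs scaling a third time in one incident is a symptom, and the loop should stop treating it.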

The uncomfortable truth: in a bank, the ceiling on automation is accountability, not technology 

The reason a bank cannot simply turn the dial to autonomous is not that the technology is too weak. It is that someone has to be accountable for the decision, and you cannot delegate accountability to a model.

Incident commanders keep saying a version of the same line to me, half-laughing and half-not. "I would love to trust the tool. The regulator does not trust me when I do." That is what accountability gravity sounds like in plain English. The technology argument and the regulatory argument are not running in parallel. The regulatory one always wins, in the end, because it is the one whose name appears on a letter.

Watch how the regulations converge on this, from every direction, without any of them mentioning AIOps by name. Under the EU's DORA regulation, in force since January 2025, a major ICT incident must be notified within hours and reconstructed in a final report within a month. If an AI tool shaped how that incident was classified or handled, its role is now inside the regulated reporting loop, and "the model decided" is not a reconstruction. The UK's operational resilience regime asks every firm to stay within an impact tolerance for each important business service. A tool you cannot explain that could push a critical service past its tolerance is a finding waiting to happen.

Closest to home for any bank in the Gulf, the Central Bank of the UAE published its Guidance Note on responsible AI in February 2025. Two of its expectations matter enormously for operations. Institutions must be able to explain how an AI system works and the logic behind its decisions. And institutions remain fully accountable for AI outcomes, including models bought from a third party, with no exceptions. Most AIOps is bought from a third party. Read those two sentences together (and read them slowly, ideally with a fresh coffee, because the operating model genuinely writes itself from them) and you have the spine of every control you are about to need.
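
One way to turn those two expectations into a control is a decision record that cannot be written without an explanation and a named owner. The sketch below is a hypothetical schema, not anything the Guidance Note prescribes: the field names, the three allowed human decisions, and the vendor and product strings are all assumptions for illustration. The `supplier` field is there to make the point that a third-party model still needs an accountable person inside the bank.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AIDecisionRecord:
    incident_id: str
    model: str              # product and version, including bought-in tools
    supplier: str           # "in-house" or the vendor; accountability does not transfer
    suggestion: str         # what the tool proposed
    evidence: tuple         # human-readable signals the suggestion rests on
    accountable_owner: str  # a named person, not a team alias
    human_decision: str     # "accepted", "rejected" or "modified"
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self):
        # Refuse to create a record that could not be explained or owned later.
        if not self.evidence:
            raise ValueError("no explanation: record the evidence behind the suggestion")
        if not self.accountable_owner.strip():
            raise ValueError("no accountable owner named")
        if self.human_decision not in {"accepted", "rejected", "modified"}:
            raise ValueError("human_decision must be accepted, rejected or modified")

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)
```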

Even the model-risk rulebook has just moved. In April 2026, the US regulators replaced the long-standing SR 11-7 model-risk guidance with SR 26-2, and notably placed generative and agentic AI outside formal model-risk scope, pushing it instead into the firm's broader risk governance. That is not a loophole. It means the newest, most opaque tools do not get a lighter touch. They get governed by everything else you have, and the burden of proof sits with you. 

Climb the ladder by earning each gate

So how should a bank actually progress, without either freezing in fear or buying a slide? I use a simple ladder, and the point of it is the gaps between the rungs, not the rungs themselves.

Most published maturity models share the same five-step structure, progressing from reactive monitoring to full autonomy. The trouble is where they put the summit. The widely copied five-level AIOps model ends at "full automation with no human interaction". For a bank, that is not the summit. That is the cliff.

The only serious treatment of the human checkpoint I know comes from outside our field entirely, in Parasuraman and Sheridan's 2000 work on levels of automation, where the middle levels are defined precisely by whether the human approves the action, or has a window to veto it. Aviation has been here for forty years. The pilots I most trust fly with both hands near the yoke even on autopilot, because they know exactly what the autopilot is and is not authorised to do, and what the next thirty seconds look like if it disengages. Operations needs the same posture, and our vendor models tend to leave that bit out.³
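
The difference between "the human approves" and "the human has a window to veto" is small in code and large in accountability. A minimal sketch, assuming a hypothetical `ask_human(timeout_s)` callback that returns `"approve"`, `"veto"`, or `None` when nobody answers: in consent mode silence means no, in veto mode silence means yes. Which actions get the veto window is a policy choice; here, purely as an assumption, only pre-approved action kinds do.

```python
def human_gate(mode, ask_human, veto_window_s=30.0):
    """Decide whether a proposed action may run.

    ask_human(timeout_s) returns "approve", "veto", or None if nobody answered;
    timeout_s=None means wait for an explicit answer.
    """
    if mode == "consent":
        # Approve-to-act: only an explicit approval counts. Silence is a no.
        return ask_human(None) == "approve"
    if mode == "veto":
        # Act-unless-stopped: the human gets a bounded window to say no.
        return ask_human(veto_window_s) != "veto"
    raise ValueError(f"unknown mode: {mode!r}")


def mode_for(action_kind, pre_approved):
    # Only small, pre-approved actions get the veto window;
    # everything else waits for explicit consent.
    return "veto" if action_kind in pre_approved else "consent"
```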

The five stages are familiar: reactive, correlated, assisted, bounded autonomy, supervised autonomy at scale. The work is in the gate you must pass to move up.

To go from reactive to correlated, pass the data-quality gate. Sophisticated models do not rescue bad telemetry; they amplify it, and by Riverbed's count more than half of teams do not trust their own data. To go from correlated to assisted, pass the explainability gate: you can show an incident commander and a validator why the tool flagged what it did. To go from assisted to bounded autonomy, pass the accountability gate. This is the one vendors skip and regulators require: a named human owns the outcome and can override, the model is independently validated, every action is auditable on the regulator's clock, and a tested non-AI fallback exists. To go beyond that, to supervised autonomy at scale, you need a track record, drift monitoring, and a regulator who is comfortable. Almost no bank passes that gate for major incidents, and saying so out loud is the credible position, not the weak one.
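
The ladder reduces to one rule: you stand at the stage below the first gate you cannot fully evidence. The sketch below encodes the five stages and four gates in the order given above; the evidence keys are hypothetical paraphrases of each gate's conditions, and a real assessment would attach proof to each one rather than a boolean.

```python
STAGES = ["reactive", "correlated", "assisted", "bounded autonomy",
          "supervised autonomy at scale"]

# GATES[i] is what you must evidence to move from STAGES[i] to STAGES[i + 1].
# The evidence keys are illustrative paraphrases, not a formal standard.
GATES = [
    ("data quality", ["telemetry trusted"]),
    ("explainability", ["can show why the tool flagged it"]),
    ("accountability", ["named owner can override", "independently validated",
                        "actions auditable", "tested non-AI fallback"]),
    ("track record", ["track record", "drift monitoring", "regulator comfortable"]),
]


def current_stage(evidence):
    """Climb one gate at a time; stop at the first gate with anything missing.

    Returns (stage, blocking_gate, missing_evidence).
    """
    for i, (gate, needs) in enumerate(GATES):
        missing = [n for n in needs if not evidence.get(n, False)]
        if missing:
            return STAGES[i], gate, missing
    return STAGES[-1], None, []
```

A bank with trusted telemetry, explainable suggestions and everything for the accountability gate except a tested fallback comes out as "assisted", blocked at "accountability" by exactly one missing item, which is a more useful answer than a maturity score.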

Notice what each gate is. It is not a feature you buy. It is a control the regulator already expects you to have. Which means the work of becoming more automated and the work of becoming more defensible are the same work. 

Where this breaks

This is not a counsel of caution dressed up as a framework, so let me be clear about the failure modes.

The most sobering evidence I have seen comes from a study presented at RSA Conference 2025, bluntly titled "When AIOps Become AI Oops". Researchers showed that an AI operations agent could be steered to the wrong remediation by planting crafted text in ordinary logs and traces, and, worse, the agent then dressed its false conclusion in real, specific detail it fetched itself, pulling the exact software version to make a fabricated diagnosis look authoritative. The standard defences barely dented it. An assistant that can be confidently, plausibly wrong is more dangerous than one that is obviously wrong, because the plausible one survives the human check.

The other way this breaks is human, not technical. Point an outcome metric at a team without first agreeing who owns it, and an AIOps signal stops being a diagnostic and becomes a blame engine. Fix ownership before you fix tooling. The model can tell you what changed. It cannot tell you whose job it was to know. 

What to do on Monday

If you run operations in a bank, you do not need to pick a side in the autonomy debate this week. You need to find out which gate you are standing at.

Pick your most critical business service. Ask three questions about it. Do we trust the telemetry feeding any AI that touches it? Can we explain to someone who was not on the bridge why the tool says what it says? And if the tool is wrong at two in the morning, who overrides it, and how fast? Wherever the first honest "no" lands, that is your gate, and that is where the next quarter's work is. 

The bank that climbs this ladder one gate at a time ends up with operations that are both more capable and easier to defend. The bank that buys autonomy off a slide ends up explaining to its regulator why a model it cannot reconstruct made a call it cannot undo. The regulator was never the brake. The regulator was the design spec all along.

¹ The model was not "wrong", in the strict sense. It was statistically reasonable, given the signals it saw. It just was not right. The distinction matters when you are filling in the box marked root cause and the person reading the form has not seen the dashboard.

 ² The third and most recent published edition of the Bank of England and FCA survey, from November 2024. The fourth is being re-run during 2026; until it lands, these are the current numbers, and I would not expect the autonomous share to have moved far.

 ³ I keep waiting for a vendor to ship an "approve / veto / take over" UX as a first-class control surface, not a buried tickbox. When one does, the maturity conversation gets easier almost overnight. Until then, we are building that control ourselves out of
