Fautons
6 min read · AI ROI · Measurement

How to measure AI ROI without fooling yourself

Why most AI ROI math falls apart

The standard enterprise AI report is a license count, a login chart, and a quote from an enthusiastic early adopter. None of that is ROI. Seats are a cost line; logins are curiosity; enthusiasm is week-two behavior. The numbers that matter — hours returned, cost per unit of work, cycle time, error rates, revenue per rep — rarely appear, because nobody wrote them down before the rollout.

That missing baseline is the most charitable reading of McKinsey's finding that only 39% of organizations report any EBIT impact from AI, and that most of those report less than 5% of EBIT. Some of that gap is value that doesn't exist yet. A meaningful share is value that exists but was never instrumented, so nobody can defend it in a budget review.

The classic failure is the unfalsifiable claim: "we saved 10,000 hours." Saved relative to what baseline? Measured how? Did anyone redeploy those hours into something the business can see? Without answers, the number evaporates the first time a CFO pushes on it.

The three layers: capability, usage, outcomes

Capability — can your people actually use AI for their role? Measured with skills checks and task-based tests, not training attendance. Capability is the leading indicator of everything downstream.

Usage — do they use it, weekly, inside real workflows? The honest signal is retention after week four, when novelty wears off. Usage that survives a month is habit; usage that doesn't was a demo.
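Week-four retention is simple arithmetic once usage is logged where the work happens. A minimal Python sketch, assuming a hypothetical event log of (user_id, week) pairs with week 1 as the first week after rollout, and treating a user as retained if they were active in week 1 and active again in week 4:

```python
from __future__ import annotations

from collections import defaultdict


def week_four_retention(events: list[tuple[str, int]]) -> float:
    """Share of week-1 users who are still active in week 4.

    `events` is a hypothetical log of (user_id, week) pairs, one per active
    user-week, where week 1 is the first week after rollout.
    """
    weeks_by_user: dict[str, set[int]] = defaultdict(set)
    for user_id, week in events:
        weeks_by_user[user_id].add(week)

    # Cohort: everyone who used the tool in the first week.
    cohort = [u for u, weeks in weeks_by_user.items() if 1 in weeks]
    if not cohort:
        return 0.0
    # Retained: members of that cohort who show up again in week 4.
    retained = [u for u in cohort if 4 in weeks_by_user[u]]
    return len(retained) / len(cohort)


events = [
    ("ana", 1), ("ana", 2), ("ana", 4),
    ("ben", 1), ("ben", 2),
    ("cy", 1),
    ("dee", 1), ("dee", 4),
    ("eve", 3),  # joined late, so not in the week-1 cohort
]
print(week_four_retention(events))  # 0.5
```

Fixing the cohort at week 1 keeps late joiners from inflating the number; tracking one cohort per start week is the natural extension.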

Outcomes — does a baselined business number move? Hours redeployed, cost per ticket, days of cycle time, error rates, revenue per rep. Outcomes only count when attributable to the workflow that changed.

Each layer predicts the next, which means you can manage them in order. (This is the same Capability–Usage–Momentum lens behind our free AI proficiency assessment — twelve questions, instant score, no account.)

A baseline-first method, in five steps

The discipline is unglamorous and entirely doable:

  • Pick the workflow and write the baseline first — hours, cost, error rate, cycle time, dated and signed.
  • Instrument usage where the work happens, not in a survey three months later.
  • Attribute honestly: stagger the rollout across teams, or keep a comparison group, so the delta means something.
  • Convert hours carefully — time saved only counts when it becomes redeployed capacity, avoided cost, or measurable throughput.
  • Report a range, not a point estimate, and recompute quarterly. Precision you can't defend is worse than a defensible interval.
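One way to make "the delta means something" concrete is a difference-in-differences estimate: the pilot group's before/after change minus the comparison group's change over the same period, with a bootstrap interval standing in for "report a range". Both techniques are our choice of illustration, not a prescribed method. The sketch assumes you recorded one baselined metric (here, minutes per ticket) for each person before and after rollout; the figures are invented for the example.

```python
from __future__ import annotations

import random
import statistics


def did_estimate(pilot_changes: list[float], control_changes: list[float]) -> float:
    """Difference-in-differences: pilot group's mean change minus comparison group's.

    Each list holds per-person (after - before) changes in the same baselined
    metric, measured over the same period.
    """
    return statistics.fmean(pilot_changes) - statistics.fmean(control_changes)


def bootstrap_range(
    pilot_changes: list[float],
    control_changes: list[float],
    n_resamples: int = 2000,
    seed: int = 0,
) -> tuple[float, float]:
    """90% percentile-bootstrap interval for the difference-in-differences estimate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        p = rng.choices(pilot_changes, k=len(pilot_changes))
        c = rng.choices(control_changes, k=len(control_changes))
        estimates.append(did_estimate(p, c))
    # 19 cut points at 5%, 10%, ..., 95%; the outer two bound a 90% interval.
    cuts = statistics.quantiles(estimates, n=20)
    return cuts[0], cuts[-1]


# Hypothetical minutes-per-ticket changes (negative = faster).
pilot = [-6.0, -4.5, -5.0, -7.0, -3.5, -5.5]
control = [-1.0, 0.5, -1.5, -0.5, -1.0, 0.0]

point = did_estimate(pilot, control)
low, high = bootstrap_range(pilot, control)
print(f"{point:.2f} min/ticket, 90% range {low:.2f} to {high:.2f}")
```

The comparison group absorbs whatever else changed that quarter, such as a new ticket mix or a seasonal lull, which a plain before/after number would quietly credit to AI. In a staggered rollout, the teams not yet switched on can play the comparison role.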

What good looks like at 3, 6, and 12 months

Three months: capability scores up, weekly usage holding after the novelty dip, and one workflow with a baseline number visibly moving.

Six months: two or three workflows with attributable gains, one scaled from a team to a function — and at least one pilot killed on schedule, because a kill list is what measurement discipline looks like from the outside.

Twelve months: function-level cost or throughput changes visible in ordinary operating reviews, and AI line items owned by the functions that benefit, not parked in an innovation budget. That's the point where AI ROI stops being a special report and becomes ordinary management.

Frequently asked questions

What's a realistic timeframe to see AI ROI?

Workflow-level impact in about 90 days if you baselined first. Function-level impact in six to twelve months. Enterprise EBIT impact is slower, which is consistent with McKinsey's finding that most reported impact is still under 5% of EBIT.

Is "hours saved" a real ROI metric?

Only after conversion. Hours count when they become redeployed capacity, avoided hiring or vendor cost, or measurable throughput. Unconverted hours-saved claims should be discounted heavily — they rarely survive a CFO's second question.
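The conversion rule can be written as a small calculation that refuses to take raw hours at face value. A sketch under loud assumptions: each hours-saved claim is tagged with how (if at all) it was converted, all converted hours are valued at one hypothetical loaded hourly cost, and unconverted hours enter only the top of the range at a heavy discount (the 10% weight is an arbitrary placeholder, not a benchmark).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical conversion labels mirroring the three routes in the text.
CONVERSIONS = {"redeployed", "avoided_cost", "throughput"}


@dataclass
class HoursClaim:
    hours: float
    converted_to: Optional[str] = None  # one of CONVERSIONS, or None if unconverted


def claimable_value(
    claims: list[HoursClaim],
    loaded_hourly_cost: float,
    unconverted_weight: float = 0.1,
) -> tuple[float, float]:
    """Return a (low, high) value range for a set of hours-saved claims."""
    for c in claims:
        if c.converted_to is not None and c.converted_to not in CONVERSIONS:
            raise ValueError(f"unknown conversion: {c.converted_to!r}")
    converted = sum(c.hours for c in claims if c.converted_to is not None)
    unconverted = sum(c.hours for c in claims if c.converted_to is None)
    # Low end: only hours that became capacity, avoided cost, or throughput.
    low = converted * loaded_hourly_cost
    # High end: add unconverted hours, heavily discounted.
    high = (converted + unconverted * unconverted_weight) * loaded_hourly_cost
    return low, high


claims = [
    HoursClaim(1200, "redeployed"),
    HoursClaim(300, "avoided_cost"),
    HoursClaim(8500),  # the rest of a "we saved 10,000 hours" headline
]
low, high = claimable_value(claims, loaded_hourly_cost=60.0)
print(f"${low:,.0f} to ${high:,.0f}")
```

Returning a pair rather than one number keeps the unconverted hours visible: if the high end only looks good because of that last row, that is exactly where the CFO's second question will land.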

What metrics belong on an AI adoption dashboard?

One per layer: a capability score from skills checks, weekly active usage inside target workflows (with week-four retention), and the baselined business number each workflow is supposed to move.

Why do so few companies report EBIT impact from AI?

Two stacked reasons: most pilots never change behavior (MIT found 95% show no P&L impact), and much of the value that does exist was never baselined or instrumented, so it can't be credibly claimed.
