Fautons
9 min read · AI literacy, Product management, UX

Five research-backed frameworks for designing human–AI products

You are not the first to design this

Every team building an AI feature hits the same wall: how much should we explain? When should it ask permission? How do we stop people trusting it too much, or too little? It feels like uncharted territory, so teams improvise, and usually improvise badly.

It isn't uncharted. Human-computer interaction researchers have studied exactly these questions for over two decades, and the good answers are written down. Below are the five frameworks I keep coming back to in training. None require a research background to use. Each gives you a shared vocabulary and a checklist you can apply this week.

1. Google's PAIR Guidebook: start from needs and failure

Google's People + AI Research team published the People + AI Guidebook as a practical playbook for designing AI products, organised into six chapters: user needs and defining success, mental models, explainability and trust, data and evaluation, feedback and control, and errors and graceful failure.

The two chapters teams skip are the two that matter most: mental models (if users can't predict what the system will do, they can't trust it) and graceful failure (AI is wrong sometimes, so design the wrong-answer path as carefully as the right one). The whole framework comes with worksheets, which makes it the easiest of the five to actually run in a design review.

2. Microsoft's 18 Guidelines: the evidence-based checklist

In 2019, a Microsoft Research team led by Saleema Amershi distilled two decades of AI design thinking into 18 concrete guidelines for human-AI interaction, then had 49 design practitioners test them against 20 popular AI-infused products. That empirical grounding is why senior PMs respect this one: it's a tested checklist, not opinion.

The guidelines are grouped by when they apply: initially (G1: make clear what the system can do; G2: how well it does it), during interaction, when the system is wrong (support efficient correction and dismissal), and over time. If you read only two, read G1 and G11 (make clear why the system did what it did). Most AI trust complaints trace straight back to violating one of them.

3. Mixed-initiative interaction: design the handoff

Back in 1999, Eric Horvitz described mixed-initiative user interfaces: instead of pure automation or pure manual control, the human and the machine take turns, with control passing to whichever is better placed at each step. The paper predates large language models by twenty years and suddenly reads like it was written for them.

It's the cleanest way to think about agents. A coding agent that plans, waits for your approval, then executes is textbook mixed-initiative: the machine proposes, the human decides, the machine acts. Naming the pattern stops the unhelpful "autonomous vs. not" argument and focuses the design on where the handoffs should be.
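If it helps to see the pattern in code, here is a minimal Python sketch of that propose, decide, act loop. Everything in it is illustrative: Step, needs_approval, run_with_approval, and the approve callback are hypothetical names, not something taken from Horvitz's paper or from any agent framework. It only shows one way to make the handoff an explicit, per-step design decision.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One unit of work the machine proposes (hypothetical structure)."""
    description: str
    action: Callable[[], str]      # the work the machine does if allowed
    needs_approval: bool = True    # the design decision: is there a handoff here?


def run_with_approval(
    plan: List[Step],
    approve: Callable[[Step], bool],
    log: List[str],
) -> List[str]:
    """Machine proposes each step, the human decides, the machine acts."""
    results = []
    for step in plan:
        log.append(f"proposed: {step.description}")
        # Control passes to the human only where the design says it should.
        if step.needs_approval and not approve(step):
            log.append(f"declined: {step.description}")
            continue
        log.append(f"executed: {step.description}")
        results.append(step.action())
    return results


# Example: a low-risk step runs on its own, a destructive one waits for a person.
plan = [
    Step("read the config file", lambda: "config read", needs_approval=False),
    Step("delete the build directory", lambda: "build directory deleted"),
]
log: List[str] = []
results = run_with_approval(plan, approve=lambda step: False, log=log)
print(results)  # ['config read']
```

Flipping needs_approval on a step is the design conversation in miniature: the argument stops being about whether the agent is "autonomous" and becomes about which steps deserve a human checkpoint.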

4. Trust calibration: the goal isn't more trust

This is the most important framework for anyone shipping AI into high-stakes work, and the most misunderstood. In 2004, Lee and See argued that the goal isn't to maximise trust in automation. It's to calibrate it. Users should trust the system exactly as much as it deserves: no more, no less.

Both failure modes are real. With overtrust, people stop checking, so they miss the errors. With undertrust, they ignore correct output, so the tool is wasted. Confidence indicators, source citations, and honest "I'm not sure" fallbacks aren't polish. They're trust-calibration instruments. For a security or finance audience this reframes the whole conversation, because they already think in calibrated risk.
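As a rough illustration of those three instruments working together, here is a small Python sketch. It assumes the system can attach a confidence score between 0 and 1 to each answer; the Answer type, the present function, and both thresholds are hypothetical and would need tuning against how often the system is actually right. A score that doesn't track real accuracy calibrates nothing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    text: str
    confidence: float   # assumed score in [0, 1] supplied by the system
    sources: List[str]  # citations the user can check


def present(
    answer: Answer,
    show_threshold: float = 0.5,       # hypothetical: below this, don't answer
    confident_threshold: float = 0.8,  # hypothetical: at or above this, say so
) -> str:
    """Render an answer so the user can see how much to rely on it."""
    if not 0.0 <= answer.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    # Honest fallback: better to admit uncertainty than to invite overtrust.
    if answer.confidence < show_threshold:
        return "I'm not sure about this one. Please check it yourself."

    # Confidence indicator: tell the user when to double-check.
    if answer.confidence >= confident_threshold:
        label = "High confidence"
    else:
        label = "Low confidence, please verify"

    # Source citations: give the user something to verify against.
    cites = ", ".join(answer.sources) if answer.sources else "no sources found"
    return f"{answer.text}\n[{label}] Sources: {cites}"


print(present(Answer("The invoice total is 4,200.", 0.62, ["invoice.pdf, page 2"])))
```

The design choice worth arguing about is where the two thresholds sit, because that is where overtrust and undertrust trade off.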

5. Levels of autonomy: name how much it does alone

The fifth is the autonomy scale (L1 assistive through L5 full), adapted from the levels used in self-driving cars. It gives a team a shared way to place a feature: is this suggesting, acting under supervision, or acting between checkpoints? I've written about it in depth separately, so here it's enough to say it belongs in this set.

Used together, these five cover the lifecycle: PAIR for what to design, Microsoft's guidelines for whether you did it, mixed-initiative for who's in control, trust calibration for how much to rely on it, and autonomy levels for how much it does alone. The full breakdown of the levels is in the five levels of AI autonomy.

How to actually use them

You don't adopt five frameworks at once. Pick the one that fits the argument you're stuck in. Designing the error path? PAIR. Reviewing a feature for completeness? Microsoft's 18. Arguing about how autonomous to make it? Mixed-initiative plus the autonomy levels. Worried people will over-rely on it? Trust calibration.

The deeper win is cultural. A team that shares this vocabulary stops debating whether the AI is "good" and starts debating specific, decidable design questions, which is the difference between shipping an AI feature and shipping a good one. Getting a team to that point is most of what our hands-on AI training does; it pairs naturally with the model-app-harness-tool vocabulary we teach in the same first session.

Frequently asked questions

What are the main frameworks for human-AI interaction design?

Five cover most of the ground: Google's People + AI (PAIR) Guidebook, Microsoft's 18 Guidelines for Human-AI Interaction, mixed-initiative interaction (Horvitz, 1999), trust calibration (Lee & See, 2004), and the levels of autonomy. Together they address what to design, whether you did it well, who holds control, how much to rely on the system, and how much it acts alone.

What is trust calibration in AI?

Trust calibration, from Lee and See's 2004 research, is the idea that the goal isn't to maximise users' trust in an AI system but to match it to how reliable the system actually is. Overtrust makes people miss errors; undertrust makes them ignore correct output. Confidence scores, citations, and honest uncertainty are the tools for calibrating it.

What is mixed-initiative interaction?

Mixed-initiative interaction is a 1999 framework from Eric Horvitz describing interfaces where control alternates between human and machine based on who is better placed at each step: not full automation, not fully manual, but a designed handoff. Modern AI agents that plan, seek approval, then act are a direct example.

Do I need an HCI background to use these frameworks?

No. Each one gives you a plain-language vocabulary and a checklist you can apply in a design review without academic training. The value is shared language. It turns vague debates about whether the AI is "good" into specific, decidable design questions.
