Calling AI Agents “Employees” Makes Humans Catch 18% Fewer Errors

A Boston University study found people caught 18% fewer errors when AI output was framed as coming from an "AI employee" rather than a chatbot tool.

LUMIEN | June 30, 2026 | 4 min read

A study by Emma Wiles, a business professor at Boston University, found that managers caught 18% fewer errors in work they were told came from an agentic "AI employee" compared to work they were told came from a chatbot. The only variable was the framing: same output, different label. The finding has direct implications for companies that are giving AI agents human names, job titles, and defined responsibilities as a way to make adoption feel more natural.

What happened

Emma Wiles, a professor at Boston University’s business school, ran a study on how people supervise AI-generated work. Participants reviewed AI output that was presented to them in one of two ways: as work from a software chatbot, or as work from an agentic “AI employee” with a name and a defined role.

The AI output was identical in both cases. But when participants believed they were reviewing work from a named “AI employee,” they caught 18% fewer errors than when they thought they were reviewing chatbot output.

The research was reported by MIT Technology Review, which covered Wiles’s findings as part of a broader look at how companies are anthropomorphizing their AI tools.

Why it matters

Many companies are currently making a specific design choice: they are giving their AI agents human names, org-chart positions, and stated responsibilities. The reasoning is that this helps employees relate to the tools and speeds up adoption.

Wiles’s study suggests that choice comes with a real cost. When a person thinks they are working alongside a “coworker,” even a non-human one, they apply social assumptions built up over years of working with other people. Those assumptions include a baseline level of trust. You do not scrutinize a colleague’s work the same way you scrutinize output from a piece of software.

That shift in scrutiny is not a small thing. An 18% drop in error detection is a meaningful quality control failure, particularly in domains where AI agents are being used for tasks like drafting contracts, writing code, producing financial summaries, or generating customer-facing content.

The problem is not that AI agents make more mistakes when called employees. The problem is that humans catch fewer of them.

Our take

We have seen this play out in client work. When a team talks about their AI tool the way they talk about a junior hire, something shifts. Reviews get shorter. Feedback gets softer. People stop asking “is this actually correct?” and start asking “did Alex do a good job?”

Those are very different questions. The first is about output quality. The second is about performance management, and it is the wrong frame entirely when the “employee” is a language model.

The Wiles study puts a number on something that already felt true anecdotally, and 18% is not a rounding error. If your team uses AI agents to produce anything a human used to check carefully, the name you give that agent and the way you describe its role in your workflow are real variables, not just branding decisions.

There is also a subtler issue worth naming: companies that anthropomorphize AI tools may be doing it partly because it makes the tools easier to sell internally. That is a legitimate goal. But the cost of that ease is reduced oversight, and reduced oversight is exactly the problem most AI deployments cannot afford right now.

What to do about it

If your business is deploying AI agents, review how you are describing them to staff. A few specific things to consider:

Avoid human names and job titles for AI tools. Descriptive labels like “contract drafting tool” or “ad copy generator” keep the focus on what the output is and why it needs review.
Build explicit review steps into workflows. Do not rely on people to self-regulate their scrutiny. If an AI agent produces a draft, make “check for errors” a named, required step, not an assumption.
Train staff on the framing effect. Show them the number: 18% fewer errors caught. That is concrete enough to change behavior.
Test your own error rates. Seed known mistakes into AI output and see how many your team catches. Do this before and after any rebranding of your tools.
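Scoring a seeded-error audit takes only a few lines of code. The sketch below is a minimal illustration and assumes you record two numbers for each framing of the tool: how many errors were planted and how many reviewers flagged. The names `ReviewTrial`, `relative_change`, and `compare` are hypothetical, and so are the counts in the example. The sketch also shows why you should say which rate you are reporting, because a drop in the catch rate and a rise in the miss rate are different numbers.

```python
from dataclasses import dataclass


@dataclass
class ReviewTrial:
    """Results of one seeded-error audit under a single framing of the AI tool."""

    label: str   # how the tool was described to reviewers (hypothetical labels)
    seeded: int  # number of known errors deliberately planted in the AI output
    caught: int  # number of planted errors the reviewers flagged

    def catch_rate(self) -> float:
        """Share of planted errors that reviewers flagged."""
        if self.seeded <= 0:
            raise ValueError("seeded must be a positive count")
        if not 0 <= self.caught <= self.seeded:
            raise ValueError("caught must be between 0 and seeded")
        return self.caught / self.seeded

    def miss_rate(self) -> float:
        """Share of planted errors that slipped through review."""
        return 1.0 - self.catch_rate()


def relative_change(before: float, after: float) -> float:
    """Relative change from a baseline; -0.18 means 18% lower than baseline."""
    if before == 0:
        raise ValueError("baseline is zero, so relative change is undefined")
    return (after - before) / before


def compare(baseline: ReviewTrial, variant: ReviewTrial) -> dict[str, float]:
    """Compare a variant framing against a baseline on both catch and miss rates."""
    return {
        "catch_rate_change": relative_change(baseline.catch_rate(), variant.catch_rate()),
        "miss_rate_change": relative_change(baseline.miss_rate(), variant.miss_rate()),
    }


if __name__ == "__main__":
    # Illustrative counts only: 50 planted errors per audit, before and after a rebrand.
    before = ReviewTrial(label="contract drafting tool", seeded=50, caught=40)
    after = ReviewTrial(label="named AI employee", seeded=50, caught=33)
    for name, value in compare(before, after).items():
        print(f"{name}: {value:+.1%}")
```

With these made-up counts, the script prints a catch-rate change of -17.5% and a miss-rate change of +70.0%. Both figures describe the same audit, so a before-and-after report should state which one it uses.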

The label you put on your AI agent is a design decision with measurable consequences. Treat it like one.

Source: MIT Technology Review
