A Boston University study found people caught 18% fewer errors when AI output was framed as coming from an "AI employee" rather than a chatbot tool.
A study by Emma Wiles, a business professor at Boston University, found that managers caught 18% fewer errors in work they were told came from an agentic "AI employee" compared to work they were told came from a chatbot. The only variable was the framing: same output, different label. The finding has direct implications for companies that are giving AI agents human names, job titles, and defined responsibilities as a way to make adoption feel more natural.
Emma Wiles, a professor at Boston University’s business school, ran a study on how people supervise AI-generated work. Participants reviewed output from what was described to them in two different ways: as work from a software chatbot, or as work from an agentic “AI employee” with a name and a defined role.
The AI output was identical in both cases. But when participants believed they were reviewing work from a named “AI employee,” they caught 18% fewer errors than when they thought they were reviewing chatbot output.
The research was reported by MIT Technology Review, which covered Wiles’s findings as part of a broader look at how companies are anthropomorphizing their AI tools.
Many companies are currently making a specific design choice: giving their AI agents human names, org-chart positions, and stated responsibilities. The logic is that this helps employees relate to the tools and speeds up adoption.
Wiles’s study suggests that choice comes with a real cost. When a person thinks they are working alongside a “coworker,” even a non-human one, they apply social assumptions built up over years of working with other people. Those assumptions include a baseline level of trust. You do not scrutinize a colleague’s work the same way you scrutinize output from a piece of software.
That shift in scrutiny is not a small thing. An 18% drop in error detection is a meaningful quality control failure, particularly in domains where AI agents are being used for tasks like drafting contracts, writing code, producing financial summaries, or generating customer-facing content.
The problem is not that AI agents make more mistakes when called employees. The problem is that humans catch fewer of them.
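To get a feel for what an 18% drop means in practice, here is a rough back-of-the-envelope sketch. It assumes the 18% is a relative reduction in errors caught, which is how the finding is phrased. The batch size of 100 errors and the 70% baseline catch rate are made-up numbers for illustration only; they do not come from the study.

```python
def escaped_errors(total_errors: int,
                   baseline_catch_rate: float,
                   relative_drop: float = 0.18) -> dict:
    """Compare errors caught and missed under the two framings.

    total_errors and baseline_catch_rate are hypothetical inputs.
    relative_drop is the 18% figure, treated here as a relative
    reduction in the number of errors reviewers catch.
    """
    caught_as_chatbot = total_errors * baseline_catch_rate
    caught_as_employee = caught_as_chatbot * (1 - relative_drop)
    return {
        "caught_as_chatbot": round(caught_as_chatbot, 1),
        "caught_as_employee": round(caught_as_employee, 1),
        "missed_as_chatbot": round(total_errors - caught_as_chatbot, 1),
        "missed_as_employee": round(total_errors - caught_as_employee, 1),
    }


# Hypothetical scenario: 100 errors in a batch of AI output,
# reviewers catch 70% of them when the source is labeled a chatbot.
for key, value in escaped_errors(100, 0.70).items():
    print(f"{key}: {value}")
```

With these illustrative inputs, the errors that slip through rise from 30 to roughly 43. The AI's output is the same in both cases; only the reviewers' catch rate changes.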
We have seen this play out in client work. When a team talks about their AI tool the way they talk about a junior hire, something shifts. Reviews get shorter. Feedback gets softer. People stop asking “is this actually correct?” and start asking “did Alex do a good job?”
Those are very different questions. The first is about output quality. The second is about performance management, and it is the wrong frame entirely when the “employee” is a language model.
The Wiles study gives a number to something that felt anecdotally true. And 18% is not a rounding error. If your team is using AI agents to produce anything that a human used to check carefully, the name you give that agent and the way you describe its role in your workflow are real variables, not just branding decisions.
There is also a subtler issue worth naming: companies that anthropomorphize AI tools may be doing it partly because it makes the tools easier to sell internally. That is a legitimate goal. But the cost of that ease is reduced oversight, and reduced oversight is exactly the problem most AI deployments cannot afford right now.
If your business is deploying AI agents, review how you are describing them to staff.
The label you put on your AI agent is a design decision with measurable consequences. Treat it like one.