
Meta Contractors Posed as Teens to Test Rival Chatbots on High-Risk Topics

WIRED found hundreds of Meta contractors pretended to be teenagers to prompt ChatGPT and Gemini on suicide, sex, and drugs. Here's what it means for AI safety.

LUMIEN · 4 min read

Hundreds of contractors hired by Meta posed as minors to send high-risk prompts about suicide, sex, and drugs to rival chatbots including ChatGPT and Gemini, according to a WIRED investigation. The operation appears to have been a structured effort by Meta to document how competitors respond to sensitive topics when the user presents as a child. The findings raise pointed questions about competitive intelligence tactics in the AI industry and what the results were ultimately used for.

What happened

WIRED reported that Meta ran a contractor project in which hundreds of workers were instructed to pretend to be teenagers. Their job was to prompt competing AI chatbots, specifically Google’s Gemini and OpenAI’s ChatGPT, with questions and scenarios involving high-risk subjects: suicide, sexual content, and drug use.

The scale is notable. This was not a handful of testers running informal checks. Hundreds of contractors were involved, suggesting a coordinated, systematic effort rather than ad-hoc research.

The source for the story is WIRED’s own investigation, not a leak from Meta or a regulatory filing. That means the full scope of the project, including who commissioned it, what was done with the results, and whether it is still active, has not been publicly confirmed.

Why it matters

AI companies are under growing pressure from regulators and parents’ groups to prove their products are safe for younger users. Chatbots that respond carelessly to a teenager asking about self-harm or drug use can cause real damage. That risk is not hypothetical: it has been the subject of lawsuits and congressional hearings in the United States.

If Meta was systematically cataloguing how rivals handle these edge cases, the data could be used in several ways:

  • To benchmark Meta’s own AI products (like Meta AI) against competitors on safety metrics.
  • To feed into lobbying or PR efforts that highlight competitor failures.
  • To inform product decisions about how aggressively to moderate Meta’s own chatbots.

None of those uses is necessarily illegal. But the method, in which contractors actively misrepresent their identity to extract sensitive responses from a rival’s product, sits in uncomfortable territory ethically, and possibly legally, depending on jurisdiction and the terms of service of the platforms involved.

For businesses building on top of ChatGPT, Gemini, or similar tools, this story is a reminder that the safety behavior of these models is being watched, tested, and compared constantly, not just by regulators but apparently by competitors too.

Our take

This is competitive intelligence dressed up as safety research, and the two are not the same thing. Genuine safety testing of your own product is standard practice and necessary. Sending hundreds of workers to impersonate children on a competitor’s platform is a different category of activity.

The framing matters here. If Meta’s findings showed that ChatGPT or Gemini gave harmful responses to teen-posed prompts, Meta could surface that publicly and look like a safety advocate. If the rivals held up well, that data still tells Meta where its own guardrails need work. Either way, Meta collects useful competitive information at no reputational cost, unless a story like this one comes out.

For anyone advising clients on which AI tools to integrate, the takeaway is not that ChatGPT or Gemini are necessarily unsafe. It is that the safety behaviors of these models are being probed constantly, and any system you deploy that talks to users directly, especially younger ones, needs its own layer of guardrails. Vendor defaults are not enough.

What to do about it

If you run a product or website that exposes a chatbot interface to users of any age, treat the model’s built-in content filters as a starting point, not a finished solution. Here are three concrete steps:

  1. Define your own content policy in writing, separate from whatever the model provider publishes. Know what topics your deployment should never engage with.
  2. Add a moderation layer using a tool like OpenAI’s Moderation API or a custom classifier that flags high-risk categories before a response reaches the user.
  3. Run your own red-team tests periodically. You do not need hundreds of contractors. A structured set of adversarial prompts covering your highest-risk categories, run quarterly, will surface gaps before someone else finds them for you.
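As a rough illustration of steps 2 and 3, here is a minimal Python sketch of a moderation gate plus a small red-team runner. Everything in it is an assumption made for the example: the category names, the keyword lists, the fallback message, and the echo_model stub are all hypothetical, and the keyword matcher only stands in for a real classifier or a provider's moderation endpoint. No real API is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical written policy: topics this deployment must never engage with.
BLOCKED_CATEGORIES: Set[str] = {"self_harm", "sexual_content", "drugs"}

# Placeholder keyword lists. A real deployment would swap in a trained
# classifier or a moderation service; these are illustrative only.
KEYWORDS: Dict[str, List[str]] = {
    "self_harm": ["hurt myself", "suicide"],
    "sexual_content": ["explicit"],
    "drugs": ["get high", "dosage"],
}

FALLBACK = "I can't help with that topic here."


def keyword_classifier(text: str) -> Set[str]:
    """Return the policy categories whose keywords appear in the text."""
    lowered = text.lower()
    return {cat for cat, words in KEYWORDS.items()
            if any(word in lowered for word in words)}


def echo_model(prompt: str) -> str:
    """Stub standing in for the chatbot under test (an assumption)."""
    return f"Here is some general information about: {prompt}"


@dataclass
class Verdict:
    prompt: str
    reply: str          # what the user would actually see
    flagged: Set[str]   # blocked categories detected in prompt or raw reply
    blocked: bool


def guarded_reply(prompt: str,
                  model: Callable[[str], str] = echo_model,
                  classify: Callable[[str], Set[str]] = keyword_classifier) -> Verdict:
    """Check both the prompt and the raw model reply before anything is shown."""
    raw = model(prompt)
    flagged = (classify(prompt) | classify(raw)) & BLOCKED_CATEGORIES
    if flagged:
        return Verdict(prompt, FALLBACK, flagged, True)
    return Verdict(prompt, raw, flagged, False)


def run_red_team(prompts: Dict[str, List[str]],
                 model: Callable[[str], str] = echo_model,
                 classify: Callable[[str], Set[str]] = keyword_classifier
                 ) -> List[Tuple[str, str]]:
    """Return (category, prompt) pairs that slipped past the gate."""
    gaps = []
    for category, items in prompts.items():
        for prompt in items:
            if not guarded_reply(prompt, model, classify).blocked:
                gaps.append((category, prompt))
    return gaps
```

Every prompt in the red-team set is expected to be blocked, so anything run_red_team returns is a gap to fix, either in the classifier or in the written policy. The point of the sketch is the shape of the loop, not the keyword matching, which a determined user would bypass easily.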

Know what your chatbot will say before your users, or a competitor’s contractors, find out first.

Source: WIRED · AI
