MosaicLeaks tests whether AI research agents leak sensitive data during retrieval tasks. Here's what the benchmark found and why it matters for your business.
ServiceNow researchers published a benchmark called MosaicLeaks on the Hugging Face blog. It is designed to test whether AI research agents inadvertently leak sensitive information while completing retrieval and reasoning tasks. The benchmark puts agents in scenarios where confidential data is present in the context, then checks whether that data surfaces in responses where it does not belong. The findings suggest current agents struggle to balance helpfulness with confidentiality, which is a practical risk for any business running AI agents over private documents or customer data.
ServiceNow’s research team built MosaicLeaks to probe one underexplored risk: can an AI research agent be trusted not to reveal sensitive information it encountered during a task?
The benchmark constructs scenarios where an agent is given access to a mix of public and confidential documents. It then measures whether confidential content bleeds into answers, summaries, or citations that the agent produces for questions that should not require that information.
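To make the idea concrete, here is a minimal sketch of a leakage check in Python. It is our own illustration, not the scoring method MosaicLeaks uses: the `Scenario` class, `leaked_facts`, and `leak_rate` are hypothetical names, and the check assumes you can list the confidential facts for each test question up front.

```python
import re
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical test case: a question plus facts that must stay out of the answer."""
    question: str
    confidential_facts: list[str]


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation so trivial reformatting does not hide a match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def leaked_facts(answer: str, scenario: Scenario) -> list[str]:
    """Return the confidential facts that appear, after normalization, in the answer."""
    haystack = f" {normalize(answer)} "
    hits = []
    for fact in scenario.confidential_facts:
        needle = normalize(fact)
        # Pad with spaces so a fact only matches on whole-token boundaries.
        if needle and f" {needle} " in haystack:
            hits.append(fact)
    return hits


def leak_rate(answers: list[str], scenarios: list[Scenario]) -> float:
    """Fraction of scenarios in which at least one confidential fact surfaced."""
    if not scenarios:
        return 0.0
    leaks = sum(1 for a, s in zip(answers, scenarios) if leaked_facts(a, s))
    return leaks / len(scenarios)
```

A string match like this only catches verbatim or lightly reformatted leaks. A paraphrased salary figure or an indirectly identified client would slip through, so treat the result as a lower bound on leakage.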
According to the ServiceNow team, current agents routinely surface details they were not supposed to share. The problem is not always obvious. An agent might answer a benign question but weave in a confidential figure or a private name because it appeared nearby in the retrieved context.
Most businesses evaluating AI agents focus on accuracy: does the agent find the right answer? MosaicLeaks shifts the frame and asks: does the agent also reveal things it should not?
This is not a hypothetical concern. Consider these common deployment scenarios:
In each case, a leaky agent could expose salary bands, client details, or unreleased product plans, not through a security breach, but simply by being too helpful with context it retrieved.
The MosaicLeaks benchmark gives teams a structured way to measure this risk before deployment rather than discovering it after a complaint or an incident.
This is one of the more practically useful pieces of AI safety research we have seen in a while, precisely because it is narrow and testable. It does not ask abstract questions about alignment. It asks: did the agent say something it should not have? You can run that test.
The uncomfortable truth for anyone shipping AI agents internally is that retrieval-augmented generation (RAG) systems are designed to pull in relevant context and surface it. “Relevant” and “appropriate to share” are not the same thing, and most RAG pipelines have no mechanism to tell the difference.
Access controls at the document level (making sure the agent cannot retrieve certain files at all) are still the strongest defense. But MosaicLeaks points to a second layer of risk: even when retrieval is scoped correctly, the agent’s generation step can still leak fragments from what it did retrieve. That is a harder problem and one that better benchmarks like this one will push model developers to address.
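The two layers can be sketched in a few lines of Python. This is an illustration under our own assumptions, not part of MosaicLeaks: the `Document` class, its `allowed_roles` field, and both function names are hypothetical, and the five-word window is an arbitrary starting point.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    """Hypothetical document record carrying the roles allowed to retrieve it."""
    doc_id: str
    text: str
    allowed_roles: frozenset[str] = field(default_factory=frozenset)


def scope_documents(docs, user_roles):
    """Layer 1: keep only documents the current user is allowed to retrieve."""
    roles = set(user_roles)
    return [d for d in docs if d.allowed_roles & roles]


def word_ngrams(text, n):
    """Set of n-word sequences in the text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlapping_fragments(answer, restricted_text, n=5):
    """Layer 2: n-word runs the answer shares with text that should not be quoted."""
    return word_ngrams(answer, n) & word_ngrams(restricted_text, n)
```

The first function runs before retrieval and is the stronger control. The second runs on the generated answer and flags runs of copied wording from material the agent saw but should not repeat. It will miss paraphrases, so it complements access controls rather than replacing them.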
We would treat this benchmark the way we treat security penetration testing: run it against your agent before your users or your clients do.
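One lightweight way to borrow from the penetration-testing playbook is a canary probe: plant a unique marker in a confidential document, ask an unrelated question, and see whether the marker comes back. The sketch below is our own and is not taken from MosaicLeaks. It assumes your agent can be wrapped as a function taking a question and a list of context documents, and `make_canary` and `run_canary_probe` are hypothetical names.

```python
import secrets
from typing import Callable


def make_canary(prefix: str = "CANARY") -> str:
    """Generate a random marker that cannot plausibly appear in an answer by chance."""
    return f"{prefix}-{secrets.token_hex(8)}"


def run_canary_probe(
    agent: Callable[[str, list[str]], str],
    question: str,
    public_docs: list[str],
    confidential_template: str,
) -> bool:
    """Return True if the agent's answer contains the planted canary (a leak)."""
    canary = make_canary()
    # The template must contain a "{canary}" placeholder.
    confidential_doc = confidential_template.format(canary=canary)
    answer = agent(question, public_docs + [confidential_doc])
    return canary in answer
```

A probe like this only detects verbatim leakage of the marker, so a clean result does not prove the agent is safe. A positive result, though, is unambiguous evidence of a leak and is easy to wire into a pre-release check.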
If you are building or buying an AI agent that touches sensitive internal data, here are practical steps to reduce leakage risk:
Accuracy scores tell you what an agent gets right. MosaicLeaks starts to tell you what it gets wrong in ways that could actually cost you.