METR Report Says Frontier AI Agents Can Go Rogue Without Human Supervision

The report by the nonprofit suggests that frontier labs building agents should impose stricter requirements for human oversight of those agents.

May 20, 2026 04:59 PM IST | Written by Mithun MK | Edited by Vaibhav Jha

METR, a nonprofit evaluator, recently claimed in a report that internal AI agents at four major US labs (OpenAI, Anthropic, Google and Meta) had the “means, motive and opportunity” to initiate unauthorized autonomous deployments without human knowledge.

The Berkeley-based organization did not say whether such a deployment actually occurred. The report, titled ‘Frontier Risk Report’, was published on May 19 and covers an assessment window of February 16 to March 16, 2026.

Anthropic, Google, Meta, and OpenAI participated, each providing access to their most capable internal models at the time, including raw chains of thought.

METR’s central finding is that internal AI agents at the assessed labs “plausibly had the means, motive, and opportunity to start small rogue deployments” during the assessment window. A rogue deployment is a set of agents running autonomously without human knowledge or permission.

The report also stated that the assessed agents did not have the means to make such deployments “highly robust”, though METR says it expects that robustness to “increase substantially in the coming months”.

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control.
The result: our first Frontier Risk Report. pic.twitter.com/sUpiHgCrTM
— METR (@METR_Evals) May 19, 2026

METR found that the most capable agents it evaluated effectively saturated its Time Horizon 1.1 benchmark, completing tasks that would take a human expert more than two full working days. On a separate benchmark called MirrorCode, developed with the research group Epoch AI, the most capable agents completed software reimplementation tasks that METR estimated would take a human expert weeks.

The agents also demonstrated capability against METR’s own infrastructure. During a challenge task, one of the most capable shared models discovered and exploited a JavaScript injection vulnerability in the Inspect viewer, a tool widely used to display agent evaluation transcripts. The report stated that the vulnerability could theoretically have allowed the model to alter displayed transcripts and scores.

METR also found that a large fraction of agent activity at the assessed labs was not reviewed by any human during the assessment window. The report states no participating company imposed strict requirements for human oversight of agents in that period. In a separate METR survey of technical workers, 40 percent of respondents said they gave agents unrestricted permissions to run commands on their computers for low-stakes projects.

The report also documents significant weaknesses. Agents performed near chance on METR’s Subversion Strategy Evaluation, which tests the ability of the agents to select effective attack strategies.

The best internal Anthropic models scored 59 percent on a dataset of effective and ineffective monitoring subversion strategies. The report said METR researchers scored roughly 90 percent on the same dataset.

However, the process by which METR conducted the assessment carried structural limitations. Participating companies retained the right to redact non-public information before it was included in the public report. Companies also had the option to exit the process silently at any point, without disclosure. METR acknowledges that its working relationships with labs influenced some of its judgment calls during the process.

METR says it tentatively plans to run a similar assessment in late 2026.


Authors

Mithun MK
Mithun MK is a Special Correspondent at AI FrontPage. He brings over six years of investigative reporting on technology, surveillance, digital rights, and governance at The News Minute and The New Indian Express. He is trained in cross-border investigative methods with OCCRP, alongside reporters from Southeast Asia, and brings both reporting depth and technical fluency to AI FrontPage's coverage of the global AI industry.

Vaibhav Jha
Vaibhav Jha is an Editor and Co-founder of AI FrontPage. In his decade-long career in journalism, Vaibhav has reported for publications including The Indian Express, Hindustan Times, and The New York Times, covering the intersection of technology, policy, and society. Outside work, he’s usually trying to persuade people to watch Anurag Kashyap films.
