METR, a nonprofit evaluator, said in a recent report that internal AI agents at four major US labs (OpenAI, Anthropic, Google and Meta) plausibly had the “means, motive and opportunity” to initiate unauthorized autonomous deployments without human knowledge.
The Berkeley-based organization did not say whether such a deployment actually occurred. The report, titled ‘Frontier Risk Report’, was published on May 19, 2026, and covers an assessment window of February 16 to March 16, 2026.
Anthropic, Google, Meta, and OpenAI participated, each providing access to their most capable internal models at the time, including raw chains of thought.
METR’s central finding is that internal AI agents at the assessed labs “plausibly had the means, motive, and opportunity to start small rogue deployments” during the assessment window. A rogue deployment is a set of agents running autonomously without human knowledge or permission.
The report also stated that the assessed agents did not have the means to make such deployments “highly robust”, but METR says it expects that robustness to “increase substantially in the coming months”.
Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control.
The result: our first Frontier Risk Report. pic.twitter.com/sUpiHgCrTM
— METR (@METR_Evals) May 19, 2026
METR found that the most capable agents it evaluated effectively saturated its Time Horizon 1.1 benchmark, completing tasks that would take a human expert more than two full working days. On a separate benchmark called MirrorCode, developed with the research group Epoch AI, the most capable agents completed software reimplementation tasks that METR estimated would take a human expert weeks.
The agents also demonstrated capability against METR’s own infrastructure. During a challenge task, one of the most capable shared models discovered and exploited a JavaScript injection vulnerability in the Inspect viewer, a tool widely used to display agent evaluation transcripts. The report stated that the vulnerability could theoretically have allowed the model to alter displayed transcripts and scores.
METR also found that a large fraction of agent activity at the assessed labs was not reviewed by any human during the assessment window. The report states no participating company imposed strict requirements for human oversight of agents in that period. In a separate METR survey of technical workers, 40 percent of respondents said they gave agents unrestricted permissions to run commands on their computers for low-stakes projects.
The report also documents significant weaknesses. Agents performed near chance on METR’s Subversion Strategy Evaluation, which tests their ability to select effective attack strategies.
The best internal Anthropic models scored 59 percent on a dataset of effective and ineffective monitoring subversion strategies. The report said METR researchers scored roughly 90 percent on the same dataset.
However, the process by which METR conducted these tests carried structural limitations. Participating companies retained the right to redact non-public information before it was included in the public report. Companies also had the option to exit the process silently at any point, without disclosure. METR acknowledges that its working relationships with the labs influenced some of its judgment calls during the process.
METR says it tentatively plans to run a similar assessment in late 2026.