Researchers at Google DeepMind said an AI-proof system has autonomously solved nine open problems associated with Paul Erdős , the Hungarian mathematician. The researchers said two of the Erdős problems solved by the agent had remained open for 56 years.
The announcement came days after OpenAI said one of its models had disproved a long-standing conjecture in discrete geometry.
Another 9 open Erdos problems solved, this time by DeepMind team.
Interesting loop of LLM – Lean agents working autonomously, and only after it’s verified formally, going through human review. pic.twitter.com/DqNC6sleUg
— Przemek Chojecki | PC (@prz_chojecki) May 24, 2026
The Google paper posted on arXiv on May 21 said the lab built Alpha-Proof Nexus, a framework that combines large language models with lean, a formal proof system where each logical step can be checked by software. The researchers said the most capable agent solved 9 of 353 open Erdős problems and proved 44 of 492 conjunctions from the online encyclopedia of integer sequences.
The paper said the system is being used in combinatorics, optimization theory, graph theory, algebraic geometry, and quantum optics research.
Speaking at the Big Technology Podcast the Google DeepMind CEO Demis Hassabis, said, today’s systems are nowhere near AGI. “It doesn’t matter how many Erdős problems are solved, I think it’s far, far from what a true invention or someone like a Ramanujan would have been able to do”, he added.
🚨 Google DeepMind CEO Sir Demis Hassabis:
“Today’s systems, are nowhere near [AGI]. Doesn’t matter how many Erdős problems you solve… I think it’s far, far from what a true invention or someone like a Ramanujan would have been able to do”
it’s over for the Erdős hype pic.twitter.com/n8WJBoy18p
— NIK (@ns123abc) May 25, 2026
Development comes after OpenAI said on May 20th that an internal model had disproved a conjecture in the planar unit distance problem, first posted by Erdős in 1946. The problem asks how many pairs of points among n points in a plane can be exactly one unit apart.
The revealing belief was that square grid constructions were close to the best possible way to maximize such unit distance pairs. The company said its model found an infinite family of examples that gave a polynomial improvement, and that the proof had been checked by external mathematicians.
The two announcements point to a shift in how AI labs are testing mathematical reasoning. OpenAI said its results came from a general-purpose reasoning model, not from a system trained specifically for mathematics or built to search that problem. Google’s paper took a different road, using formal proof research, lean verification, and agent designs that could refine proof sketches through feedback.
Google’s research has also flagged limits. In failures, the agents sometimes push the difficult part of the problem to an unproved helper lemma (a proven proposition or theorem that is used as a stepping stone to achieve a larger, more significant result). In the other case, the system claimed that unproved lemmas were established results, which the authors described as hallucinations after manual inspections.
The claim from both labs is not that mathematicians are being replaced but rather, that AI systems are beginning to produce work that survives formal verification or expert review. OpenAI said people still choose the problems, interpret the results, and decide what questions should be pursued next.
Also Read: Google DeepMind Launches AI Powered Mouse Pointer



