An AI system could win a medal at the International Mathematical Olympiad, according to a study

A team at Google DeepMind has developed AlphaProof, an artificial intelligence system that learns to find formal proofs by training on millions of self-formulated problems. According to the authors, the system “substantially improves upon previous-generation results on historical problems from mathematical competitions.” Specifically, in the 2024 International Mathematical Olympiad (IMO) for secondary school students, “this performance, achieved after several days of computation, resulted in a score equivalent to that of a silver medalist, marking the first time an AI system has achieved medal-level performance.” The results are published in the journal Nature.

12/11/2025 - 17:00 CET
Expert reactions


Marta Macho-Stadler

Full Professor of the Department of Mathematics at the University of the Basque Country (UPV/EHU)

Science Media Centre Spain

The article presents an AI system, AlphaProof, that, unlike other systems, adds a verification step to check the correctness of its results. According to the authors, AlphaProof 'learns', like other AI systems, by finding proofs of mathematical problems (and variations thereof) and adapting them to solve each new problem posed. Furthermore, and this is the main improvement, it can refine its results through a trial-and-error process that helps optimize the solutions.

They tested the system's effectiveness in an elite mathematics competition, the 2024 International Mathematical Olympiad. Of the six problems posed, AlphaProof correctly solved the three in algebra and number theory but failed to solve the two in combinatorics; the companion system AlphaGeometry 2 solved the geometry problem. All of this took far longer (two to three days) than the time allotted to human participants.

It's likely that the capabilities of these AI systems (in terms of speed and the types of problems they solve correctly) will improve soon. However, I understand that there are complex mathematical problems that require not only 'training' (through study and practice) to solve, but also a great deal of creativity. And creativity is a human capacity.

The author has declared they have no conflicts of interest

Teodoro Calonge

Professor of the Department of Computer Science at the University of Valladolid

Science Media Centre Spain

The general impression is that this is an article with a certain degree of scientific rigor, though many readers outside computer engineering will likely stay at the surface level, that is, at the applications. As for its quality, I think it is commendable work given the current state of AI, in which you either have virtually unlimited computing resources or you cannot develop a prototype from start to finish. That is precisely what happened to this group of researchers, who used pre-trained models and then built an adaptation layer for their specific problem. Although that might sound somewhat fraudulent, there is no problem at all as long as it is disclosed, as the authors honestly do. The scientific literature on this subject is full of work that takes this approach.

Regarding the implications, I think this will force examiners to be more imaginative when designing problems. Let me explain: the law of least effort leads us to set exercises based on previous ones with only very slight variations. And this is where AI comes in, leveraging a vast database of exercises and, by making small modifications, arriving at solutions that might be acceptable, even surprising at first glance. However, if the solution to a proposed exercise doesn't resemble anything in its database at all, or if there is a resemblance but not quite enough, the generative AI is very likely to fail.

Leaving aside the sensationalism surrounding the Mathematical Olympiad, I believe this is what really lies behind this work, which I would summarize as acceptable, much like the bulk of the generative AI being offered commercially in very specific fields such as law, insurance, and medical reports.

The author has not responded to our request to declare conflicts of interest

Clara Grima

Professor of Mathematics at the Escuela Técnica Superior de Ingeniería Informática (US) and researcher in Computational Geometry

Science Media Centre Spain

It's very rare for an article in Nature not to meet the highest quality standards. If it has passed the rigorous peer-review process (by people more specialized in the subject than I am), it's almost certain that the quality of the work is top-notch.

AI is steadily entering, and surpassing humans in, ever more areas of human knowledge. Researcher Hans Moravec illustrates this beautifully, as Max Tegmark recounts in his book on AI, Life 3.0: the landscape of human abilities can be pictured as terrain with plains and mountains. The lowlands and valleys represent the skills that are easiest for a computer, such as "arithmetic" and "memorization"; the foothills hold skills like "theorem proving" and "chess"; and the high peaks bear names like "locomotion," "hand-eye coordination," and "social interaction." The advance of computer performance is like a slow flood. Half a century ago it began to inundate the lowlands, displacing human calculators and record-keepers but leaving most of us dry. Now the waters have reached the foothills, and our outposts there are contemplating retreat. We feel safe on our peaks, but at the current rate these too will be "submerged within another half-century," and at that point AI will have surpassed all human cognitive abilities.

[Regarding possible limitations] Almost the opposite: the Mathematical Olympiads discussed in the article focus on a very specific type of problem that requires relatively little mathematical background, but extensive use of that knowledge and a great deal of ingenuity. I imagine that acquiring more mathematical knowledge is relatively easy for a machine; what has always been considered most human, ingenuity, is precisely what this work has shown an AI can achieve.

The author has not responded to our request to declare conflicts of interest

Ramón López de Mántaras

Computer scientist and physicist, emeritus research professor at CSIC and pioneer of AI research in Spain

This is an excellent result. It demonstrates that mathematical problems that can be formalized are becoming accessible to AI through reinforcement learning, combined with other techniques such as an interactive theorem prover and a powerful search algorithm to explore the space of possible proofs, although it is important to note that they require considerable human expertise to formalize, as well as enormous computational power.
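The combination described above, a search procedure that proposes steps plus an independent checker that accepts only valid ones, can be sketched in miniature. The snippet below is purely illustrative: it searches over toy Horn-clause rules rather than Lean tactic states, and the rule set, function names, and facts are all invented for this example; it is not AlphaProof's actual algorithm.

```python
# Toy "search plus independent verification" sketch (illustrative only).
# A search proposes derivation steps; a separate "kernel" replays them
# and accepts the result only if every step is licensed by the rules.

RULES = [
    ({"a", "b"}, "c"),    # from a and b, infer c
    ({"c"}, "d"),         # from c, infer d
    ({"d", "a"}, "goal"), # from d and a, infer goal
]

def search_proof(axioms, goal, rules=RULES):
    """Forward-chaining search; returns a list of (premises, conclusion) steps."""
    known, steps = set(axioms), []
    progress = True
    while progress and goal not in known:
        progress = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                steps.append((premises, conclusion))
                progress = True
    return steps if goal in known else None

def kernel_check(axioms, steps, goal, rules=RULES):
    """Independent verifier: accepts only steps licensed by the rules."""
    known = set(axioms)
    for premises, conclusion in steps:
        licensed = any(premises == p and conclusion == c for p, c in rules)
        if not licensed or not premises <= known:
            return False
        known.add(conclusion)
    return goal in known

proof = search_proof({"a", "b"}, "goal")
print(proof is not None and kernel_check({"a", "b"}, proof, "goal"))  # True
```

The point of the separate `kernel_check` is that the search can be heuristic and fallible: any proof it proposes is replayed and verified independently, which is the property that rules out "hallucinated" results.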

There are also other significant limitations. AlphaProof's specific learning phase requires a specialized and costly level of training. Solving the most difficult problems can take several days of inference, highlighting the need for more efficient strategies.

So far, the system's successes have been primarily in advanced secondary and undergraduate mathematics, which operates within a known and relatively fixed set of concepts. Extending these capabilities to advanced mathematical research, which involves theory building and the incorporation of new concepts, remains a monumental challenge, and there is no indication that it can be achieved in the short term, or even the medium term. In other words, these achievements do not mean that AI is comparable to a mathematician, since it does not 'understand' mathematical concepts at all, nor can it create solutions to entirely new problems. What AlphaProof demonstrates is a remarkable expansion of the automated tools that can assist human mathematicians, but never replace them. Nevertheless, it is reasonable to expect that AlphaProof will become a valuable tool for human researchers in the long term, opening new possibilities for exploring the limits of knowledge, but always as a tool that will only be useful to those with a deep understanding of mathematics.

The high computational requirements also raise concerns about the reproducibility and accessibility of this type of advancement, which is a serious handicap in science. That is why the developers of AlphaProof plan to create interactive tools for accessing this system and also aim to improve algorithmic efficiency, reducing barriers to entry and transforming these techniques into collaborative resources for the mathematical community.

The author has not responded to our request to declare conflicts of interest

Carles Sierra

Research Professor and Director of the Artificial Intelligence Research Institute (IIIA) of the Spanish National Research Council (CSIC), near Barcelona, and Adjunct Professor at Western Sydney University

Science Media Centre Spain

AlphaProof seems like a powerful system because it provides formal guarantees for its proofs; it doesn't "hallucinate" like LLMs [Large Language Models]. AlphaProof represents a fundamental conceptual advance: it demonstrates that reasoning models can achieve formal verification, eliminating the risk of factual hallucinations that plagues other AI systems.

Unlike language models, which can generate plausible but incorrect content, its architecture guarantees that every inference is logically sound and checkable (this is a fundamental aspect of the system). But you have to know how to use it. For example, if the formal system (Lean) has inconsistent axioms, it will produce "formal truths" that are mathematically false. The article does not describe the system in great detail; it focuses on the Olympiad results. I would highlight the formal verification and the fact that it does not "hallucinate" like LLMs.
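To make the "formal guarantees" point concrete, here is a small Lean 4 sketch (illustrative; the theorem statements and names are invented for this example). The first proof is accepted because every inference step type-checks against the kernel; the second shows the caveat raised above: add an inconsistent axiom and anything becomes "provable."

```lean
-- Accepted only because each inference step type-checks against the kernel.
theorem double_eq (n : Nat) : n + n = 2 * n := by
  omega

-- The caveat: formal guarantees are relative to the axioms in scope.
-- From an inconsistent axiom, any statement follows.
axiom bad : False
theorem one_eq_two : 1 = 2 := bad.elim
```

This is why a verified proof means "correct relative to the axiom base," not "true" in an unconditional sense.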

The author has not responded to our request to declare conflicts of interest
Publications
Journal
Nature
Authors

Hubert et al.

Study types:
  • Research article
  • Peer reviewed