Google DeepMind's AI systems, AlphaProof and AlphaGeometry 2, solved four of the six problems at this year's International Mathematical Olympiad (IMO), a performance that equalled that of a silver medalist in the prestigious global competition for high-school students.
The ability to produce rigorous, step-by-step proofs across different areas of mathematics has long been considered a major challenge in machine learning, one that has remained out of reach for state-of-the-art AI systems. "These are extremely hard mathematical problems, and no AI system has ever achieved a high success rate in these types of problems," stated Pushmeet Kohli, vice president of research focused on AI for science at DeepMind, during a press briefing.
AlphaProof operates through reinforcement learning, teaching itself by trial and error without human intervention. The same method previously powered DeepMind's AlphaGo, which mastered the board game Go, and AlphaStar, which excelled at the video game StarCraft II. For the IMO, the team first fine-tuned Google's Gemini model to translate 1 million mathematical problem statements from English into Lean, a programming language for writing machine-checkable proofs. AlphaProof then generated candidate solutions to these formalized problems, searched over possible proof steps to verify them in Lean, and fed successful solutions back into the model for continuous improvement.
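To give a sense of what that translation step produces, here is a toy illustration of my own, not one of the competition problems: the informal claim "the sum of two even numbers is even" written as a Lean 4 theorem. The theorem name and proof are a sketch, and the lemma `Nat.mul_add` comes from Lean's standard library; the key point is that Lean's kernel mechanically accepts or rejects every step, which is what makes model-generated proofs automatically checkable.

```lean
-- Toy illustration (not an IMO problem): the informal statement
-- "the sum of two even numbers is even" formalized in Lean 4.
-- The kernel checks each step, so a candidate proof produced by a
-- model is either verifiably correct or rejected outright.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  cases ha with
  | intro m hm =>
    cases hb with
    | intro n hn =>
      -- witness k := m + n, since a + b = 2 * m + 2 * n = 2 * (m + n)
      exact ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```

In the training loop described above, statements of roughly this shape, at vastly greater difficulty, are what AlphaProof attempted: only proofs the Lean kernel accepted counted as successes and were fed back into the model.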
AlphaProof successfully solved three of the IMO problems: two in algebra and one in number theory. One was solved within minutes, but the other two took up to three days; students, by contrast, must submit their answers within two 4.5-hour sessions. AlphaProof could not crack the competition's two combinatorics problems.
AlphaGeometry 2, the other AI system, solved the competition's geometry problem in just 19 seconds. Because data for training math-focused AI models is scarce, the DeepMind team trained AlphaGeometry 2 on synthetic data generated by AI itself. The system can now solve 83% of the IMO geometry problems posed over the past 25 years, a significant improvement over its predecessor, which managed 53%.

Each IMO problem is worth seven points, and the AI systems earned full marks on the four they solved, for an overall score of 28 out of 42 possible points. That placed them in silver-medal territory, just one point shy of the gold-medal threshold. At this year's competition, 58 of the 609 high-school contestants worldwide won gold.
Fields medalist Timothy Gowers, a mathematician at the Collège de France and one of the judges who checked the AI's work, expressed surprise at the AI's ability to come up with clever ideas for solving problems, describing it as "a magic key." He praised the performance as "a significant jump from what was previously possible," while noting that further research is needed to understand how the AI achieved these results.
David Silver, DeepMind's vice president of reinforcement learning, emphasized that while the AI systems are not yet contributing to the body of mathematical knowledge created by humans, they have reached a point where they can solve problems that challenge even the best young mathematicians in the world.