Tony’s Take January 2024

This month’s topics:

- “DeepMind AI outdoes human mathematicians”
- AI takes on Geometry in the Math Olympiads
- $\ell$-adic numbers in Scientific American

We have two items this month on the use of large language models (LLMs) in mathematics, both reported in recent Nature articles. In one item, the LLM was able to push forward a research-level problem in combinatorics; in the other, it solved geometry questions from the International Mathematical Olympiad, a math competition for high schoolers. LLMs work by predicting, based on a huge number of samples, the next word in a sentence, or, as in these examples, the next line of code in a program or the next step in a proof. They work amazingly well, but since they have no internal model of reality to work with, the predicted text may have no truth value, the predicted code may not work, and the predicted step may lead nowhere. In each instance, the authors relied on more conventional computer mathematics to keep the LLMs on track.

“DeepMind AI outdoes human mathematicians.”

The first of these applications of LLMs appeared in Nature back in December 2023, and was highlighted in a “News” piece by Nature’s math commentator Davide Castelvecchi: “DeepMind AI outdoes human mathematicians on unsolved problem.” The journal article was written by a team of twelve, mostly from Google DeepMind in London, with leaders Bernardino Romera-Paredes, Pushmeet Kohli and Alhussein Fawzi.

The researchers used an LLM to attack a research question that Castelvecchi describes as a generalization of the card game “Set.” The “Set” deck has 81 cards, each bearing one, two or three figures. The figures on a single card are all identical. But between cards, the figures differ: They can be ovals, diamonds, or squiggles; can be colored red, purple, or green; and can be outlined, solid, or shaded. So one card could bear two solid purple diamonds, while another could have three shaded green ovals. There are $3 \times 3 \times 3 \times 3 = 81$ possible combinations of number, shape, color, and shading, one on each of the 81 cards in the deck. A SET is three cards which, in each of the four aspects, are either all the same or all different.

SETs
Each column forms a SET. In the three cards of each SET, each of the aspects (number, shape, color, shading) is either all the same or all different. Image by Tony Phillips.
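In code, the SET rule is easy to state. Here is a minimal Python sketch (an illustration of mine, not anything from the article): encode each card as a 4-tuple with entries 0, 1, 2 for number, shape, color and shading; three cards form a SET exactly when every coordinate is all the same or all different, which is the same as saying that each coordinate sums to 0 mod 3.

```python
def is_set(card1, card2, card3):
    # Each card is a 4-tuple (number, shape, color, shading) with entries 0, 1 or 2.
    # In each coordinate, "all the same or all different" is equivalent to the three
    # values summing to 0 modulo 3.
    return all((a + b + c) % 3 == 0 for a, b, c in zip(card1, card2, card3))

# Three cards that agree in number, color and shading but show all three shapes:
print(is_set((1, 0, 1, 1), (1, 1, 1, 1), (1, 2, 1, 1)))  # True
# Here two cards repeat a value (in number and in shape) while the third differs:
print(is_set((1, 0, 1, 1), (1, 1, 1, 1), (2, 1, 1, 1)))  # False
```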

Mathematicians have worked out that if you lay out 21 different Set cards, there must be at least one SET among them. But what happens if we complicate the game? Imagine that each card has five properties, or six, or $n$. (Keep the number of options for each property fixed at three.) In general, the size of the largest possible cap set, a collection of cards containing no SET, is unknown. In an equivalent geometrical formulation this is called the cap set problem; the authors of the Nature article tell us that Terence Tao (possibly the world’s leading combinatorialist) once called it “perhaps my favorite open question.”
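In the geometrical formulation, cards with $n$ properties become points of the vector space $\mathbb{F}_3^n$, and a SET becomes a line: three distinct points whose coordinatewise sum is zero mod 3. A cap set is then a set of points containing no line. The following Python sketch (again just an illustration) checks that condition and finds a biggest cap by brute force for tiny $n$.

```python
from itertools import product, combinations

def is_line(p, q, r):
    # Three distinct points of F_3^n lie on a line (the cards form a SET)
    # exactly when their coordinatewise sum is zero mod 3.
    return all((a + b + c) % 3 == 0 for a, b, c in zip(p, q, r))

def largest_cap(n):
    # Exhaustive search for a biggest cap set in F_3^n; only feasible for very small n.
    points = list(product(range(3), repeat=n))
    best = []
    def extend(cap, start):
        nonlocal best
        if len(cap) > len(best):
            best = list(cap)
        for i in range(start, len(points)):
            p = points[i]
            if not any(is_line(p, q, r) for q, r in combinations(cap, 2)):
                extend(cap + [p], i + 1)
    extend([], 0)
    return best

print(len(largest_cap(2)))  # 4
# largest_cap(3) returns a cap of size 9, and for n = 4 the maximum is 20 (which is
# why 21 cards must contain a SET), but this naive search is hopeless long before n = 8.
```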

The authors studied the cap set problem, along with other combinatorics problems, with an AI strategy they call “FunSearch.” (The Fun is for “function.”) Rather than directly look for solutions (like a new cap set), they look for a program that will in turn find solutions. The implementation involves a Large Language Model trained on around one million samples of code, an evaluator that applies the LLM output programs to the problem at hand and grades their performance, and a genetic programming component (it uses an analogue of natural selection) that “breeds” the best-rated programs to generate new ones, which are put back into the LLM. At any point, the team can stop the process and pick out the highest-functioning program.

A flow chart showing the cycle the LLM goes through: It generates programs, evaluates them, and then adds them to its database. A program then serves as a prompt for it to generate more programs.
The steady-state part of the flow chart for the FunSearch routine, showing the Large Language Model, the Evaluation mechanism and the genetic programming component generating the next input for the LLM. Figure 1 from Romera-Paredes et al., used under Creative Commons Attribution 4.0 International License.
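Schematically, the loop in the flow chart looks something like the Python sketch below. Everything here is a stand-in of my own: `llm_generate` and `evaluate` are hypothetical placeholders for the code-trained language model and for the evaluator that grades the programs it produces.

```python
import random

def llm_generate(parent_programs):
    # Placeholder for the LLM: shown the best programs found so far, it proposes
    # new candidate programs (here, just labelled dummies).
    return ["candidate_%d" % random.randrange(10**6) for _ in parent_programs]

def evaluate(program):
    # Placeholder for the evaluator: run the program on the problem at hand (for
    # instance, let it build a cap set) and return a score such as the size of what it built.
    return random.random()

def funsearch(iterations=100, population=10):
    database = {"seed_program": evaluate("seed_program")}   # programs and their scores
    for _ in range(iterations):
        # "Breed" the best-rated programs by using them as the prompt for the LLM.
        parents = sorted(database, key=database.get, reverse=True)[:population]
        for child in llm_generate(parents):
            database[child] = evaluate(child)               # grade each new program
    # At any point we can stop and pick out the highest-functioning program.
    return max(database, key=database.get)

print(funsearch())
```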

When FunSearch was applied to the cap set problem with 8 “properties” (to continue with Castelvecchi’s Set-game formulation), it generated a program that constructed a 512-element cap set. Previously, the largest cap set known for $n=8$ had 496 elements. The authors remark that FunSearch was able to discover a genuinely new cap set “from scratch,” on its own, and by methods different from those that had previously been used.
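What might “a program that constructs a cap set” look like? One plausible shape (this is only a toy of my own devising, not the program FunSearch discovered) is a greedy builder steered by a priority function: visit the points of $\mathbb{F}_3^n$ in priority order and keep each one that does not complete a SET. The interesting work then lies in finding a good priority function, which is the kind of short program a loop like FunSearch could search over.

```python
from itertools import product, combinations

def forms_set(p, q, r):
    # Three points form a SET when every coordinate sums to 0 mod 3.
    return all((a + b + c) % 3 == 0 for a, b, c in zip(p, q, r))

def greedy_cap_set(n, priority):
    # Toy builder: scan the points of F_3^n in order of the priority function, keeping
    # a point whenever it does not complete a SET with two points already kept.
    cap = []
    for p in sorted(product(range(3), repeat=n), key=priority, reverse=True):
        if not any(forms_set(p, q, r) for q, r in combinations(cap, 2)):
            cap.append(p)
    return cap

# A hand-written priority function, purely for illustration; the whole point of
# FunSearch is to discover far better ones automatically.
print(len(greedy_cap_set(4, priority=lambda p: sum(x * x for x in p))))
```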

AI takes on Geometry in the Math Olympiads.

The second LLM-related item, “Solving olympiad geometry without human demonstrations,” appeared in Nature, January 17, 2024. The team (He He from New York University and Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, and Thang Luong from Google DeepMind in Mountain View, CA) reported on the prowess of their “geometry theorem prover” AlphaGeometry. They tested it on a set of 30 problems taken from the International Mathematical Olympiads since 2000.

The authors make it clear that they have not solved the general problem of how to communicate geometric data to a machine. As they say, this is a “separate and extremely challenging research topic.” Instead, they restrict themselves to problems which can be translated into the input language they use, one that works with “human-like non-degeneracy and topological assumptions,” as they explain it. They report that this language was suitable for 75% of the IMO geometry problems. Problems involving geometric inequalities and combinatorics were also set aside. Of the remaining 30 problems, AlphaGeometry solved 25. For comparison, the average Silver medalist at the IMO solved 22.9 of these problems, while the average Gold medalist solved 25.9.

The authors distinguish between a geometry problem that can be solved using only data included in the problem statement, and a problem that requires one or more additional constructions: adding to the figure points, lines, and other elements that are not specified in the premises. Problems of the latter type require creativity on the part of the solver. AlphaGeometry combines a symbolic deduction engine (which can be used to derive all the possible consequences of a set of premises) with a language model which has been trained on nearly a billion problems and can be used to predict a relevant construction, much as ChatGPT can predict the next word in a sentence. That new material is fed back into the deduction engine, and the process cycles until one of the deduced consequences is the solution to the problem.
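The cycle is simple to describe in code. In the Python sketch below, `deduce_all` and `suggest_construction` are hypothetical stand-ins for the symbolic deduction engine and the trained language model; the toy versions here exist only so that the loop runs.

```python
def deduce_all(facts):
    # Stand-in for the symbolic engine, which derives all consequences of the
    # current facts. Toy rule: from facts "A" and "B" we may deduce "C".
    return {"C"} if {"A", "B"} <= facts else set()

def suggest_construction(facts, goal):
    # Stand-in for the language model, which predicts a promising auxiliary
    # construction (a new point, line, circle, ...) much as a chatbot predicts a word.
    return "B"

def alphageometry_loop(premises, goal, max_cycles=10):
    known = set(premises)
    for _ in range(max_cycles):
        known |= deduce_all(known)            # derive every consequence of what is known
        if goal in known:                     # a deduced consequence solves the problem
            return "proved"
        known.add(suggest_construction(known, goal))   # feed a new construction back in
    return "unsolved"

print(alphageometry_loop(premises={"A"}, goal="C"))   # "proved" after one construction
```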

The authors give as an example of AlphaGeometry’s work its solution of Problem 3 from the 2015 Olympiad.

Left, a geometry problem from the IMO with a diagram. Right, AlphaGeometry has added points into the diagram, and creates a proof based on these added constructions.
Problem 3 from the 2015 Math Olympiad, and a very abridged account of AlphaGeometry’s solution (only three of the 109 steps are shown). In the second figure, the geometrical elements added during the proof are colored blue: in particular the new points D, E and G, midpoints respectively of BH, MK and CH. Images 1e and 1f from the cited article, used under Creative Commons Attribution 4.0 International License.

Note: The exhibited proof of IMO 2015 P3 uses a nonstandard notation that may confuse readers as it confused me. This is the convention of directed angles (thanks to Thang Luong for this link). Part of the convention is that (a) the notation $\angle$XOY means the angle involved in turning the segment XO counterclockwise to match YO, and (b) all angle measurements are taken modulo $180^{\circ}$. In particular the statement $\angle$GMD = $\angle$GO$_2$D must be interpreted this way.

directed angles convention
In terms of directed angles, if points A, B, C, D are on a circle, then $\angle$ABD = $\angle$ACD (and conversely) whether or not B and C are on the same side of AD. Image by Tony Phillips.
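The equality in the figure is easy to check numerically. The sketch below (my own check, with arbitrarily chosen points) measures $\angle$XOY as the rotation from the line OX to the line OY, reduced modulo $180^{\circ}$, and confirms that for four points on a circle the directed angles $\angle$ABD and $\angle$ACD agree even when B and C lie on opposite sides of AD.

```python
from math import atan2, cos, sin, pi

def directed_angle(x, o, y):
    # Directed angle XOY: the rotation taking line OX to line OY, in degrees mod 180.
    ang = atan2(y[1] - o[1], y[0] - o[0]) - atan2(x[1] - o[1], x[0] - o[0])
    return (ang * 180 / pi) % 180

# Four points on the unit circle, given by their angular positions. B lies on one of
# the arcs cut off by the chord AD and C lies on the other, so B and C sit on
# opposite sides of the line AD.
A, B, C, D = [(cos(t), sin(t)) for t in (0.0, 1.0, 4.5, 2.5)]
print(round(directed_angle(A, B, D), 6))   # 71.619724
print(round(directed_angle(A, C, D), 6))   # 71.619724, the same angle modulo 180 degrees
```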

$\ell$-adic numbers in Scientific American.

“Simple Math Creates Infinite and Bizarre Automorphic Numbers” by Manon Bischoff ran in Scientific American, January 11, 2024. It starts with automorphic numbers (see below) but it’s mostly about $\ell$-adic numbers: taking $\ell = 10$, a 10-adic number is an infinite sum like $5 + 2\cdot 10 + 6\cdot 10^2 + 3\cdot 10^3 + \dots$. Let’s not worry for now about convergence in the usual real-number sense, but accept it as an analogue of an infinite decimal like $a_0.a_1a_2a_3\dots$, which can be written as $a_0 + \frac{a_1}{10} + \frac{a_2}{10^2} + \frac{a_3}{10^3} + \dots$. Continuing the formal analogy, we can add and multiply 10-adic numbers exactly as we add and multiply decimal expansions. This is the arithmetic of 10-adic numbers.
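Concretely, computing with a 10-adic number means computing with its string of digits, and keeping only the last $k$ digits is the analogue of rounding a decimal to $k$ places. A short sketch (mine, not Bischoff’s) does 10-adic addition and multiplication simply as integer arithmetic modulo $10^k$:

```python
K = 8                       # keep the last K ten-adic digits, i.e. work modulo 10^K
MOD = 10 ** K

def from_digits(digits):
    # digits[i] is the coefficient of 10^i, so [5, 2, 6, 3] encodes 5 + 2*10 + 6*10^2 + 3*10^3.
    return sum(d * 10 ** i for i, d in enumerate(digits)) % MOD

a = from_digits([5, 2, 6, 3])          # 3625
b = from_digits([7, 1])                # 17

# Adding and multiplying 10-adic numbers is ordinary digit-by-digit arithmetic with
# carries, i.e. integer arithmetic on the truncations modulo 10^K.
print((a + b) % MOD)                   # 3642
print((a * b) % MOD)                   # 61625
```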

10-adic arithmetic can be tricky: consider automorphic numbers. An automorphic number is a natural number which reappears at the end of its square. Bischoff explains that when you take the sequence of automorphic numbers $5, 25, 625, 90625, 890625, \dots$ and extend it forever, you get a number $n$ which equals its square; it’s infinitely long, but we can calculate that it ends in $\dots 8212890625$. So the 10-adic number $n = 5 + 2\cdot 10 + 6\cdot 10^2 + 0\cdot 10^3 + \dots$ satisfies $n^2 = n$. But then $(n-1) \times n = n^2 - n = 0$, even though $n-1$ and $n$ are both nonzero: the 10-adics contain nonzero zero divisors, which is very bad for arithmetic. But, as Bischoff tells us, the problem goes away if we work with a prime number as our base instead of 10, and consider 2-adic numbers like $1 + 0\cdot 2 + 1\cdot 4 + 0\cdot 8 + \dots$, 3-adic numbers like $1 + 2\cdot 3 + 0\cdot 9 + 1\cdot 27 + \dots$, etc. The digits in a $p$-adic number take the values $0, 1, \dots, p-1$. Any calculation that can be done with real numbers can be duplicated with the $p$-adics; we even have $p$-adic analysis, a flourishing area of research extending calculus to this setting.
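The digits of the self-squaring number $n$ above can be computed one at a time: starting from 5, if the last $k$ digits already satisfy $n^2 \equiv n$ modulo $10^k$, there is exactly one choice of next digit that keeps the congruence modulo $10^{k+1}$. A sketch (mine, not from the article):

```python
def automorphic_10adic(num_digits):
    # Build the 10-adic number ending in 5 that satisfies n^2 = n, one digit at a time:
    # starting from 5, exactly one digit d makes (n + d*10^k)^2 end in n + d*10^k.
    n = 5
    for k in range(1, num_digits):
        modulus = 10 ** (k + 1)
        for d in range(10):
            candidate = n + d * 10 ** k
            if (candidate * candidate - candidate) % modulus == 0:
                n = candidate
                break
    return n

n = automorphic_10adic(10)
print(n)                          # 8212890625, the tail quoted above
print(n * n % 10 ** 10 == n)      # True: the square ends in these same ten digits
```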

There is however an enormous difference in topology, the way the numbers fit together. In the $\ell$-adics, two numbers are close if their difference is divisible by a high power of $\ell$. So, for example, the $10$-adic numbers $0$ and $2$ are far apart, while $10^{20}$ and $10^{30}$ are close together; whereas in the usual real-number topology, where two numbers are close if the absolute value of their difference is small, $10^{20}$ and $10^{30}$ are very distant. In particular, in the $\ell$-adics, the higher powers of $\ell$ themselves get closer and closer to 0. This is why a sequence like $5, 25, 625, 90625, \dots$ converges in the $10$-adic topology: the difference between successive terms accumulates more and more zeroes at its end, making it divisible by higher and higher powers of 10, so it goes to zero. (And in the $\ell$-adic world, unlike in the reals, successive differences tending to zero is enough to guarantee that a sequence converges.)
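This notion of closeness can be made quantitative: measure a nonzero difference by the largest power of $\ell$ dividing it, with larger powers meaning closer. A final sketch (again mine), taking $\ell = 10$:

```python
def ten_adic_valuation(x):
    # The largest k with 10^k dividing x; the bigger k is, the closer x is to 0
    # in the 10-adic sense.
    assert x != 0
    k = 0
    while x % 10 == 0:
        x //= 10
        k += 1
    return k

print(ten_adic_valuation(2 - 0))             # 0: 0 and 2 are far apart 10-adically
print(ten_adic_valuation(10**30 - 10**20))   # 20: 10^20 and 10^30 are very close
# Successive terms of 5, 25, 625, 90625, 890625, ... get closer and closer together:
seq = [5, 25, 625, 90625, 890625]
print([ten_adic_valuation(b - a) for a, b in zip(seq, seq[1:])])   # [1, 2, 4, 5]
```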