
Looking ahead

Bill Casselman
University of British Columbia


The question of whether the beginning position … is for one of the players already a winning position remains still open. With its answer chess would of course lose its status as a game.
—Ernst Zermelo, in his talk at the 1912 International Congress of Mathematicians in Cambridge

Recent interest in ‘Artificial Intelligence’ has provoked me to look at the early days in which people attempted to program computers to simulate Natural Intelligence, and even to outdo humans at certain human intellectual activities. My original intention was to try to understand what fundamental changes have taken place in the field—i.e., changes other than sheer enhancement of machine capability in speed and storage. But I became distracted by some rather technical questions, and I wound up focussing on attempts to make computers good—or at least competent—at games like checkers and chess.

The basic strategy for both humans and machines is to explore possible moves and counter-moves—requiring a look ahead at possible sequences of future play. But humans are able to restrict explorations to moves that seem more plausible winners. In effect, they apply imagination and experience, perhaps geometrical intuition, to cut down wasted effort. Computers did not have access to these. Instead, they could explore a much wider range of options at high speed, although perhaps spending a huge amount of time that a human would consider fruitless.

Exactly what kind of games am I going to consider? I have two rather different models in mind—one is chess, and the other is tic-tac-toe. But my discussion applies to any game meeting the following conditions.

  • There are two players, making moves alternately.
  • A game amounts to a sequence of positions $q_{0}, q_{1}, q_{2}, \dots $, and a move means that the current position changes from $q_{i}$ to $q_{i+1}$, according to some specified rules of the game. From each position only a finite number of moves are allowed.
  • A certain number of positions are terminal, and when one of these occurs the game is over. There are at least two kinds of terminal positions—those in which the first player wins, and those in which the second player wins.
  • But now the two models differ. My ‘chess model’ will be a modified version of chess in which stalemates are checkmates, and a game can be played forever, in which case it is considered a draw. In the other model, every game is of finite length. In this case, there is a new kind of termination, which is a draw after a finite number of moves.

One of the simplest games meeting these conditions is tic-tac-toe.

  • A position is a display of $\times$ and $\circ$ in a $3 \times 3$ matrix.
  • Each player is assigned either $\times$ or $\circ$ as their mark, and a move by them amounts to their placing their mark in an empty spot in the current matrix.
  • The initial position is an empty matrix.
  • There are two kinds of terminal positions:
    • There are three $\times$ or $\circ$ in a line (horizontal, vertical, or diagonal), or
    • there is no such triplet, and the matrix is full.
    In the first case, the player whose marks make up the triplet wins. The second case is a draw.

By convention, the first player to move is $\times$.

A sequence of moves in a tic-tac-toe game.

The game tree


To each of these games is associated a graph with oriented edges. The nodes are the initial sequences of positions of a game, and moves determine the edges. That is to say, a node is a position together with the route by which it came about.

Such a structure is called a tree: it has as root node an initial position, and to every other node is associated a unique path from that root.

In tic-tac-toe there are $9$ possible initial positions after the first move, and for each of these there are $8$ possible immediate sequels, etc. Without taking the terminal positions into consideration, there are therefore at most $$ 1 + 9 + 9 \cdot 8 + 9 \cdot 8 \cdot 7 + \cdots + 9! = 623,530 $$ possible nodes in this graph. By contrast, each of the nine places in the $3 \times 3$ matrix can be empty or contain one of the two marks $\times$, $\circ$, so there are $3^{9} = 19,683$ possible positions. Tic-tac-toe is a very simple game, which even humans can figure out completely by using elementary reasoning, and yet a computer first approaching it would face these huge numbers. As I have said, computers generally replace intuition and experience by brute force. But then it takes very little effort for a machine to scan through the entire game tree, whereas for a human it would be a formidable task.
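The arithmetic is easy to check by machine; here is a quick sketch in Python (the variable names are of course my own):

```python
from math import perm

# Upper bound on the number of nodes in the tic-tac-toe game tree:
# the number of distinct sequences of k moves is 9!/(9-k)!,
# summed here for k = 0 through 8 (the last term is 9·8·...·2 = 9!).
nodes = sum(perm(9, k) for k in range(9))

# Number of possible matrices: each of the nine places is empty,
# a cross, or a naught.
positions = 3 ** 9

print(nodes, positions)  # 623530 19683
```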

An initial fragment of the game tree is illustrated below:

A portion of the game tree showing some of the possibilities in the first five moves of a tic-tac-toe game.
It suffices to exhibit only a matrix, since the path leading to it is determined.

Forcing

What does a computer do with the game tree? Well, there is a remarkable idea.

The basic idea in many games is to force an opponent’s moves. What does it mean to force a win? It means that a player has a strategy in mind that tells them how to respond to every move by an opponent in such a way that the eventual outcome is a victory for them.

This is familiar from chess problems, in which one is usually presented with a board populated by a small number of pieces, and asked to find a sequence of moves by, say, White, leading necessarily to checkmate in some given number of moves. Tic-tac-toe also offers similar phenomena, of a more trivial nature. The position

A tic-tac-toe board with X in the center and O above it
is very bad news for $\circ$, since if they have placed themselves in that position there is no way they can escape losing. This is something every player realizes very soon. It takes a lot more work to verify thoroughly that the position
A tic-tac-toe board with X in the center and O in the upper left-hand corner
is a forced draw.

The basic idea originated in 1912, at the International Congress of Mathematicians held that year in Cambridge, England. In one of his two lectures at that Congress, the German mathematician Ernst Zermelo announced that something like this held for all games like my chess model.

Zermelo’s Theorem. If a game satisfies my conditions on the chess model, then exactly one of three possibilities holds: (1) The first player can force a win, (2) the second player can force a win, or (3) either player can force a draw in the sense that they can force the other player to keep on going.

This is true even for really complicated games like chess, so nobody ever need play a game—its outcome, at least in principle, is predetermined as soon as White and Black have been decided.

Zermelo’s proof of his theorem relied on subtle if elementary reasoning. It is made to look like a kind of tautology—with win/lose/draw looking somewhat like the dichotomy true/false. Unfortunately there were some flaws in his approach, as was pointed out a dozen or so years later by the Hungarian mathematician Dénes König, who fixed some of them (and in doing so contributed one of my favourite results in the theory of infinite graphs, as we shall see later). A year later, his compatriot László Kalmár fixed everything. Zermelo’s errors lay mostly in how he handled the combinatorics of infinite games, and König and Kalmár introduced some subtle and even profound ways to deal with this. In fact, they proved significantly more general versions of ‘Zermelo’s Theorem’. In particular, they extended it to cover any game that must end in a finite number of moves. They did not, however, give any indication of practical matters. In principle, one ought to be able to look at the rules of a game and announce its predetermined outcome. In fact, one can indeed write a program that will do that. But, as tic-tac-toe already suggests, the program might take a very, very long time.

In any case, the arguments of Zermelo and the Hungarians apply only very elementary logic.

Evaluation

Practical implementation of Zermelo’s Theorem was first brought up in the classic book on game theory by von Neumann and Morgenstern.

The algorithm of von Neumann and Morgenstern, as modified to play real games, searches as many sequences of future moves as possible, trying to find the ‘best’ one. Ideally, in view of Zermelo’s Theorem it would in fact explore all options, and choose as its next move that which leads to the truly best outcome. Of course for serious games, this is impossible, since the number of options grows exponentially with the depth of the search. Real game-playing programs, starting with Arthur Samuel’s checkers program, replace ‘best’ by an estimate of the worth of various positions. For example, in checkers one might use the difference in the number of players’ pieces remaining on the board. That is to say, the computer assigns a value to each outcome it unpacks, and chooses an optimal path based on that.
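As a toy illustration of the idea—far cruder than Samuel’s program, and with a board representation and estimate of my own choosing—here is a depth-limited search for tic-tac-toe in Python. The estimate of a position’s worth is the number of lines still winnable by $\times$ minus the number still winnable by $\circ$:

```python
# Depth-limited minimax with a heuristic estimate.  A position is a
# string of 9 characters, each ' ', 'X' or 'O'; X moves first.  The
# representation and the estimate are illustrative choices, not taken
# from any particular historical program.

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(p):
    for a, b, c in LINES:
        if p[a] != ' ' and p[a] == p[b] == p[c]:
            return p[a]
    return None

def estimate(p):
    """Lines still winnable by X minus lines still winnable by O."""
    open_x = sum(1 for line in LINES if all(p[i] != 'O' for i in line))
    open_o = sum(1 for line in LINES if all(p[i] != 'X' for i in line))
    return open_x - open_o

def search(p, depth):
    """Value of p to X, looking ahead at most `depth` moves."""
    w = winner(p)
    if w == 'X': return 10     # an actual win outweighs any estimate
    if w == 'O': return -10
    if depth == 0 or ' ' not in p:
        return estimate(p)
    mark = 'X' if p.count('X') == p.count('O') else 'O'
    vals = [search(p[:i] + mark + p[i+1:], depth - 1)
            for i, c in enumerate(p) if c == ' ']
    return max(vals) if mark == 'X' else min(vals)

empty = ' ' * 9
print(search(empty, 1))  # 4: by this estimate, X's best first move is the centre
```

Even this crude estimate already points $\times$ toward the centre square, which is in fact the strongest opening move.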

Forget about approximate values. How are the true values of arbitrary positions—equivalently, nodes—found in a finite game? This was answered by von Neumann and Morgenstern. First of all, to each terminal position is assigned a value, say to the first player a $1$ if they win, a $-1$ if they lose, and $0$ if the outcome is a draw.

We now want to assign a value to every node in the game tree. First of all, define the terminal distance of a node of the game tree to be the maximum number of edges in a path from that node to a terminal position.

For example, the graph below is a fragment from the tic-tac-toe graph, listing all the successors of the top matrix.

A section of the tic-tac-toe tree showing several games ending with all squares filled.
Here are the terminal distances:
A tree showing the scores corresponding to the previous tic-tac-toe positions.
The basic fact, whose proof is an easy exercise, is this:

Lemma. If a node has terminal distance $d$, then each of its successor nodes has terminal distance at most $d-1$, and at least one has terminal distance exactly $d-1$.

This allows us to compute easily the values of all nodes of terminal distance $1$, then $2$, $3$, etc. We do this in the following fashion. Suppose $x$ is a node with successors $y_{1}$, $y_{2}$, … By the time we want to evaluate the node $x$, according to the Lemma, we shall already know the values of the $y_{i}$. If it is the first player’s turn to move at $x$, we choose as the value $v(x)$ of $x$ the maximum of the values $v(y_{i})$, while if it is the second player’s turn we choose the minimum.

Below are the evaluations of the terminal positions, followed by evaluations of positions of terminal distance $1$. Squares mark nodes where the first player is to move. The first player wants to move so as to maximize values; the second player wants to minimize them.

Evaluations of some possible positions. Evaluations of possible further positions.
Here are the evaluations of positions of terminal distance 2 and 3 in the same example:
Tree with evaluations of positions. Tree with further evaluations of positions.
Here is a brief recursive routine in pseudocode to assign values:

def v(n):
    if n is terminal:
        return 1, 0, or -1 according to the game rules
    else:
        s = n's successors (each of smaller terminal distance than n)
        if the first player is to move at n:
            return the maximum value of v(y) for y in s
        else:
            return the minimum value of v(y) for y in s
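The same routine, written out as runnable Python for tic-tac-toe specifically (the board representation and the memoization are my additions, not part of the pseudocode):

```python
from functools import lru_cache

# The recursive evaluation above, made concrete for tic-tac-toe.
# A position is a tuple of 9 cells, each ' ', 'X' or 'O'; X moves first.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(p):
    for a, b, c in LINES:
        if p[a] != ' ' and p[a] == p[b] == p[c]:
            return p[a]
    return None

@lru_cache(maxsize=None)
def v(p):
    """Value to the first player: 1 win, -1 loss, 0 draw."""
    w = winner(p)
    if w == 'X': return 1
    if w == 'O': return -1
    if ' ' not in p: return 0          # full board, no triplet: a draw
    mark = 'X' if p.count('X') == p.count('O') else 'O'
    vals = [v(p[:i] + (mark,) + p[i+1:])
            for i, cell in enumerate(p) if cell == ' ']
    return max(vals) if mark == 'X' else min(vals)

empty = (' ',) * 9
print(v(empty))  # 0: perfect play by both sides leads to a draw
```

The cache is what makes the full scan cheap: a position reachable by several different move orders is evaluated only once.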

Strategies

What does this have to do with finding whether any given position is a win, loss, or draw? The point is that finding the value of a node gives you at the same time a best choice of move from that node. At the position reached by that move, the opponent is also given a best move for them. Etc. If $x$ is a node at which $\times$ is to play, the best move will go from $x$ to a successor $y$ with $v(y) = v(x)$, the maximum among the successors of $x$; and if $y$ is a node at which $\circ$ is to play, the best move will go to a successor $x$ with $v(x) = v(y)$, the minimum among the successors of $y$. An argument by mathematical induction will show that a path from $x$ following these moves will end at a terminal node with the same value as $x$, and that any move other than the optimal one cannot result in a better final outcome.

Pruning

A full search for evaluations takes a great deal of time and, likely, memory space. It can be made much more efficient by a technique called pruning, which allows shorter searches. This is crucial for efficient searching in serious games. I cannot explain it in detail, because it is a bit complicated, but I can point out that even in my example it can be applied.

Suppose I am assigning values to the nodes in this diagram:

Tree with further evaluations of positions.

The scan takes place from left to right. When I evaluate the second node at level two, I can stop looking right there: no value ever surpasses $1$, and I am looking for a maximum, so no subsequent node can improve on it. There is a second kind of pruning available when I meet the third node at level three. It has value $-1$. Its parent node takes the minimum value among its successors, and that minimum will certainly be at most $-1$. But two levels higher, we are looking for a maximum, and I already have the value $1$ in hand, so I can stop searching below that parent immediately.
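For the record, here is a minimal sketch of this kind of pruning (usually called alpha-beta pruning) in Python, run on a small explicitly given tree rather than on the tic-tac-toe diagram. The bounds record what each player is already guaranteed; since all values lie in $\{-1, 0, 1\}$, the search may start with those as the bounds, which is exactly the no-value-surpasses-$1$ argument:

```python
# A sketch of alpha-beta pruning on an explicit tree: a leaf is a
# number, an internal node is a list of subtrees, and levels alternate
# between maximizing and minimizing.  The `seen` list records which
# leaves were actually inspected.

def alphabeta(node, maximizing, alpha, beta, seen):
    if not isinstance(node, list):    # a leaf: just report its value
        seen.append(node)
        return node
    if maximizing:
        v = alpha
        for child in node:
            v = max(v, alphabeta(child, False, v, beta, seen))
            if v >= beta:             # the minimizer above will never allow more
                break
        return v
    else:
        v = beta
        for child in node:
            v = min(v, alphabeta(child, True, alpha, v, seen))
            if v <= alpha:            # the maximizer above will never allow less
                break
        return v

# Three minimizing nodes under a maximizing root; all values lie in
# {-1, 0, 1}, so the search starts with alpha = -1, beta = 1.
tree = [[1, 0], [0, -1, 0], [0, 0]]
seen = []
print(alphabeta(tree, True, -1, 1, seen), len(seen))  # 0 4
```

Only four of the seven leaves are ever inspected; a full scan would have visited them all.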

But techniques for pruning make up another story.

More

What exactly did König do? He formulated and proved something that achieved some fame, and now carries his name.

König’s Lemma. Suppose we are given a tree in which each node has only a finite list of succeeding nodes. Then either the set of all nodes in the tree is finite, or the tree contains an infinite path $x_{0}, x_{1}, x_{2}, \dots$ of successive nodes.

Comprehending this is like looking at one of those optical illusions in which foreground shifts constantly to background—is it trivial, or not? The definitive answer is that it is not. Some idea of its illusory power can be found in the discussion in $\S$ 2.3.4.3 of Knuth’s book. Another illustration of its magic can be found in an intriguing footnote on page 60 of von Neumann-Morgenstern.
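For what it is worth, the usual modern argument is short (this sketch is my paraphrase, not König’s own formulation):

```latex
\textbf{Sketch.} Call a node \emph{big} if it has infinitely many
descendants. If the tree is infinite, its root $x_{0}$ is big. Any big
node has only finitely many successors, among which its infinitely many
descendants are distributed, so at least one successor is itself big;
choose one and call it $x_{1}$. Repeating the argument produces big
nodes $x_{0}, x_{1}, x_{2}, \dots$, each a successor of the one before,
and these form an infinite path.
```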

But I can present one application right here. Suppose we are given a game in which only a finite number of moves are possible from each position, and in which no game goes on forever. König’s Lemma then implies that there are only a finite number of possible games.

Reading further

Chess and checkers

Technical articles

  • John von Neumann and Oskar Morgenstern, Theory of games and economic behavior, Princeton University Press, 1953.

  • Ulrich Schwalbe and Paul Walker, Zermelo and the early history of game theory, Games and Economic Behavior 34 (2001).

    There are many accounts of what Zermelo, König, and Kalmár did. This is the first that is both accurate and thorough. The authors point out many misconceptions in the literature, some rather astonishing. As far as I can see, the quantity of inaccurate accounts has in the meantime only increased, and they seem to far outnumber the accurate ones.

  • Wikipedia on Zermelo’s Theorem.

    As though to illustrate the observations of Schwalbe and Walker, this is a very inaccurate account. And it refers to their article!

  • Heinz-Dieter Ebbinghaus (in Cooperation with Volker Peckhaus), Ernst Zermelo: an approach to his life and work, Springer, 2007.

    Section 3.4.1 contains an interesting account of Zermelo’s ICM lecture.

  • Donald E. Knuth, Fundamental algorithms, Addison-Wesley, second edition, 1973.

Even more technical articles

I must apologize for listing these here. Not only are they in German, but they are not easy to read even for specialists. Nonetheless, I believe it is useful to have the links.

Pruning

  • Donald E. Knuth and Ronald W. Moore, An analysis of alpha-beta pruning, Artificial Intelligence 6 (1975).
