Payoffs and strategies

It turns out that the game with alternate moves can be converted into a game with just one move...

Payoffs and strategies

Bill Casselman
University of British Columbia

Payoffs and strategies

In a recent Feature Column I discussed a basic search tool used by computers to explore options in playing certain games such as tic-tac-toe or a restricted version of chess. In this column, I want to extend that discussion, focusing on the question of an ideal strategy.

For the games at hand:

  • There are two players, making moves alternately.
  • A particular instance of a game amounts to a sequence of positions $p_{0}, p_{1}, p_{2}, \dots $, and a move means that the current position changes from $p_{i}$ to $p_{i+1}$, according to some specified rules of the game. From each position only a finite number of moves are allowed.
  • A certain number of positions are terminal, and when one of these occurs the game is over. At that point, there is a payoff. I'll adopt the convention that a positive payoff means that the first player (whom I'll call 'Max') wins that amount from the second ('Min'), and a negative one means that the second player wins, taking the payoff from the first player. A zero payoff corresponds to a draw.
  • Every game is of finite length.

Chess would qualify, if we imposed a rule (as in tournament play) disallowing a game that goes on forever. One of the simplest of games that definitively qualifies is tic-tac-toe. Here, the game ends in a win, loss, or draw. A natural choice of payoff would be that the losing player gives the winner one dollar, so the payoff would be $\$1$ if Max wins, $-\$1$ if he loses, and $\$0$ in case of a draw.

Incidentally, there is something slightly confusing in terminology. A 'game' has two meanings: one as a particular game played by two people, and the other the specification of how particular games are played. Thus one says that chess is a game, but one might also say that so-and-so lost the game.

The game tree

I want to allow games with payoffs more complicated than those determined just by winning or losing—I want to look at games that you might think of as more mathematical. To describe these payoffs, let's look more carefully at the structure of our games.

To each of the games under consideration is associated a graph with oriented edges. Its nodes are the initial sequences of positions in a game with moves determining the edges. That is to say, a position together with the route by which it came about.

Such a structure is called a tree: it has a root node as initial position, and to every other node is a unique path from that root.

For example, in tic-tac-toe there are $9$ possible initial nodes after the first move, and for each of these there are $8$ possible immediate sequels, etc. Because positions in which three Xs or Os lie in a line are not pursued, the total number of nodes is less than

$$ 1 + 9 + 9\cdot 8 + \cdots 9!/2! + \cdots + 9! + 9! = 986,410 \, . $$

In fact, computer simulations find there are $549,946$ nodes.

An initial fragment of this game tree is illustrated below:

A partial tree showing possible evolutions of a tic-tac-toe game

With our assumptions about which games are under consideration, the game tree is always finite—although it might be very large, even for a simple game.

In fact, every finite tree is the game tree for some game, although that game might not have an intuitive description. The rules of the game are quite simple—the positions of the game are the nodes of the graph, and the positions immediately accessible to a given position are the nodes of the graph one step away, on the side away from the root of the tree. The terminal positions are those without such 'children'. The payoffs can be any assignment of values to the terminal nodes.

I summarize: a game tree is a finite tree with payoffs assigned to its terminal nodes.

This characterization of games is a useful mathematical abstraction, although it has little to do with real games that are played by real people. The principal advantage of the abstraction is that if $n$ is any node of the tree, then the set of all its descendants is also a game tree. This allows analysis by mathematical induction.

In the following diagram, the colouring of discs indicate whose turn it is. A game amounts to a path from the top of the graph—the root of the tree—down to a terminal position—a leaf of the tree. The game is heavily biased in favour of Max, since all payoffs are positive. This might be compensated for by an initial payment from Max to Min, in order to make the game fair.

A tree showing different possible payouts.

In the particular play sequence indicated in the diagram, Max is paid $2$ units at the end of play.

The height of a game tree is the length of the longest path descending from the root. Thus the height of the tree in the diagram above is $6$. The height of any other node is the height of the game tree made up by its descendants. For example:

An example tree with nodes labeled by heights.

A digression

Suppose for the moment that Max and Min play a different game. In this one, each of them chooses secretly a number in the range $1$ to $4$. They then throw out, simultaneously, a hand displaying the corresponding number of fingers.

A person holds up 3 fingers.
Min chooses 3 fingers. (Image by LollyKnit, CC BY 2.0)

The payoff is dictated by the somewhat arbitrary $4 \times 4$ matrix below. Thus if both players choose their first strategy, the payoff is $\$7$.

A payoff matrix.

It turns out that there is in some sense a best way for each player to move. Recall that Max wants to maximize the payoff, while Min want to minimize it. The most that Max can gain is $\$7$, which suggests that he should choose strategy $\#1$. But if Min chooses $\#4$ he will then get only $\$1$, so this doesn't seem like a safe choice. If Max considers all the strategies, he observes that were he to choose $\#3$ he can be assured that he will get at least $\$3$ no matter what Min does, and that no other strategy choice can guarantee that much. So if he wishes to play conservatively he will choose $\#3$. Similarly, Min observes that if she chooses strategy $\#2$ the playoff will be at most $\$3$, and that any other choice opens up the possibility that she will lose more. These considerations by the players are indicated by annotations to the payoff matrix:

Payoff matrix showing maximum in each column and minimum in each row.

In other words, both players have a best conservative choice of strategy, and both choices produce the same payoff. The game is said to have a saddlepoint solution, for reasons that will appear if you interpret payoffs as altitudes on a map.

Not all games like this have saddlepoints.
They do if and only if there exist choices of strategies $a$ and $b$ for Max and Min such that

$$ p_{a,j} \ge p_{a, b}, \quad p_{i, b} \le p_{a,b} $$

for all $i$, $j$.

Strategies

What does the saddlepoint in the previous section have to do with the first kind of game we have been discussing? How can a game in which players put down strategy choices simultaneously in one single move have anything to do with a game in which players move alternately? It turns out that the game with alternate moves can be converted into a game with just one move.

In real game play, a player makes each move after scanning the consequences of alternate moves, at some distance into the future. But the total number of games—the number of descending paths in the game tree—is finite. So, in theory, a player can actually scan through all possible game plays to explore all possible consequences of various moves. A strategy for one of the players is a choice of moves for every possible position—in other words, a complete plan of action. It poses a response to every conceivable situation. The following diagram illustrates sample strategies for both players—every node in the game tree has a designated move to a node beneath it.

A game tree.

If both players choose strategies then they have also determined a particular play of a game. The corresponding path in the game tree is the unbroken path determined by the strategies. Thus, each pair of strategies is assigned a payoff or value. We are now in a situation much like that in the previous section— a strategy in the sense indicated here is essentially a variation of what it was in the previous section, it's just that the number of strategies each player has at hand can be very, very large.

Thus, in this version of the game with single comprehensive moves, each player makes up a strategy in secret, and then both players exhibit their choice.

I recall now what the discussion in the previous section says about how to recognize a saddlepoint. First look at what Max has to do. For each of his strategies he looks over all possible strategies for Min. Each gives rise to a payoff. He notes down the smallest of these as the value of that strategy. He then looks over the value of all his strategies, and takes as his best strategy that with the largest value.

If carried out in exactly this way, there is a terrific amount of work involved. Scanning all possible strategies is not usually feasible. So it is rather remarkable that this work can largely be avoided. The basic reason for this is that a best strategy for a game determines also a best strategy for the game associated to any of its nodes. This suggests a construction by induction.
We arrive at:

Theorem. To every node $n$ in the game tree at which it is Max's turn to play can be assigned a value $v(n)$ with these properties:

  • there exists a game starting at $n$ with payoff $n$;
  • no matter how Min plays, Max can find a sequence of moves that delivers a payoff $\ge n$.

A similar result holds for Min, but with $\le n$ instead of $\ge n$.

The value at one of Max's nodes is the maximum of the values at the children of that node. Similarly, the value at one of Min's nodes is the minimum of the values at the children of that node.

This tells us how to calculate the value of any node. If the node is terminal, its value is its payoff. If it is not terminal, it has children, which have height less than that of the given node. The value of the node is determined by the values of these, but how exactly how this happens depends on whose turn it is to move. If it is Max's, the value is the maximum of the values assigned to its children, while if it is Min's, it is the minimum of these values.

Thus calculation proceeds by induction on the height of a node. It can be easily implemented by a recursive routine:

def Value(a node):
    if the node is terminal:
        return the payoff assigned to it by to the game rules 
    else: 
        if it is Max's turn to play:
            return the maximum value among the children of the node 
        else:
            return the minimum value among the children of the node. 
Game tree with values.

Finding the value of a node gives at the same time a best choice of move from that node—to the child from which the node acquires its value. Doing this for all the nodes, in this way each player is assigned a strategy. I claim that these two strategies are optimal, and determine a saddle point. For Max's strategy to be optimal, for example, means two things:

  1. There exists at least one strategy for Min whose payoff is the value, say $n$, of the game;

  2. no matter what strategy Min chooses, the payoff will be at least $n$.

The first of these claims is an immediate consequence of the evaluation of the game and the construction of the strategies. The second follows by mathematical induction on the height of the game.

The verification for Min's strategy is an obvious modification of this.

Reading further

  • Oskar Morgenstern and John von Neumann, The theory of games and economic behaviour, Princeton University Press, 1953.

    This first introduced (in Section 15) the algorithm to evaluate a game, but the discussion is long and intimidating.

  • J. D. Williams, The compleat strategyst, published in the RAND series of MacGraw-Hill.

    The $4 \times 4$ game with a saddlepoint is the first example in this classic text, which goes on to consider games defined by a matrix that might not have a saddlepoint.

Leave a Reply

Your email address will not be published. Required fields are marked *

HTML tags are not allowed.

57,030 Spambots Blocked by Simple Comments