This morning, I came accross Cameron Sun’s Probabilistic Tic-Tac-Toe on HN. However, the comments were disappointing, with people proposing probabilistic models, linear programming, minimax algorithms, and heuristic evaluations to develop AI approaches for the game, none of which quite correct. In this post, I’ll present an optimal solution.

An earlier version of this post included a mistake. Thanks to Orson Peters for pointing it out and even sending me an email. Check out his blog!

The game

You should definitely try it but as a reminder, probabilistic Tic-Tac-Toe is like tic-tac-toe but each cell is given a probability distribution. For example, if you play the center cell in the grid below, you have a 30% chance of marking it with your symbol, a 15% chance of not doing anything, and a 55% chance of marking it with the opponent’s symbol.

Probabilistic Tic-Tac-Toe

Dynamic programming

If one ignores the probability of not marking a cell, the game is an obvious example of dynamic programming. You have less than $3 ^ 9 = 19683$ states (the exact number is 5478) and one can compute the probability of the “cross” player winning in each state with the following pseudo-code:

def value(state, turn):
    if state is a winning state for cross:
        return 1
    if state is a winning state for nought:
        return 0
    if turn == cross:
        return max(
            success(c) * value(apply(state, c, cross), nought)
            + failure(c) * value(apply(state, c, nought), nought)
            for c in available_cells(state)
        )
    else:
        return min(
            success(c) * value(apply(state, c, nought), cross)
            + failure(c) * value(apply(state, c, cross), cross)
            for c in available_cells(state)
        )

where success(c) and failure(c) are the probabilities of marking and not marking cell c.

Handling loops

The above pseudo-code is not enough to solve the game because of the possibility of doing nothing. If we were to simply add neutral(c) * value(state, turn) to the above pseudo-code, we would end up with an infinite loop.

However, let’s look at what the problem looks like for the cross player:

$V(s) = \max_c s_c \times V’(s + c) + f_c \times V’(s - c) + n_c \times V’(s)$

where $V(s)$ is the value of state $s$ for the cross player, $s_c$ is the probability of marking cell $c$ with a cross, $f_c$ is the probability of marking $c$ with a nought, $n_c$ is the probability of doing nothing, and $V’(s)$ is the value of state $s$ when it is the turn of the nought player. I also note that $s + c$ is the state $s$ where cell $c$ has been marked by the cross player and $s - c$ is the state $s$ where cell $c$ has been marked by the nought player.

But $s_c \times V’(s + c) + f_c \times V’(s - c)$ is something we can pre-compute without making loops since it depends on states with more marked cells. Let’s note that quantity $x_c$.

The problem now becomes $V(s) = \max_c x_c + n_c \times V’(s)$ where $n_c$ is less than 1.

Again, we can expand $V’(s)$ and get $V’(s) = \min_c s_c \times V(s - c) + f_c \times V(s + c) + n_c \times V(s) := \min_c x’_c + n_c \times V(s)$ where $x’_c$ is a quantity we can pre-compute and $n_c$ is less than 1.

The full equation becomes thus:

\[\begin{align*} V(s) &= \max_c x_c + n_c \times V'(s) \\ V'(s) &= \min_c x'_c + n_c \times V(s) \end{align*}\]

We can rewrite the above equations as:

\[\begin{align*} y = \max_i f_i(x) = f(x) \\ x = \min_i g_i(y) = g(y) \end{align*}\]

where $f_i$ and $g_i$ are linear functions.

Plot

Here, we are going to admit that there is a unique solution to the above equations. The curious reader can try to prove it.

There is an efficient way to solve this problem, that can generalize to any number of curves in $\mathcal{O}(N \log N)$ using some convex hull trick. However we can solve the subproblem in an easy way by using binary search.

def hull_intersection(f, g):
    """
    f and g are two lists of tuples (a, b) representing linear functions
    solves
    y = max_i f_i(x)
    x = min_i g_i(y)
    """
    a, b = 0, 1
    while b - a > 1e-9:
        x = (a + b) / 2
        y = max(a * x + b for a, b in f)
        x1 = min(a * y + b for a, b in g)
        if x1 < x:
            b = x
        else:
            a = x
    x = (a + b) / 2
    y = max(a * x + b for a, b in f)
    return y, x

Solution

We can now solve the problem. I invented a simple way to generate grids that seems to imitate the original game.

import random
from functools import lru_cache


def hull_intersection(f, g):
    """
    f and g are two lists of tuples (a, b) representing linear functions
    solves
    y = max_i f_i(x)
    x = min_i g_i(y)

    Also returns the indices of the optimal functions
    """
    a, b = 0, 1
    while b - a > 1e-9:
        x = (a + b) / 2
        y = max(a * x + b for a, b in f)
        x1 = min(a * y + b for a, b in g)
        if x1 < x:
            b = x
        else:
            a = x
    x = (a + b) / 2
    y, i = max((a * x + b, i) for i, (a, b) in enumerate(f))
    x, j = min((a * y + b, i) for i, (a, b) in enumerate(g))
    return (y, i), (x, j)


def generate_cell():
    neutral = random.choice(range(5, 35, 5))
    success = random.choice(range(30, 100 - neutral + 5, 5))
    failure = 100 - neutral - success
    return success / 100, neutral / 100, failure / 100


def generate_grid():
    return tuple(generate_cell() for _ in range(9))


def apply(state, cell, player):
    return tuple(player if i == cell else v for i, v in enumerate(state))


def winner(state):
    for i in range(3):
        if state[i] == state[i + 3] == state[i + 6] and state[i] is not None:
            return state[i]
        if (
            state[3 * i] == state[3 * i + 1] == state[3 * i + 2]
            and state[3 * i] is not None
        ):
            return state[3 * i]
    if state[0] == state[4] == state[8] and state[0] is not None:
        return state[0]
    if state[2] == state[4] == state[6] and state[2] is not None:
        return state[2]
    return None


def available_cells(state):
    return [i for i, v in enumerate(state) if v is None]


@lru_cache(3**9)
def value(grid, state=(None,) * 9):
    """
    Returns V(s) and V'(s) along with the optimal actions
    """
    w = winner(state)
    if w == "x":
        return (1, None), (1, None)
    elif w == "o":
        return (0, None), (0, None)

    cells = available_cells(state)
    if not cells:
        return (0.5, None), (0.5, None)

    f = []
    g = []
    for cell in cells:
        success, neutral, failure = grid[cell]
        s1 = apply(state, cell, "x")
        (v1, _), (vp1, _) = value(grid, s1)
        s2 = apply(state, cell, "o")
        (v2, _), (vp2, _) = value(grid, s2)
        x_c = success * vp1 + failure * vp2
        xp_c = success * v2 + failure * v1
        f.append((neutral, x_c))
        g.append((neutral, xp_c))
    (v, i), (vp, ip) = hull_intersection(f, g)
    return (v, cells[i]), (vp, cells[ip])

With this code, we can solve the game for any grid. For example, the grid in the image above can be solved with the following code:

grid = (
    (0.65, 0.05, 0.3),
    (0.65, 0.2, 0.15),
    (0.55, 0.3, 0.15),
    (0.3, 0.2, 0.5),
    (0.3, 0.15, 0.55),
    (0.35, 0.05, 0.6),
    (0.3, 0.05, 0.65),
    (0.35, 0.2, 0.45),
    (0.45, 0.1, 0.45),
)

print(value(grid))
# ((0.5385368180873334, 2), (0.46146318189602853, 2))

We can also compute the average score of the cross player on random grids:

n = 100
print(sum(value(generate_grid())[0][0] for _ in range(n)) / n)
# 0.5722708880508689

Finally, a little sanity check. If we decide who starts with a fair coin flip, then both players get the same value by symmetry. Since the total value of the game is $1$, the value of the game for both players is $\frac12$. Hence, the value of the cross player must be $\frac12$, aka $\frac{V(∅) + V’(∅)}{2} = \frac12$.

(v, _), (vp, _) = value(generate_grid())
print((v + vp) / 2)
# 0.49999999999061806

Louis Abraham's Home Page

Solving Probabilistic Tic-Tac-Toe

11 Jun 2024

The game

Dynamic programming

Handling loops

Solution