With Monte-Carlo Tree Search
In Connect 4, each player takes turns dropping a chip in one of 7 columns. If they ever have 4 chips in a row, either vertically, horizontally, or diagonally, they win!
I wrote an agent that’s so good I can’t beat it. Here’s how it works.
First, think about the simplest possible agent. When it’s their turn, they see what actions they have available. These actions are each of the columns they could drop the chip into. There are 7 different choices (unless some of the columns are full, then there are fewer). The simple agent just randomly picks an available column. It doesn’t know anything about Connect4 or what move is best, it just picks one. This agent isn’t very good.
Another approach would be to try and write a smart agent that understands Connect4. Maybe there’s an optimal strategy, it could detect twos and threes in a row to build on or protect against, or maybe it’ll use heuristics for tricking the other player.
Or maybe we could train a machine learning model to analyze past games and figure out which game states are good and which actions lead to them. We could train it on professional Connect4 Twitch streams.
I didn’t do any of that.
The way my agent works is like this. Instead of using any sort of logic about the current state of the game, the agent runs simulations to figure out which move is best.
It needs to figure out which move has the highest probability of winning. It does this by trying every move. It copies the game state for each possible move and takes it. These are all the possible next states. To see which one is the best it simulates 10,000 “random” games that start from each of these states. In the simulated games it just picks random moves for each player. After these 10,000 games, it scores the state.
wins - losses / 10,000
Whichever action led to the state with the highest score is the action it takes in the real game.
So on each turn, the agent plays up to 70,000 games of Connect4 before picking a move.
This algorithm is called Monte-Carlo Tree Search. You can think of the game states as a tree where every action you could take is an edge leading to the next game state. The tree expands out to the leaf nodes which are the endgame states, where a player wins or it’s a tie.
Tree of Game States
This tree is huge since most nodes branch into 7 more. We can’t just traverse the whole tree so we do a bunch of depth-first random samples (the simulated games) to try and find a path more likely to lead to us winning.
This is cool because we don’t need to “understand” Connect4 to write the agent. There’s no logical evaluation of how “good” a move or game state is. We just need to simulate matches. The logic for turn selection is not much different than the random agent, we just pick a lot of random actions until we get a good vibe.
I’ve been doing a lot of research lately on game-playing agents. There are lots of different algorithms for different kinds of games. Connect4 has a lot of game states so it’s hard to just fully compute the probabilities of winning for each state and action. (You could do this in a simpler game like tic-tac-toe and can just calculate every possible move for every possible game.) Connect4 is easy to simulate though so this works well.
The agent is written in rust. It’s not really optimized, just runs all the simulations on a single thread, but it still goes fast enough (on my laptop) to play against.
The code for the agent (which you can play against) is here. https://github.com/saolsen/connect4 Good luck trying to beat it!