I’ve been interested in artificial intelligence and machine learning for quite some time. A while back, I experimented with using a neural network to predict whether a stock would go up or down. I had some limited success, but I’m not sure how accurately you can predict something that, without sufficient causal data describing what is actually happening, appears pretty much random.

So, I thought I would apply some machine learning algorithms to something more consistent: the game of Tic-Tac-Toe. I won’t describe the game, since anyone over the age of about six has probably played it. What I will describe is two different approaches I took to teaching a computer to play this game well enough to be unbeatable.

First, I created a Java class capable of supervising a game between two players: it asks each player in turn to make its next move, and stops the game when either someone wins or all nine grid positions have been played, resulting in a tie. So far, so good.
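The supervisor loop can be sketched roughly like this. The `Player` interface and `Game` class names here are illustrative, not the project’s actual API:

```java
import java.util.Arrays;

interface Player {
    // Returns the index (0-8) of the cell this player wants to mark.
    int nextMove(char[] board, char mark);
}

public class Game {
    // Plays one game and returns 'X' or 'O' for the winner, or ' ' for a tie.
    public static char play(Player x, Player o) {
        char[] board = new char[9];
        Arrays.fill(board, ' ');
        Player[] players = {x, o};
        char[] marks = {'X', 'O'};
        for (int turn = 0; turn < 9; turn++) {
            int who = turn % 2;
            int move = players[who].nextMove(board, marks[who]);
            if (board[move] != ' ') throw new IllegalStateException("illegal move");
            board[move] = marks[who];
            if (wins(board, marks[who])) return marks[who];
        }
        return ' '; // all nine cells played with no winner: a tie
    }

    static boolean wins(char[] b, char m) {
        // The eight winning lines: three rows, three columns, two diagonals.
        int[][] lines = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},{1,4,7},{2,5,8},{0,4,8},{2,4,6}};
        for (int[] l : lines)
            if (b[l[0]] == m && b[l[1]] == m && b[l[2]] == m) return true;
        return false;
    }
}
```

Any strategy can then be dropped in as a `Player` and pitted against another.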

Next, I created two players. The first is a random player (RandomPlayer.java), which simply picks a random position from the list of unplayed positions. The second is an unbeatable player (IdealPlayer.java), which has the rules of tic-tac-toe hard-coded into its logic. This player knows how to win, knows how to block, and will not be beaten. I verified this by pitting the two players against each other for millions of games and confirming that every game resulted in either a win for the unbeatable player or a tie.
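The random player amounts to very little code. A sketch (the real RandomPlayer.java may differ in detail):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomPlayer {
    private final Random rng = new Random();

    // board[i] is 'X', 'O', or ' ' (blank).
    public int nextMove(char[] board) {
        List<Integer> open = new ArrayList<>();
        for (int i = 0; i < board.length; i++)
            if (board[i] == ' ') open.add(i);       // only unplayed positions
        return open.get(rng.nextInt(open.size()));  // uniform choice among them
    }
}
```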

Then I started thinking about how to train a computer to figure out moves on its own. This was a difficult task to wrap my brain around, so I first started with a neural network with 18 inputs and 9 outputs. Each grid position corresponds to two inputs: one for the opponent’s mark and one for our mark. If a given grid position is blank, both of its inputs are zero; otherwise, one is 1 and the other is zero. Among the outputs corresponding to empty spaces, the one with the largest value indicates the position where we’ll play.
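The encoding and move selection described above might look like this (names and the ordering of the two inputs per cell are my own assumptions):

```java
public class BoardEncoder {
    // board[i] is 'X', 'O', or ' '; `me` is the mark the network plays.
    public static double[] encode(char[] board, char me) {
        double[] in = new double[18];
        for (int i = 0; i < 9; i++) {
            if (board[i] == me)       in[2 * i] = 1.0;     // our mark
            else if (board[i] != ' ') in[2 * i + 1] = 1.0; // opponent's mark
            // blank cell: both inputs stay 0
        }
        return in;
    }

    // Pick the highest-valued output among the cells that are still empty.
    public static int bestMove(double[] outputs, char[] board) {
        int best = -1;
        for (int i = 0; i < 9; i++)
            if (board[i] == ' ' && (best == -1 || outputs[i] > outputs[best]))
                best = i;
        return best;
    }
}
```

Masking out the occupied cells in `bestMove` guarantees the network never attempts an illegal move, no matter what its outputs are.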

To train it, I wrote an app that calculated all of the possible legal grid permutations that did not constitute a win for either player. Then I eliminated duplicates under rotation (90, 180, and 270 degrees) and reflection (flip). This cut the number to a little over one-eighth of the original count of permutations, since some boards are identical when rotated and/or flipped and therefore have fewer than eight distinct orientations.
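One common way to do this deduplication, which I’ll sketch here as an illustration (not necessarily how my app does it), is to define a board’s canonical form as the lexicographically smallest of its eight rotations and reflections; two boards are duplicates exactly when their canonical forms match:

```java
public class Symmetry {
    // Index maps: cell i of the new board comes from cell map[i] of the old.
    private static final int[] ROT90 = {6, 3, 0, 7, 4, 1, 8, 5, 2}; // 90 deg clockwise
    private static final int[] FLIP  = {2, 1, 0, 5, 4, 3, 8, 7, 6}; // left-right mirror

    private static String apply(String board, int[] map) {
        char[] out = new char[9];
        for (int i = 0; i < 9; i++) out[i] = board.charAt(map[i]);
        return new String(out);
    }

    // Smallest string among all 8 orientations (4 rotations x 2 reflections).
    public static String canonical(String board) {
        String best = board;
        String b = board;
        for (int flip = 0; flip < 2; flip++) {
            for (int r = 0; r < 4; r++) {
                if (b.compareTo(best) < 0) best = b;
                b = apply(b, ROT90);
            }
            b = apply(b, FLIP);
        }
        return best;
    }
}
```

For example, a lone X in any of the four corners reduces to a single canonical board, which is why the full list shrinks to roughly an eighth of its size.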

Next, I trained the network by having it observe IdealPlayer’s response to every board permutation, in every possible orientation. Whenever the neural network got an answer wrong, I continued training it on that permutation until it got it right. Once it could process every permutation and produce the same results as IdealPlayer, it was fully trained and unbeatable.

This is all fun and games, but there’s a problem: the neural network learned by example. I wanted something smart enough to figure things out on its own. Enter the genetic algorithm. For those of you not familiar with GAs, I won’t go into too much depth, except to explain that they mimic the natural adaptation of species to their environment (survival of the fittest) to arrive at near-optimal answers to real-world problems. The gene pool consists of an evolving population: the fitness of each individual is calculated, and the fittest ones survive to reproduce, producing a new population that should be fitter than the previous one.
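The select/reproduce cycle can be shown with a tiny generic GA. This sketch uses a toy fitness function (count of 1-bits) purely to keep it self-contained; in my experiments the fitness is of course games won or tied:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class TinyGA {
    static final Random RNG = new Random(42);

    static int fitness(int[] genes) {                 // toy fitness: count the 1s
        int f = 0;
        for (int g : genes) f += g;
        return f;
    }

    static int[] crossover(int[] a, int[] b) {        // single-point crossover
        int cut = RNG.nextInt(a.length);
        int[] child = new int[a.length];
        for (int i = 0; i < a.length; i++) child[i] = i < cut ? a[i] : b[i];
        if (RNG.nextInt(10) == 0)                     // occasional mutation: flip one bit
            child[RNG.nextInt(child.length)] ^= 1;
        return child;
    }

    public static int[] evolve(int popSize, int geneCount, int generations) {
        Comparator<int[]> fittestFirst = Comparator.comparingInt(TinyGA::fitness).reversed();
        int[][] pop = new int[popSize][geneCount];
        for (int[] ind : pop)                         // random initial population
            for (int i = 0; i < geneCount; i++) ind[i] = RNG.nextInt(2);
        for (int gen = 0; gen < generations; gen++) {
            // survival of the fittest: keep the top half, breed the bottom half
            Arrays.sort(pop, fittestFirst);
            for (int i = popSize / 2; i < popSize; i++)
                pop[i] = crossover(pop[RNG.nextInt(popSize / 2)],
                                   pop[RNG.nextInt(popSize / 2)]);
        }
        Arrays.sort(pop, fittestFirst);
        return pop[0];                                // the fittest individual found
    }
}
```

Because the top half of each generation survives intact, the best individual’s fitness never decreases from one generation to the next.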

To train the GA, I pitted it against a player which, given a specific game board permutation, plays all of the possible move combinations in sequence, so that the fitness of each individual in the GA’s population can be measured against it. The more games won or tied, the fitter the individual. When a given board permutation can no longer result in a loss, the individuals are fit enough to be considered unbeatable for that permutation, and training for it can stop. For each board permutation, we end up with one individual (the fittest) that always plays the best move for that permutation. The combination of these fittest individuals, one per board permutation, becomes a genetic representation of an unbeatable player. And the algorithm figured it out on its own, rather than by example.

I’ve released the Java source code to my experiments under the BSD license. The project is hosted on SourceForge, and can be found here: Tic Project

Download it and try it out. Feel free to re-use any piece of the code in projects of your own, as long as you follow the rules of the BSD license.

Enjoy, and may God bless you!

Very well done. I still have to check out the actual implementation, but the idea is great (I’ve always wanted to do something similar but never found the time).

It’s surely an amazing idea. I myself have worked on a similar project; check this out:

Link To my Code

Have a nice day 🙂

Thanks for sharing! I’m sure that anyone who is interested in my project will also be interested in yours, so I’m leaving the link in your comment clickable. Hopefully that will send you a few extra readers each month. I don’t get much traffic on my blog, but I do what I can to help out my fellow enthusiasts.

Just what I was looking for.

This framework sort of cheats by training the players against all possible boards and a perfect player. Consequently, it shows the potential and basic methodology of the various learning methods, but doesn’t represent the practical difficulty well.

Perhaps additional examples could include more realistic learning, where perfect players and complete state maps are not available.

I’ll leave that as an exercise for the reader. One approach: initially have the ML algorithm play against a random player that always chooses a random available cell; then pit another instance of the ML algorithm against the trained one, and repeat until you have a trained algorithm that never loses. It’s a more roundabout way to do the training, but it should result in the same thing: an automated player that cannot be defeated.

I think you should implement a Monte Carlo tree search and use it in conjunction with policy and value neural networks; in essence, implement AlphaZero.