A simple tic-tac-toe machine that learns the more it plays


This is my first attempt at machine learning. It is a simple program that follows the "Beads and Matches" (MENACE) scenario, which is often used to explain one of the ways machines learn. The scenario is as follows:

The original MENACE uses 304 matchboxes, one for each distinct tic-tac-toe position the machine can face on its turn (boards that are rotations or mirror images of each other share a box). Each box holds beads of different colors, where each color stands for one move that can be played from that position. On its turn, the A.I. draws a bead at random from the current box without looking and plays the move that bead indicates. At the end of the game, if the A.I. won, you go to every matchbox it used and add another bead of the color it drew, increasing the count of that color. If the A.I. lost, you remove those beads instead.
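As a rough illustration of the scenario above (a minimal sketch, not the code in this repository), the matchboxes can be modelled as dictionaries of bead counts. The names `Menace`, `choose`, `learn` and `INITIAL_BEADS` are made up for this example, and the starting count of 3 beads per color is an assumption. The sketch keeps one box per exact board string and does not merge rotated or mirrored boards, so it will create more boxes than the 304 mentioned above. The description does not say what happens on a draw or when a box runs empty, so draws are left out and an empty box falls back to a uniform pick.

```python
import random

# Assumption: number of beads of each color placed in a fresh box.
INITIAL_BEADS = 3


def legal_moves(board):
    """Indices of the empty cells on a 9-character board string."""
    return [i for i, cell in enumerate(board) if cell == " "]


class Menace:
    """Hypothetical matchbox learner: one box of beads per board position."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.boxes = {}     # board string -> {move index: bead count}
        self.history = []   # (board, move) pairs drawn during the current game

    def box_for(self, board):
        """Return the box for this board, creating it on first use."""
        if board not in self.boxes:
            self.boxes[board] = {m: INITIAL_BEADS for m in legal_moves(board)}
        return self.boxes[board]

    def choose(self, board):
        """Draw a bead at random; more beads of a color means a likelier move."""
        box = self.box_for(board)
        moves = list(box)
        weights = [box[m] for m in moves]
        if sum(weights) == 0:
            # Assumption: an emptied box falls back to a uniform choice.
            weights = [1] * len(moves)
        move = self.rng.choices(moves, weights=weights)[0]
        self.history.append((board, move))
        return move

    def learn(self, won):
        """Add a bead to every box used after a win, remove one after a loss."""
        for board, move in self.history:
            box = self.boxes[board]
            if won:
                box[move] += 1
            else:
                box[move] = max(0, box[move] - 1)
        self.history.clear()
```

A game loop would call `choose(board)` on each of the machine's turns and `learn(won=True)` or `learn(won=False)` once the game is over.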

The first game is going to be completely random, since every box starts with the same number of beads of each color. But the more the A.I. plays, the more the bead counts change and the better it plays. After I heard of this scenario I quickly sat down in front of my computer and coded throughout the night, until about 6am. It's nights like this that make me certain that I picked the right career path, and I am grateful for it.
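The idea that play improves as the bead counts shift can be checked with a small simulation. The sketch below is only an illustration under stated assumptions, not the code in this repository: the learner always plays X and moves first, the opponent picks random empty cells, every box starts with 3 beads per move, draws change nothing, and an emptied box falls back to a uniform pick. The function names (`play_game`, `train`, `win_rate`) are made up for this example.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play_game(boxes, rng, initial_beads=3):
    """Play one game and update `boxes` in place.

    The learner is X and moves first; O plays at random (assumption).
    Returns "X", "O", or None for a draw.
    """
    board = [" "] * 9
    picked = []      # (box, move) pairs the learner drew this game
    player = "X"
    while True:
        empty = [i for i, cell in enumerate(board) if cell == " "]
        if not empty:
            result = None    # board full with no winner: draw
            break
        if player == "X":
            key = "".join(board)
            box = boxes.setdefault(key, {m: initial_beads for m in empty})
            weights = [box[m] for m in empty]
            if sum(weights) == 0:
                weights = [1] * len(empty)   # emptied box: uniform pick
            move = rng.choices(empty, weights=weights)[0]
            picked.append((box, move))
        else:
            move = rng.choice(empty)
        board[move] = player
        if winner(board):
            result = player
            break
        player = "O" if player == "X" else "X"
    if result == "X":        # win: add a bead for every move played
        for box, move in picked:
            box[move] += 1
    elif result == "O":      # loss: remove a bead, never going below zero
        for box, move in picked:
            box[move] = max(0, box[move] - 1)
    return result


def train(games=20000, seed=0):
    """Play `games` games with one shared set of boxes; return the results."""
    rng = random.Random(seed)
    boxes = {}
    return [play_game(boxes, rng) for _ in range(games)]


def win_rate(results):
    """Fraction of games won by the learner."""
    return sum(r == "X" for r in results) / len(results)


if __name__ == "__main__":
    results = train()
    print("win rate, first 1000 games:", win_rate(results[:1000]))
    print("win rate, last 1000 games: ", win_rate(results[-1000:]))
```

Against a random opponent, one would expect the win rate over the last games to be noticeably higher than over the first ones, since winning moves accumulate beads and losing moves lose them.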

I have written a blog post explaining this scenario in more depth, where I also make the connection to reinforcement learning. For a clearer and deeper understanding, please visit the link below.

Where to find

View the article

View the blog post I wrote about MENACE and reinforcement learning

View blog