Monday, July 23, 2012

Let's Make Us A Game II-2: The AI Opponent

The rudiments of Tic-Tac-Toe Bastard are developing nicely.  My goals in this installment are to implement some basic UI elements, and try to get the artificial intelligence opponent working.  I'm sticking to text mode for now -- it's faster to play and debug, and once the core of the game is functioning we can start dressing it up with graphics, sound and a mouse interface.

The latest version of the code, below the fold, adds a couple of general gameplay features:


-- As the game starts up, the human player is allowed to select a difficulty level from 1 to 9 via the keyboard.  In the final version, this will be replaced with a GUI (graphical user interface) slider labeled more abstractly from "EASY" to "HARD" that can be changed on the fly (I think).  While the values from 1-9 will affect our AI algorithm directly, there aren't necessarily any visible differences between levels 3 and 4 or 8 and 9, so I'm cheating a bit here, using this approach to hide those sloppy details from the user.

-- There's also a simulated coin flip at the start of the game to determine who goes first; this has an impact on how the game plays out, so I want to make sure neither side has an unfair advantage.  I just use the results to reorder the PLAYERS array, and then the game proceeds as usual.  I did run into one technical glitch here -- every coin-flip was always coming up as a 1, even though I am seeding the pseudorandom number generator based on the system time!  It turns out that the math.randomseed() and math.random() functions are not initially very random on some platforms, the PC included; calling the math.random() function a second time produces the intended result.

To get the coin flip to work as expected, I had to fix a bug in the main game loop.  There were places it was using the bare index i instead of the PLAYERS[i] identification, which caused some confusion about who was who.  I also added some utility functions to represent the current player ID as a string -- "P" and "C" for our text-mode grid display, and "PLAYER" and "COMPUTER" for other uses, like the GAME OVER message.  This was more readable than the sequential identifiers "1" and "2", especially after I added the coin flip, and while this logic won't survive the text-mode phase of development, similar code will ensure that our graphical identifiers are correct later on. 

Almost all of my initial design elements are defined now, but I have not yet added a general tally for a series of Player vs. CPU matches -- the main game loop accommodates a single match only.  But that won't be hard to fix later -- we'll just have to wrap this game loop inside a larger loop.  Right now I'm interested in debugging the basic structure of the game.

Adding support for the human player's input was not difficult -- similar to the difficulty handler at startup, we just need a routine that prompts the user for a cell number from 1 to 9, and checks to make sure that A) the input is a valid number and B) that cell is not already occupied.  I'm being lazy here -- if either test fails, the user is given the same "That square is already occupied" message.  And I'm not going to correct this, because when we move into graphical, mouse-oriented mode, we simply won't allow the user to click on a square that's already occupied, and there's no way for the user to click on a nonexistent or invalid square.  An updated version of this routine will always return a valid square.


Getting the computer AI opponent working was more of a struggle than I expected.  The fact is that Tic-Tac-Toe on a 3 x 3 grid is not a very sophisticated strategic challenge, and two players can easily fight each other to a draw.  But human beings think intuitively, and computers have to create the appearance of intuition by using math and logic.  Tic-Tac-Toe is very simple compared to chess or even checkers, but I wanted to use similar principles to develop a routine that worked honestly, without using a lot of heuristic shortcuts.  I wanted the routine to be extensible to more complex challenges, like a 5 x 5 grid, and I wanted it to play naturally, determining its moves on the fly and ideally a little bit unpredictably.

This was my initial AI concept -- when it's the computer's turn, I want it to:

-- Look at each empty square on the grid, and calculate a "win probability" score for that square
-- Pick randomly from the possible moves with the highest (same) score

Getting the scoreMove function that calculates the "win probability" working was the tricky part.  I was able to create a hypothetical "next state" of the grid including the new move under consideration, and then reuse my checkVictory function from our previous installment to handle the obvious case -- if a particular move wins the game, then it should be assigned the highest possible score.  This worked immediately -- I let the computer pick randomly (and played poorly myself) until the CPU saw an opening, at which point it would successfully make the winning move.  So far, so good.

Dealing with non-final situations allowed Tic-Tac-Toe Bastard to earn its name as far as my efforts were concerned.  What I intended to do was to recurse over possible present and future moves to come up with a probabilistic score -- in other words, consider how the game could play out if a particular move was made, and then the player made one of the available next moves, and then the computer made the next available move, ending up with a higher score for the move presently at hand based on its likelihood of leading to eventual victory.  This is the classical approach to solving these kinds of problems -- the computer "looks ahead" at all possible sequences, and chooses the move that's most likely to help it win.  This takes some time and has historically been the primary obstacle to really smart AI chess play -- even for a simple game like 3 x 3 Tic-Tac-Toe, there are more than 50,000 possible sequences to consider after the first move is made.

Before I could find out if my basic approach was working, I had to resolve a couple of issues I hadn't considered.  One fix was to add a check to see if the grid is full before proceeding with deeper hypothetical iterations -- I was getting strange errors and invalid scores deep in the analysis, because the system was trying to consider a move when none were available.  This also sped up the algorithm by stopping it when it could no longer do anything useful, and pointed me to a good way to handle the difficulty setting:  the algorithm intentionally stops looking farther into the future if it's already gone through a certain number of levels.  Lowering the look-ahead to 1 essentially makes its moves random, while setting it to 8 or 9 allows it to consider all possible outcomes (depending on who goes first.)  I also added logic in the main game loop to see if the grid has filled up and the players have battled to a draw, so the game can end neatly if nobody wins.

I also made one heuristic compromise.  I realized that if the computer wins the coin flip and goes first, it might as well pick a square totally at random -- letting it consider all 9 possibilities takes considerable CPU time and always comes up with identical scores for each square.  Adding that shortcut sped up the first step (and ultimately obviated any visible difficulty difference between levels 8 and 9, and 7-9 are also the same if the player goes first.)

But my algorithm still wasn't working -- the computer still wasn't making very good choices.  What I was trying to do was calculate the probability of a win by building up a score, adding a portion of the next move's predicted score to reflect the growing uncertainty of each future state, as we can't predict what the player will do.  (Using the computer's algorithm for this prediction suggests that the player will play the same way, which is unlikely but the best available option.)  I was getting scores back from the scoreMove function, but they weren't proving very useful.

So I stepped back and considered this basic scenario for debugging -- if the player marks the upper left-hand corner, marked with a P below, then the computer should see that it has two moves which are least likely to be helpful to its cause, marked with 0, because they don't block the player's potential winning sequences at all:

P |    |
---------
   |    | 0
---------
   | 0 |

Whatever the score assigned (I realized that a true "probability" was not really necessary or possible, I just needed a means of ranking the possible moves), those two squares should be assigned the lowest score -- they don't block the player at all.  Similar patterns should be apparent if we mark any of the four corners.

With this simple check at hand, I found out that my approach was picking the worst moves as the best moves -- in effect, pushing the player toward victory by blocking off the moves that were least likely to help him or her win!  The problem was that, in considering what the player might do after the CPU made a move, I was starting from zero and adding to the potential score, treating the player win as an improvement.  So the recursion code was working, it was just not acting in the CPU's best interest; it had become very good at helping the player win.

So I changed the logic to start the score at a value of 8 (my "this move wins" value being 16, and "impossible moves" left at 0) and then subtract half the score calculated for the other player's next possible move.  The math here isn't precise -- I could have used any sub-1.0 multiplier to reflect future uncertainty as we look deeper, and with 0.5 sometimes the numbers go negative.  But the ranking still works -- now I was seeing the weakest scores for the two useless moves, and the computer is now putting up a proper fight if we let it look ahead to the maximum.  Even better, if we use a low difficulty setting, it has less information to work with because it doesn't look as far ahead, and therefore it makes poorer but still rational choices; it's a clean way to limit its "intellect", though I may try to rescale the difficulty values for more accuracy now that I know more about how it's actually working.

The final test was to have my wife play a few rounds -- she beat it handily on easy, and played to a draw on the harder difficulty settings.  It never beat her, though she was impressed that it was making smart moves, so I think I've accomplished what I wanted to.  This does, however, suggest that I should try to make the engine more flexible, so I can up the ante to make this more like a playable game -- a 5 x 5 or 7 x 7 grid should put up more of a challenge for a human player, while my existing algorithm will be slowed down (and may need to have limited look-ahead to keep performance reasonable) but should still be effective.

Next time, I plan to start wiring in some graphics.  The latest code is below the fold:



-- set up constants

EMPTY_ID = 0;
MY_ID = 1;
CPU_ID = 2;


GRID_SIZE = 9;

POSSIBLE_WINS = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9},
                  {1, 4, 7}, {2, 5, 8}, {3, 6, 9},
                  {1, 5, 9}, {7, 5, 3} };

-- and useful semi-variables

aiDebug = false;

PLAYERS = { MY_ID, CPU_ID };

MAX_SCORE = 16;

MAX_DIFFICULTY = 9;

DIFFICULTY = 1;

-- set up structures

gGrid = { EMPTY_ID, EMPTY_ID, EMPTY_ID, EMPTY_ID, EMPTY_ID, EMPTY_ID, EMPTY_ID, EMPTY_ID, EMPTY_ID };

totalEvaluated = 0;

-- utility functions

function labelPlayer(id, cell)

   if id == EMPTY_ID then
      return tonumber(cell);
   elseif id == MY_ID then
      return "P";
   elseif id == CPU_ID then
      return "C";
   end;

end;

function doPrintGrid(grid)

    print(" ");
    print(string.format("%s | %s | %s", labelPlayer(grid[1],1), labelPlayer(grid[2],2), labelPlayer(grid[3],3)));
    print("----------");
    print(string.format("%s | %s | %s", labelPlayer(grid[4],4), labelPlayer(grid[5],5), labelPlayer(grid[6],6)));
    print("----------");
    print(string.format("%s | %s | %s", labelPlayer(grid[7],7), labelPlayer(grid[8],8), labelPlayer(grid[9],9)));
    print " ";

end;

function doPrintGridRaw(grid)

    print("GRID:");
    print(grid[1], "|", grid[2], "|", grid[3]);
    print("----------------------------------");
    print(grid[4], "|", grid[5], "|", grid[6]);
    print("----------------------------------");
    print(grid[7], "|", grid[8], "|", grid[9]);
    print " ";

end;

function identifyPlayer(id)

   if id == EMPTY_ID then
      return "NOBODY";
   elseif id == MY_ID then
      return "PLAYER";
   elseif id == CPU_ID then
      return "COMPUTER";
   end;

end;

-- game logic functions

function checkVictory(grid, id)

   local won = false;

   for i in pairs(POSSIBLE_WINS) do

       if (grid[POSSIBLE_WINS[i][1]] == id and
           grid[POSSIBLE_WINS[i][2]] == id and
           grid[POSSIBLE_WINS[i][3]] == id) then

          won = true;
       
        end;

   end;

   return won;

end;

function gridFull(grid)

   local full = true;

   for i = 1, GRID_SIZE do

       if (grid[i] == EMPTY_ID) then full = false; end;

   end;

   return full;

end;

function gridEmpty(grid)

   local empty = true;

   for i = 1, GRID_SIZE do

       if (grid[i] ~= EMPTY_ID) then empty = false; end;

   end;

   return empty;

end;


-- GUI functions

function getPlayerInput(grid, id)

   local playerMove;

   repeat
      io.write("Make your move [1-9]: ");
      io.flush();
      playerMove=tonumber(io.read());
      if (grid[playerMove] ~= EMPTY_ID) then
         print "That square is already occupied";
      end;
   until grid[playerMove] == EMPTY_ID;

   print("Player marks square ", playerMove);

   return playerMove;

end;

-- AI functions

-- Determine the id of the current player's opponent

function other_player(id)
   for k = 1, table.getn(PLAYERS) do
      if PLAYERS[k] ~= id then
         return PLAYERS[k];
      end;
   end;

end;

-- Score a move's likelihood of winning given the current state of the grid

function scoreMove(grid, id, move, currentLevel, maxLevel)

   local score = 1;
   local newGrid = {};

   local indentString = string.rep(" ", currentLevel); -- for readability of debug output

   if (aiDebug) then print(string.format("%sNew scoreMove call - %i of %i, player %i move %i...", indentString, currentLevel, maxLevel, id, move)); end;

   totalEvaluated = totalEvaluated + 1;
  
   for i = 1, GRID_SIZE do
      newGrid[i] = grid[i];
   end;

   newGrid[move] = id; -- mark this move as a hypothetical next state

   if checkVictory(newGrid, id) then

      -- This is the (or a) winning move, so score it at the maximum

      score = MAX_SCORE;

      if (aiDebug) then print(string.format("%sFound a winning move for player %i, move %i",                               indentString, id, move)); end;
  
   elseif not gridFull(newGrid) and (currentLevel < maxLevel) then

      -- Assess possible next moves made after this move, based on likelihood of victory
      -- Want to come up a single score in the range 1 to MAX_SCORE that leads toward a win

      if (aiDebug) then print(string.format("%sEvaluating other player's move...", indentString)); end;

      local thisScore = 8;
      local movesEvaluated = 0;

      if (aiDebug) then doPrintGrid(newGrid); end;

      for j = 1, GRID_SIZE do

          if (newGrid[j] == EMPTY_ID) then

              thisScore = thisScore - 0.5 * scoreMove(newGrid, other_player(id), j, currentLevel + 1, maxLevel);
              movesEvaluated = movesEvaluated + 1;

           end;

      end;     

      if (aiDebug) then print(string.format("%sthisScore, movesEvaluated: %f %i",                               indentString, thisScore, movesEvaluated)); end;

      score = math.random(1, MAX_SCORE);
      score = thisScore / movesEvaluated;

      if (aiDebug) then print(string.format("%sEvaluated move %i for player %i at %i",                                       indentString, move, id, score)); end;

   end;

   return score;

end;

-- Simplest case:  Pick a blank square at random

function getCPUInputEasy(grid, id)

   local cpuMove;

   repeat
      cpuMove = math.random(1, GRID_SIZE);
   until grid[cpuMove] == EMPTY_ID;

   return cpuMove;

end;

-- AI Case: Score possible moves and pick one of the best

function getCPUInput(grid, id)

   local cpuMove;
   local aiScore = { };
   local bestMove;
   local maxScore = -16;

   local startTime = os.clock();

   totalEvaluated = 0;

   print ("CPU is thinking...");

   if gridEmpty(grid) then -- No need to consider moves, just pick at random

      bestMove = getCPUInputEasy(grid, id);

   else

   -- Consider and score each possible move
   for i = 1, GRID_SIZE do
      local score = 0;
      if grid[i] == EMPTY_ID then -- possible move worth evaluating
         score = scoreMove(grid, id, i, 1, DIFFICULTY);
      end;
      aiScore[i] = score;
      if score > maxScore then maxScore = score; end;     
   end;

   if (aiDebug) then doPrintGridRaw(aiScore); end;

   -- Decide which move to make, choosing at random from best moves by score
   bestMoves = {};
   local j = 0;
   for i = 1, GRID_SIZE do
      if (aiScore[i] == maxScore) and (grid[i] == EMPTY_ID) then
         j = j + 1;
         bestMoves[j] = i;
      end;
   end;

   if (table.getn(bestMoves) > 0) then
      bestMove = bestMoves[math.random(1, table.getn(bestMoves))];
   else
      bestMove = getCPUInputEasy(grid);
   end;

   if (aiDebug) then
      print ("CPU took ", os.clock() - startTime);
      print("CPU evaluated ", totalEvaluated, " moves");
   end;

   end;

   print("CPU marks square ", bestMove);

   return bestMove;

end;

function getMove(grid, id)

   local square = nil;

   if id == MY_ID then
      square = getPlayerInput(grid, id);
   else
      square = getCPUInput(grid, id);
   end;

   return square;

end;

-- main code

local gameOver = false;
local winner = EMPTY_ID;

math.randomseed(os.clock());

print(" ");
print("TIC-TAC-TOE BASTARD");
print(" ");

DIFFICULTY = nil;

repeat
   io.write("Select Difficulty 1-9 [1 = Easy, 9 = Hard]: ");
   io.flush();
   DIFFICULTY=tonumber(io.read());
until DIFFICULTY ~= nil;

print(" ");
print("COIN FLIP...");
coinFlip = math.random(1, 10);
coinFlip = math.random(1, 10); -- do it again, first time always seems to yield 1

if coinFlip < 5 then
   print("HEADS! Computer goes first");
   PLAYERS[1] = CPU_ID;
   PLAYERS[2] = MY_ID;
else
   print("TAILS! Player goes first");
   PLAYERS = {MY_ID, CPU_ID};
end;

doPrintGrid(gGrid);

while not gameOver do

   for i in pairs(PLAYERS) do

      print(string.format("%s is up!", identifyPlayer(PLAYERS[i])));

      gGrid[getMove(gGrid, PLAYERS[i])] = PLAYERS[i];

      if checkVictory(gGrid, PLAYERS[i]) then
         winner = PLAYERS[i];
         gameOver = true;
       end;

       if not gameOver and gridFull(gGrid) then
          winner = EMPTY_ID;
          gameOver = true;
       end;

       doPrintGrid(gGrid);

       if gameOver then break; end;

    end;
end;

if (winner ~= EMPTY_ID) then
    print (string.format("GAME OVER! %s WINS!", identifyPlayer(winner)));
else
    print ("IT'S A DRAW!");
end;

No comments:

Post a Comment