# Program to Create Games Like Tic-Tac-Toe and Connect 4 in Python: Assignment Solution

## Instructions

### Objective

Write a Python program to create games like Tic-Tac-Toe and Connect 4.

## Requirements and Specifications

There will be four games in total:

- Tic-Tac-Toe with alpha-beta tree search
- Connect 4
- Tic-Tac-Toe with pure Monte Carlo search
- Tic-Tac-Toe with heuristic alpha-beta tree search

## Source Code

### TIC TAC TOE ALPHA BETA TREE SEARCH
# Adversarial Search: Solving Tic-Tac-Toe with Minimax Search and Alpha-Beta Pruning
## Introduction
Multiplayer games can be implemented as:

1. __Nondeterministic actions:__ The opponent is seen as part of an environment with nondeterministic actions. Non-determinism is the result of the opponent's unknown moves.
2. __Optimal decisions:__ Minimax search (search the complete game tree) and alpha-beta pruning.
3. __Heuristic alpha-beta tree search:__ Cut off the tree search and use a heuristic to estimate the state value.
4. __Monte Carlo tree search:__ Simulate playouts to estimate the state value.

Here we will implement search for Tic-Tac-Toe (see [rules](https://en.wikipedia.org/wiki/Tic-tac-toe)). The game is a __zero-sum game__: a win by x has a value of +1, a win by o has a value of -1, and a tie has a value of 0. Max plays x and tries to maximize the outcome while Min plays o and tries to minimize the outcome.
We will implement

* __minimax search__, and
* __minimax search with alpha-beta pruning.__

The search time (number of nodes explored) could be further improved by

* __move ordering__ (search moves first that have performed well in previous games),
* __heuristic alpha-beta tree search__ (cut off search and use a heuristic evaluation function to approximate the utility),
* __forward pruning__ (prune moves that appear poor), and
* __table lookups__ (for openings and the end game).

The algorithms search the game tree and we could return a conditional plan (or a partial plan if cutoffs are used), but the implementation here only identifies and returns the optimal next move.

## The board

I represent the board as a vector of length 9. The values are ' ', 'x', and 'o'.

```python
import numpy as np
import math

def empty_board():
    return [' '] * 9
```

Some helper functions.

```python
%matplotlib inline
%config InlineBackend.figure_format = 'retina'  # use higher-resolution images in the notebook

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def show_board(board, help = True, dpi = 40,
               colors = {' ': 'white', 'x': 'red', 'o': 'black'}):
    """Show the tic-tac-toe board.

    help adds the array index, dpi changes the size, and colors sets the colors."""
    b = np.array(board).reshape((3, 3))

    with plt.rc_context({'figure.dpi': dpi}):
        fig = plt.matshow(np.zeros((3, 3)), cmap = ListedColormap(['w']))
    fig.axes.axis('off')
    plt.hlines([.5, 1.5], -.5, 2.5)
    plt.vlines([.5, 1.5], -.5, 2.5)

    for row in range(3):
        for col in range(3):
            plt.text(row, col, b[col, row],
                     fontsize = 64,
                     color = colors[b[col, row]],
                     horizontalalignment = 'center',
                     verticalalignment = 'center')

    if help:
        for row in range(3):
            for col in range(3):
                plt.text(col, row - .35, col + 3 * row,
                         fontsize = 12,
                         color = 'gray',
                         horizontalalignment = 'center',
                         verticalalignment = 'center')
    plt.show()

def check_win(board):
    """Check the board and return one of x, o, d (draw), or n (for next move)."""
    board = np.array(board).reshape((3, 3))

    diagonals = np.array([[board[i][i] for i in range(len(board))],
                          [board[i][len(board) - i - 1] for i in range(len(board))]])

    for a_board in [board, np.transpose(board), diagonals]:
        for row in a_board:
            if len(set(row)) == 1 and row[0] != ' ':
                return row[0]

    # check for draw
    if np.sum(board == ' ') < 1:
        return 'd'

    return 'n'

def other(player):
    if player == 'x': return 'o'
    else: return 'x'
```

# Recursive DFS Algorithm for Minimax Search

See AIMA page 150. For games we need:

* actions(s): Legal moves in state s.
* result(s, a): Transition model.
* terminal(s): Test for terminal states.
* utility(s): Utility for player Max.

Note: Minimax search assumes that both players play optimally from the current state to the end of the game. The player who is Max can be specified using the player parameter. It defaults to x.
```python
def actions(board):
    """Return possible actions as a vector of indices."""
    return np.where(np.array(board) == ' ')[0].tolist()

    # randomize the action order
    #actions = np.where(np.array(board) == ' ')[0]
    #np.random.shuffle(actions)
    #return actions.tolist()

def result(state, player, action):
    """Add move to the board."""
    if state[action] != ' ':
        print("Error: Illegal move!")

    # Making a copy is very important so you do not overwrite other
    # states in your search tree. Note: list.copy() copies a flat Python
    # list (sufficient here). For a numpy array/matrix you need to use
    # numpy.copy() instead!
    state = state.copy()

    state[action] = player
    return state
```

Return the utility for a state. Terminal states return the utility for Max as +1, -1 or 0. Non-terminal states have no utility and return None.

```python
def utility(state, player = 'x'):
    """Check if a state is terminal and return the utility if it is.
    None means the state is not terminal."""
    goal = check_win(state)
    if goal == player: return +1
    if goal == 'd': return 0
    if goal == other(player): return -1  # loss is failure
    return None  # continue

print(utility(['x'] * 9))
print(utility(['o'] * 9))
print(utility(empty_board()))
```

Check if a state is a terminal state.

__Note:__ In the code below I will use utility(state, player) is not None to identify terminal states to avoid calling check_win twice.

```python
def is_terminal(state):
    """Check if a state is a terminal state."""
    return check_win(state) != "n"

print(is_terminal(['x'] * 9))
print(is_terminal(empty_board()))
```

Note that the minimax search algorithm is very similar to AND-OR search, where Max uses OR nodes and Min is represented by AND nodes.

```python
# global variables
DEBUG = 1  # 1 ... count nodes, 2 ... debug each node
COUNT = 0

def minimax_search(board, player = 'x'):
    """Start the search."""
    global DEBUG, COUNT
    COUNT = 0

    value, move = max_value(board, player)

    if DEBUG >= 1: print(f"Number of nodes searched: {COUNT}")
    return {"move": move, "value": value}

def max_value(state, player):
    """The player's best move."""
    global DEBUG, COUNT
    COUNT += 1

    # return the utility of the state if it is a terminal state
    v = utility(state, player)
    if DEBUG >= 2: print("max in: " + str(state) + str([v]))
    if v is not None: return v, None

    v, move = -math.inf, None

    # check all possible actions in the state, return the move with the largest value
    for a in actions(state):
        v2, a2 = min_value(result(state, player, a), player)
        if v2 > v:
            v, move = v2, a

    if DEBUG >= 2: print("max out: " + str(state) + str([v, move]))
    return v, move

def min_value(state, player):
    """The opponent's best response."""
    global DEBUG, COUNT
    COUNT += 1

    # return the utility of the state if it is a terminal state
    v = utility(state, player)
    if DEBUG >= 2: print("min in: " + str(state) + str([v]))
    if v is not None: return v, None

    v, move = +math.inf, None

    # check all possible actions in the state, return the move with the smallest value
    for a in actions(state):
        v2, a2 = max_value(result(state, other(player), a), player)
        if v2 < v:
            v, move = v2, a

    if DEBUG >= 2: print("min out: " + str(state) + str([v, move]))
    return v, move
```

__Notes:__ This DFS code checks the available actions in order and picks the first one with the largest/smallest value. There are many ties. We could randomize actions().

## Some Tests

### x is about to win (choose 8)

```python
board = empty_board()
# example position: x holds 0 and 4 and wins by playing 8 (diagonal 0-4-8)
board[0] = 'x'
board[2] = 'o'
board[3] = 'o'
board[4] = 'x'

print("Board:")
show_board(board)
print()

%time display(minimax_search(board))
```

**Note:** DFS does not pick the shortest path to a win!
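One common remedy, sketched here as my own illustration (it is not part of the notebook's code), is to discount terminal utilities by search depth so that wins reached sooner score higher and losses are pushed as far away as possible:

```python
def discounted(u, depth, gamma = 0.9):
    """Scale a terminal utility by gamma**depth (0 < gamma < 1).
    A win (+1) found at a shallow depth keeps more of its value than the
    same win found deep in the tree; a loss (-1) reached later is
    penalized less, so the player also prefers to delay losing."""
    return u * gamma ** depth

# a win at depth 1 is now preferred over a win at depth 5
print(discounted(+1, 1))  # 0.9
print(discounted(+1, 5))  # 0.9**5, about 0.59
```

To use this, max_value/min_value would pass the current depth down the recursion and return discounted(v, depth) at terminal states instead of the raw utility.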
Discounting the utility can help.

### x can draw if it chooses 7

```python
board = empty_board()
# example position: o threatens 1-4-7, so x must block at 7; the best x can then get is a draw
board[0] = 'x'
board[1] = 'o'
board[8] = 'x'
board[4] = 'o'

print("Board:")
show_board(board)
print()

%time display(minimax_search(board))
```

### o is about to win no matter what x does

```python
board = empty_board()
# example position: o has the three threats 1, 6 and 8; x can only block one of them
board[0] = 'o'
board[2] = 'o'
board[4] = 'o'
board[3] = 'x'
board[7] = 'x'

print("Board:")
show_board(board)
print()

%time display(minimax_search(board))
```

### Empty board

```python
board = empty_board()

print("Board:")
show_board(board)
print()

%time display(minimax_search(board))
```

__Note:__ Starting with an empty board searches the complete game tree and takes a while. The number of nodes above is the actual size of the complete game tree. A table with known 'openings' (e.g., place x in a corner, o chooses the center, etc.) would be helpful to speed things up.

# Recursive DFS Algorithm for Minimax Search with Alpha-Beta Pruning

See AIMA page 154. This adds alpha-beta pruning to minimax search. Alpha and beta are used to maintain bounds on the minimax value of a node in the form [alpha, beta]: alpha means that the value is at least that high and beta means that the value is at most that high. Subtrees below a node that are worse than the currently known bound do not need to be further explored and can be pruned. Max uses alpha and Min uses beta for pruning. The implementation for player 'x' returns the optimal next move.

```python
# global variables
DEBUG = 1  # 1 ... count nodes, 2 ... debug each node
COUNT = 0

def alpha_beta_search(board, player = 'x'):
    """Start the search."""
    global DEBUG, COUNT
    COUNT = 0

    value, move = max_value_ab(board, player, -math.inf, +math.inf)

    if DEBUG >= 1: print(f"Number of nodes searched: {COUNT}")
    return {"move": move, "value": value}

def max_value_ab(state, player, alpha, beta):
    """The player's best move."""
    global DEBUG, COUNT
    COUNT += 1

    # return the utility of the state if it is a terminal state
    v = utility(state, player)
    if DEBUG >= 2: print(f"max: {state} [alpha,beta]=[{alpha},{beta}] v={v}")
    if v is not None:
        if DEBUG >= 2: print(f"     found terminal state. backtracking.")
        return v, None

    v, move = -math.inf, None

    # check all possible actions in the state, update alpha and return the move with the largest value
    for a in actions(state):
        v2, a2 = min_value_ab(result(state, player, a), player, alpha, beta)
        if DEBUG >= 2: print(f"max: {state} (backtracked) [alpha,beta]=[{alpha},{beta}] v={v2}")
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)
        if v >= beta:
            if DEBUG >= 2: print(f"     v >= beta ({v} >= {beta}): pruning remaining subtree (actions). backtracking.")
            return v, move

    return v, move

def min_value_ab(state, player, alpha, beta):
    """The opponent's best response."""
    global DEBUG, COUNT
    COUNT += 1

    # return the utility of the state if it is a terminal state
    v = utility(state, player)
    if DEBUG >= 2: print(f"min: {state} [alpha,beta]=[{alpha},{beta}] v={v}")
    if v is not None:
        if DEBUG >= 2: print(f"     found terminal state. backtracking.")
        return v, None

    v, move = +math.inf, None

    # check all possible actions in the state, update beta and return the move with the smallest value
    for a in actions(state):
        v2, a2 = max_value_ab(result(state, other(player), a), player, alpha, beta)
        if DEBUG >= 2: print(f"min: {state} (backtracked) [alpha,beta]=[{alpha},{beta}] v={v2}")
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)
        if v <= alpha:
            if DEBUG >= 2: print(f"     v <= alpha ({v} <= {alpha}): pruning remaining subtree (actions). backtracking.")
            return v, move

    return v, move
```

__Notes:__ This DFS code checks the available actions in order and picks the first one with the largest/smallest value. There are many ties. We could randomize actions().

## Some Tests

### x is about to win (play 8)

```python
DEBUG = 2  # show more debugging info

board = empty_board()
# example position: x holds 0 and 4 and wins by playing 8 (diagonal 0-4-8)
board[0] = 'x'
board[2] = 'o'
board[3] = 'o'
board[4] = 'x'

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board))
```

The code does not pick the shortest path to a win! Discounting could help.

### x can draw if it chooses 7

```python
DEBUG = 1

board = empty_board()
# example position: o threatens 1-4-7, so x must block at 7
board[0] = 'x'
board[1] = 'o'
board[8] = 'x'
board[4] = 'o'

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board))
```

### o is about to win no matter what x does

```python
board = empty_board()
# example position: o has three threats; x can only block one of them
board[0] = 'o'
board[2] = 'o'
board[4] = 'o'
board[3] = 'x'
board[7] = 'x'

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board))
```

### Empty board: Only a draw can be guaranteed

```python
board = empty_board()

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board))
```

**Note:** Alpha-beta search expands fewer nodes and is **much faster** than minimax search.

## Move ordering

Smart move ordering can make alpha-beta pruning more effective. I think the center and the corners [0, 2, 6, 8] are good. I implement move reordering in actions().
```python
def actions(board):
    """Return possible actions as a vector of indices (better moves first)."""
    actions = np.where(np.array(board) == ' ')[0].tolist()

    priority = [1, 0, 1,
                0, 2, 0,
                1, 0, 1]
    priority = [priority[i] for i in actions]

    actions = [a for _, a in sorted(zip(priority, actions), reverse = True)]
    return actions

board = empty_board()
show_board(board)
actions(board)
```

### Empty board: Only a draw can be guaranteed

```python
board = empty_board()

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board))
```

__Note:__ Compare to the number of nodes searched without move reordering (right above).

## Experiments

### Baseline: Randomized Player

A completely randomized player agent should be a weak baseline.

```python
def random_player(board, player = None):
    """Simple player that chooses a random empty square. player is unused."""
    return np.random.choice(actions(board))

show_board(board)
%time display(random_player(board))
```

### The Environment

Implement the environment that calls the agents. The percept is the board and the action is the move.

```python
def switch_player(player, x, o):
    """Switch the player symbol and the agent function between turns.
    player is a player symbol and x and o are the players' agent functions."""
    if player == 'x':
        return 'o', o
    else:
        return 'x', x

def play(x, o, N = 100):
    """Play N games. x and o are the players' agent functions."""
    results = {'x': 0, 'o': 0, 'd': 0}
    for i in range(N):
        board = empty_board()
        player, fun = 'x', x

        while True:
            a = fun(board, player)
            board = result(board, player, a)

            win = check_win(board)
            if win != 'n':
                results[win] += 1
                break

            player, fun = switch_player(player, x, o)
    return results
```

### Random vs. Random

```python
%time display(play(random_player, random_player))
```

### Minimax with Alpha-Beta Pruning vs. Random

```python
DEBUG = 0

def alpha_beta_player(board, player = 'x'):
    return alpha_beta_search(board, player)["move"]

print("alpha-beta vs. random:")
%time display(play(alpha_beta_player, random_player))
print()

print("random vs. alpha-beta:")
%time display(play(random_player, alpha_beta_player))
```

**Note on runtime:** Compared to AND-OR search, minimax search is slower because AND-OR search terminates once it finds the first subtree that has only goal nodes, but minimax search traverses the whole tree (minus what is pruned if alpha-beta pruning is used). Also, I was lazy and recreate the search tree every time the agent is asked for a move. The agent could create the tree with the minimax values once and then use it to decide on actions. A problem could be that the tree may be too large to store.

### CONNECT 4

# Adversarial Search: Playing Connect 4

Student Name: [Add your name]

## Instructions

Total Points: Undergraduates 100, graduate students 110

Complete this notebook and submit it. The notebook needs to be a complete project report with your implementation, documentation including a short discussion of how your implementation works and your design choices, and experimental results (e.g., tables and charts with simulation results) with a short discussion of what they mean. Use the provided notebook cells and insert additional code and markdown cells as needed.

## Introduction

You will implement different versions of agents that play Connect 4:

> "Connect 4 is a two-player connection board game, in which the players choose a color and then take turns dropping colored discs into a seven-column, six-row vertically suspended grid. The pieces fall straight down, occupying the lowest available space within the column. The objective of the game is to be the first to form a horizontal, vertical, or diagonal line of four of one's own discs."
(see [Connect Four on Wikipedia](https://en.wikipedia.org/wiki/Connect_Four))

Note that [Connect 4 was solved](https://en.wikipedia.org/wiki/Connect_Four#Mathematical_solution) in 1988. A Connect 4 solver with a discussion of how to solve different parts of the problem can be found here: https://connect4.gamesolver.org/en/

## Task 1: Defining the Search Problem [10 points]

Define the components of the search problem:

* Initial state
* Actions
* Transition model (result function)
* Goal state (terminal state and utility)

# Your code/answer goes here.

How big is the state space? Give an estimate and explain it.

# Your code/answer goes here.

How big is the game tree that minimax search will go through? Give an estimate and explain it.

# Your code/answer goes here.

## Task 2: Game Environment and Random Agent [25 points]

Use a numpy array as the board.

```python
import numpy as np

def empty_board(shape = (6, 7)):
    return np.full(shape = shape, fill_value = 0)

print(empty_board())
```

The standard board is $6 \times 7$ but you can use smaller boards to test your code. Instead of colors (red and yellow), I use 1 and -1 to represent the two players.

Make sure that your agent functions all have the form agent_type(board, player = 1), where board is the current board position (in the format above) and player is the player whose next move it is and who the agent should play (as 1 and -1).
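As a minimal illustration of this signature, a hypothetical agent (my own example, not part of the graded solution) that always plays the left-most open column could look like this:

```python
import numpy as np

def leftmost_player(board, player = 1):
    """Hypothetical example agent with the required signature.
    board is a numpy array with 0 for empty cells; player (1 or -1)
    is accepted but unused by this trivial strategy."""
    board = np.asarray(board)
    # a column is open as long as its top cell is still empty
    open_cols = [c for c in range(board.shape[1]) if board[0, c] == 0]
    return open_cols[0]

board = np.full((6, 7), 0)
print(leftmost_player(board))  # prints 0
```

Any real agent (random, minimax, Monte Carlo) should keep this exact argument list so the environment can call all agents interchangeably.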
```python
# Visualization code by Randolph Rankin

import matplotlib.pyplot as plt

def visualize(board):
    plt.axes()
    rectangle = plt.Rectangle((-0.5, len(board) * -1 + 0.5), len(board), len(board), fc = 'blue')
    circles = []
    for i, row in enumerate(board):
        for j, val in enumerate(row):
            color = 'white' if val == 0 else 'red' if val == 1 else 'yellow'
            circles.append(plt.Circle((j, i * -1), 0.4, fc = color))

    plt.gca().add_patch(rectangle)
    for circle in circles:
        plt.gca().add_patch(circle)

    plt.axis('scaled')
    plt.show()

board = [[0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 1, 0, 0, 0],
         [0, 0, 0, 1, 0, 0, 0],
         [0,-1,-1, 1,-1, 0, 0]]

visualize(board)
```

Implement helper functions for:

* A check for available actions in each state: actions(s).
* The transition model: result(s, a).
* A check for terminal states: terminal(s).
* The utility function: utility(s).

Make sure that all these functions work with boards of different sizes (number of columns and rows).

# Your code/answer goes here.

Implement an agent that plays randomly. Make sure the agent function receives the board as the percept and returns a valid action. Use an agent function definition with the following signature (arguments):

def random_player(board, player = 1): ...

The argument player is used for agents that do not store what color they are playing. The value passed on by the environment should be 1 or -1 for player red and yellow, respectively. See the [Experiments section for tic-tac-toe](https://nbviewer.org/github/mhahsler/CS7320-AI/blob/master/Games/tictactoe_and_or_tree_search.ipynb#Experiments) for an example.

# Your code/answer goes here.

Let two random agents play against each other 1000 times.
Look at the [Experiments section for tic-tac-toe](https://nbviewer.org/github/mhahsler/CS7320-AI/blob/master/Games/tictactoe_and_or_tree_search.ipynb#Experiments) to see how the environment uses the agent functions to play against each other. How often does each player win? Is the result expected?

# Your code/answer goes here.

## Task 3: Minimax Search with Alpha-Beta Pruning

### Implement the Search [20 points]

Implement minimax search starting from a given board for the specified player. You can use code from the [tic-tac-toe example](https://nbviewer.org/github/mhahsler/CS7320-AI/blob/master/Games/tictactoe_alpha_beta_tree_search.ipynb).

__Important Notes:__

* Make sure that all your agent functions have a signature consistent with the random agent above and that each [uses a class to store state information.](https://nbviewer.org/github/mhahsler/CS7320-AI/blob/master/HOWTOs/store_agent_state_information.ipynb) This is essential to be able to play against agents from other students later.
* The search space for a $6 \times 7$ board is large. You can experiment with smaller boards (the smallest is $4 \times 4$) and/or change the winning rule to connect 3 instead of 4.

# Your code/answer goes here.

Experiment with some manually created boards (at least 5) to check if the agent spots winning opportunities.

# Your code/answer goes here.

How long does it take to make a move? Start with a smaller board with 4 columns and make the board larger by adding columns.

# Your code/answer goes here.

### Move ordering [5 points]

Starting the search with better moves will increase the efficiency of alpha-beta pruning. Describe and implement a simple move ordering strategy. Make a table that shows how the ordering strategies influence the time it takes to make a move.

# Your code/answer goes here.

### The first few moves [5 points]

Start with an empty board. This is the worst-case scenario for minimax search since it needs to solve all possible games that can be played (minus some pruning) before making the decision. What can you do?

# Your code/answer goes here.

### Playtime [5 points]

Let the minimax search agent play against a random agent on a small board. Analyze wins, losses and draws.

# Your code/answer goes here.

## Task 4: Heuristic Alpha-Beta Tree Search

### Heuristic evaluation function [15 points]

Define and implement a heuristic evaluation function.

# Your code/answer goes here.

### Cutting off search [10 points]

Modify your minimax search with alpha-beta pruning to cut off search at a specified depth and use the heuristic evaluation function. Experiment with different cutoff values.

# Your code/answer goes here.

Experiment with the same manually created boards as above to check if the agent spots winning opportunities.

# Your code/answer goes here.

How long does it take to make a move? Start with a smaller board with 4 columns and make the board larger by adding columns.

# Your code/answer goes here.

### Playtime [5 points]

Let two heuristic search agents (different cutoff depths, different heuristic evaluation functions) compete against each other on a reasonably sized board. Since there is no randomness, you only need to let them play once.

# Your code/answer goes here.

## Challenge task [+10 bonus points, awarded separately]

Find another student and let your best agent play against the other student's best player. We will set up a class tournament on Canvas. This tournament will continue after the submission deadline.

## Graduate student advanced task: Pure Monte Carlo Search and Best First Move [10 points]

__Undergraduate students:__ This is a bonus task you can attempt if you like [+10 bonus points].

### Pure Monte Carlo Search

Implement Pure Monte Carlo Search and investigate how this search performs on the test boards that you have used above.

# Your code/answer goes here.
### Best First Move

Use Pure Monte Carlo Search to determine what the best first move is. Describe under what assumptions this is the "best" first move.

# Your code/answer goes here.
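The Task 2 helper functions could be sketched along the following lines. This is a minimal outline under my own assumptions (the function names, the `connect` parameter for the winning length, and the scanning approach are my choices, not the graded solution); it works for any board size because it only relies on `board.shape`:

```python
import numpy as np

def actions(board):
    """Columns that are not yet full (a piece can still be dropped there)."""
    return [c for c in range(board.shape[1]) if board[0, c] == 0]

def result(board, player, action):
    """Drop a piece for player (1 or -1) into the given column."""
    board = board.copy()  # never modify the caller's board
    row = max(r for r in range(board.shape[0]) if board[r, action] == 0)
    board[row, action] = player
    return board

def check_win(board, connect = 4):
    """Return 1 or -1 if that player has `connect` pieces in a row, else 0."""
    rows, cols = board.shape
    for r in range(rows):
        for c in range(cols):
            # scan right, down, and both diagonals from every cell
            for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:
                line = [(r + i * dr, c + i * dc) for i in range(connect)]
                if all(0 <= rr < rows and 0 <= cc < cols for rr, cc in line):
                    vals = [board[rr, cc] for rr, cc in line]
                    if vals[0] != 0 and all(v == vals[0] for v in vals):
                        return vals[0]
    return 0

def terminal(board, connect = 4):
    """A state is terminal if a player has won or the board is full."""
    return check_win(board, connect) != 0 or len(actions(board)) == 0

def utility(board, connect = 4):
    """+1/-1 for a win by player 1/-1, 0 for a draw, None if not terminal."""
    w = check_win(board, connect)
    if w != 0: return w
    if len(actions(board)) == 0: return 0
    return None

def random_player(board, player = 1):
    """Random agent with the required signature."""
    return np.random.choice(actions(board))
```

With these in place, the tic-tac-toe environment above (switch_player/play) carries over almost unchanged, with the player symbols 'x'/'o' replaced by 1/-1.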

### TIC TAC TOE PURE MONTE CARLO SEARCH

# Adversarial Search: Solving Tic-Tac-Toe with Pure Monte Carlo Search

## Introduction

Multiplayer games can be implemented as:

1. __Nondeterministic actions:__ The opponent is seen as part of an environment with nondeterministic actions. Non-determinism is the result of the opponent's unknown moves.
2. __Optimal decisions:__ Minimax search (search the complete game tree) and alpha-beta pruning.
3. __Heuristic alpha-beta tree search:__ Cut off the tree search and use a heuristic to estimate the state value.
4. __Monte Carlo search:__ Simulate playouts to estimate the state value.

Here we will implement search for Tic-Tac-Toe (see [rules](https://en.wikipedia.org/wiki/Tic-tac-toe)). The game is a __zero-sum game__: a win by x has a value of +1, a win by o has a value of -1, and a tie has a value of 0. Max plays x and tries to maximize the outcome while Min plays o and tries to minimize the outcome.

We will implement

* __Pure Monte Carlo search.__

## The board

I represent the board as a vector of length 9. The values are ' ', 'x', and 'o'.

```python
import numpy as np
import math

%precision 3

def empty_board():
    return [' '] * 9
```

Some helper functions.

```python
%matplotlib inline
%config InlineBackend.figure_format = 'retina'  # use higher-resolution images in the notebook

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def show_board(board, help = True, dpi = 40,
               colors = {' ': 'white', 'x': 'red', 'o': 'black'}):
    """Show the tic-tac-toe board.

    help adds the array index, dpi changes the size, and colors sets the colors."""
    b = np.array(board).reshape((3, 3))

    with plt.rc_context({'figure.dpi': dpi}):
        fig = plt.matshow(np.zeros((3, 3)), cmap = ListedColormap(['w']))
    fig.axes.axis('off')
    plt.hlines([.5, 1.5], -.5, 2.5)
    plt.vlines([.5, 1.5], -.5, 2.5)

    for row in range(3):
        for col in range(3):
            plt.text(row, col, b[col, row],
                     fontsize = 64,
                     color = colors[b[col, row]],
                     horizontalalignment = 'center',
                     verticalalignment = 'center')

    if help:
        for row in range(3):
            for col in range(3):
                plt.text(col, row - .35, col + 3 * row,
                         fontsize = 12,
                         color = 'gray',
                         horizontalalignment = 'center',
                         verticalalignment = 'center')
    plt.show()

def check_win(board):
    """Check the board and return one of x, o, d (draw), or n (for next move)."""
    board = np.array(board).reshape((3, 3))

    diagonals = np.array([[board[i][i] for i in range(len(board))],
                          [board[i][len(board) - i - 1] for i in range(len(board))]])

    for a_board in [board, np.transpose(board), diagonals]:
        for row in a_board:
            if len(set(row)) == 1 and row[0] != ' ':
                return row[0]

    # check for draw
    if np.sum(board == ' ') < 1:
        return 'd'

    return 'n'

def actions(board):
    """Return possible actions as a vector of indices."""
    return np.where(np.array(board) == ' ')[0].tolist()

def result(state, player, action):
    """Add move to the board."""
    state = state.copy()
    state[action] = player
    return state

def other(player):
    if player == 'x': return 'o'
    else: return 'x'

def utility(state, player = 'x'):
    """Check if a state is terminal and return the utility if it is.
    None means the state is not terminal."""
    goal = check_win(state)
    if goal == player: return +1          # win
    if goal == 'd': return 0              # draw
    if goal == other(player): return -1   # loss
    return None                           # utility is not defined
```

# Pure Monte Carlo Search

See AIMA page 161ff. We implement an extremely simplified version. For the current state:

1. Simulate $N$ random playouts for each possible action, and
2. pick the action with the highest average utility.

__Important note:__ We use a random playout policy here, which ends up creating just a randomized search that works fine for this toy problem. For real applications you need to extend the code with

1. a good __playout policy__ (e.g., learned by self-play), and
2. a __selection policy__ (e.g., UCB1).

## Simulate playouts

```python
def playout(state, action, player = 'x'):
    """Perform a random playout starting with the given action on the given board
    and return the utility of the finished game."""
    state = result(state, player, action)
    current_player = other(player)

    while True:
        # reached a terminal state?
        u = utility(state, player)
        if u is not None: return u

        # we use a random playout policy
        a = np.random.choice(actions(state))
        state = result(state, current_player, a)
        #print(state)

        # switch between players
        current_player = other(current_player)

# Playout for action 0 (top-left corner)
board = empty_board()

print(playout(board, 0))
print(playout(board, 0))
print(playout(board, 0))
print(playout(board, 0))
print(playout(board, 0))

def playouts(board, action, player = 'x', N = 100):
    """Perform N playouts following the given action for the given board."""
    return [ playout(board, action, player) for i in range(N) ]

u = playouts(board, 0)
print("Playout results:", u)
print(f"mean utility: {np.mean(u)}")

p_win = sum(np.array(u) == +1) / len(u)
p_loss = sum(np.array(u) == -1) / len(u)
p_draw = sum(np.array(u) == 0) / len(u)
print(f"win probability: {p_win}")
print(f"loss probability: {p_loss}")
print(f"draw probability: {p_draw}")
```

__Note:__ This shows that the player who goes first has a significant advantage under __pure random play.__ Real players do not play randomly; a better playout policy would be good.

## Pure Monte Carlo Search

```python
DEBUG = 1

def pmcs(board, N = 100, player = 'x'):
    """Pure Monte Carlo Search. Returns the action that has the largest average utility.
    The N playouts are evenly divided between the possible actions."""
    global DEBUG

    acts = actions(board)
    n = math.floor(N / len(acts))
    if DEBUG >= 1: print(f"Actions: {acts} ({n} playouts per action)")

    ps = { i : np.mean(playouts(board, i, player, N = n)) for i in acts }
    if DEBUG >= 1: display(ps)

    action = max(ps, key = ps.get)
    return action

board = empty_board()
display(board)

%time print(pmcs(board))

print()
print("10000 playouts give a better utility estimate.")
%time print(pmcs(board, N = 10000))
```

It looks like the center and the corners are a lot better.
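The __selection policy__ mentioned above can be made concrete. UCB1 picks the action maximizing the average utility plus an exploration bonus, $\bar{u}_i + C\sqrt{\ln N / n_i}$. The sketch below is my own illustration of that formula (it is not used by pmcs, and the statistics are made-up numbers):

```python
import math

def ucb1(avg_utility, n_parent, n_child, C = math.sqrt(2)):
    """UCB1 value of a child node: exploitation term (average utility)
    plus an exploration bonus that grows for rarely visited children."""
    return avg_utility + C * math.sqrt(math.log(n_parent) / n_child)

# made-up statistics: action -> (mean utility, number of playouts)
stats = {0: (0.4, 50), 4: (0.5, 40), 8: (0.1, 10)}
N = sum(n for _, n in stats.values())

choice = max(stats, key = lambda a: ucb1(stats[a][0], N, stats[a][1]))
print(choice)  # prints 8: the rarely tried action gets the largest exploration bonus
```

In full Monte Carlo tree search this rule decides which branch to descend at every node, so promising actions get more playouts instead of splitting N evenly as pmcs does.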
## Some Tests

### x is about to win (play 8)

```python
board = empty_board()
# example position: x holds 0 and 4 and wins by playing 8 (diagonal 0-4-8)
board[0] = 'x'
board[2] = 'o'
board[3] = 'o'
board[4] = 'x'

print("Board:")
show_board(board)
print()

%time display(pmcs(board))
```

### o is about to win

```python
board = empty_board()
# example position: o has several threats; x can only block one of them
board[0] = 'o'
board[2] = 'o'
board[4] = 'o'
board[3] = 'x'
board[7] = 'x'

print("Board:")
show_board(board)
print()

%time display(pmcs(board))
print()
%time display(pmcs(board, N = 1000))
```

### x can draw if it chooses 7

```python
board = empty_board()
# example position: o threatens 1-4-7, so x must block at 7
board[0] = 'x'
board[1] = 'o'
board[8] = 'x'
board[4] = 'o'

print("Board:")
show_board(board)
print()

%time display(pmcs(board))
```

### Empty board: Only a draw can be guaranteed

```python
board = empty_board()

print("Board:")
show_board(board)
print()

%time display(pmcs(board, N = 5000))
```

### A bad situation

```python
board = empty_board()
# example position: o holds opposite corners and x holds the center
board[0] = 'o'
board[4] = 'x'
board[8] = 'o'

print("Board:")
show_board(board)
print()

%time display(pmcs(board))
```

__Note:__ It looks like x can set a trap by playing the bottom-left corner, and a random player o is very unlikely to block it!

## Experiments

### Baseline: Randomized Player

A completely randomized player agent should be a weak baseline.

```python
def random_player(board, player = None):
    """Simple player that chooses a random empty square. player is unused."""
    return np.random.choice(actions(board))
```

### The Environment

Implement the environment that calls the agents. The percept is the board and the action is the move.
```python
DEBUG = 1

def switch_player(player, x, o):
    if player == 'x':
        return 'o', o
    else:
        return 'x', x

def play(x, o, N = 100):
    results = {'x': 0, 'o': 0, 'd': 0}
    for i in range(N):
        board = empty_board()
        player, fun = 'x', x

        while True:
            a = fun(board, player)
            board = result(board, player, a)
            if DEBUG >= 1: print(f"Player {player} uses action {a}")

            win = check_win(board)
            if win != 'n':
                if DEBUG >= 1: print(f"Result {board} winner: {win}")
                results[win] += 1
                break

            player, fun = switch_player(player, x, o)

    return results
```

### Random vs. Random

```python
DEBUG = 0
%time display(play(random_player, random_player))
```

### Pure Monte Carlo Search vs. Random

```python
def pmcs10_player(board, player = 'x'):
    return pmcs(board, N = 10, player = player)

def pmcs100_player(board, player = 'x'):
    return pmcs(board, N = 100, player = player)

def pmcs1000_player(board, player = 'x'):
    return pmcs(board, N = 1000, player = player)
```

```python
DEBUG = 1
print("PMCS vs. random:")
display(play(pmcs10_player, random_player, N = 1))
```

```python
DEBUG = 0
print("PMCS (10) vs. random:")
%time display(play(pmcs10_player, random_player))
print()
print("random vs. PMCS (10)")
%time display(play(random_player, pmcs10_player))
```

```python
DEBUG = 0
print("PMCS (100) vs. random:")
%time display(play(pmcs100_player, random_player))
print()
print("random vs. PMCS (100)")
%time display(play(random_player, pmcs100_player))
```

```python
DEBUG = 0
print("PMCS (100) vs. PMCS (10):")
%time display(play(pmcs100_player, pmcs10_player))
print()
print("PMCS (10) vs. PMCS (100)")
%time display(play(pmcs10_player, pmcs100_player))
```

**Note 1:** You can try `pmcs1000_player`, but it will take a few minutes to run 100 games.
**Note 2:** An important advantage of Monte Carlo Search is that we do not need to define a heuristic evaluation function to decide which boards are good.
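This point can be made concrete with a minimal, generic chooser: all it needs is a simulator for the game, never a board evaluation function. The names `mc_choose`, `legal_actions`, and `simulate` below are illustrative, not part of the notebook:

```python
def mc_choose(state, legal_actions, simulate, n = 50):
    """Return the action whose n random playouts have the highest mean utility.
    legal_actions(state) lists the moves; simulate(state, a) plays one random
    game after move a and returns its utility. No board heuristic is needed."""
    means = {a: sum(simulate(state, a) for _ in range(n)) / n
             for a in legal_actions(state)}
    return max(means, key = means.get)

# Toy check with a rigged simulator that always rewards action 'b'.
best = mc_choose(None, lambda s: ['a', 'b'], lambda s, a: 1 if a == 'b' else 0)
print(best)  # -> 'b'
```

`pmcs` above is exactly this pattern specialized to tic-tac-toe, with `playout` as the simulator.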

TIC TAC TOE HEURISTIC ALPHA BETA TREE SEARCH

# Adversarial Search: Solving Tic-Tac-Toe with Heuristic Alpha-Beta Tree Search

## Introduction

Multiplayer games can be implemented as:

1. Nondeterministic actions: The opponent is seen as part of an environment with nondeterministic actions. Non-determinism is the result of the unknown opponent's moves.
2. Optimal Decisions: Minimax search (search complete game tree) and alpha-beta pruning.
3. __Heuristic Alpha-Beta Tree Search:__ Cut off tree search and use heuristic to estimate state value.
4. Monte Carlo Tree search: Simulate playouts to estimate state value.
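Items 2 and 3 rest on minimax search with alpha-beta pruning, which can be sketched on a tiny explicit game tree. This standalone `alphabeta` function and the example tree are illustrations only, not part of the assignment code:

```python
import math

def alphabeta(node, maximizing, alpha = -math.inf, beta = math.inf):
    """Minimax with alpha-beta pruning on an explicit game tree:
    a leaf is a number, an internal node is a list of child nodes."""
    if isinstance(node, (int, float)):
        return node                      # leaf: return its utility
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, v)
            if v >= beta:                # beta cutoff: Min will avoid this node
                break
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta))
            beta = min(beta, v)
            if v <= alpha:               # alpha cutoff: Max will avoid this node
                break
        return v

# The classic textbook example tree (root is a Max node); its value is 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))             # -> 3
```

Pruning never changes the minimax value; it only skips branches that cannot influence the decision at the root.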

Here we will implement search for Tic-Tac-Toe (see [rules](https://en.wikipedia.org/wiki/Tic-tac-toe)). The game is a __zero-sum game__: Win by x results in +1, win by o in -1 and a tie has a value of 0. Max plays x and tries to maximize the outcome while Min plays o and tries to minimize the outcome.
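The zero-sum utility convention can be written down directly; the helper name `game_value` is an assumption for illustration:

```python
def game_value(winner):
    """Utility from Max's (x's) perspective:
    +1 for an x win, -1 for an o win, 0 for a tie."""
    return {'x': +1, 'o': -1, 'd': 0}[winner]

# Zero-sum: Min's (o's) utility is always the negative of Max's,
# so one number per outcome fully describes the game for both players.
```
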

We will implement __heuristic alpha-beta tree search__. The algorithms search the game tree and could return a conditional plan (or a partial plan if cutoffs are used), but the implementation here only identifies and returns the optimal next move.

## The board and helper functions

I represent the board as a vector of length 9. The values are `' '`, `'x'`, and `'o'`.

```python
import numpy as np
import math

def empty_board():
    return [' '] * 9
```

Some helper functions.

```python
%matplotlib inline
%config InlineBackend.figure_format = 'retina'  # use higher resolution images in notebook

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def show_board(board, help = True, dpi = 40, colors = {' ': 'white', 'x': 'red', 'o': 'black'}):
    """Show the tic-tac-toe board. help adds the array index, dpi changes the size and
    colors sets the colors."""
    b = np.array(board).reshape((3,3))

    with plt.rc_context({'figure.dpi': dpi}):
        fig = plt.matshow(np.zeros((3, 3)), cmap = ListedColormap(['w']))
    fig.axes.axis('off')

    plt.hlines([.5, 1.5], -.5, 2.5)
    plt.vlines([.5, 1.5], -.5, 2.5)

    for row in range(3):
        for col in range(3):
            plt.text(row, col, b[col, row],
                 fontsize = 64,
                 color = colors[b[col, row]],
                 horizontalalignment = 'center',
                 verticalalignment = 'center')

    if help:
        for row in range(3):
            for col in range(3):
                plt.text(col, row - .35, col + 3 * row,
                     fontsize = 12,
                     color = 'gray',
                     horizontalalignment = 'center',
                     verticalalignment = 'center')

    plt.show()

def check_win(board):
    """Check the board and return one of x, o, d (draw), or n (next move)."""
    board = np.array(board).reshape((3,3))

    diagonals = np.array([[board[i][i] for i in range(len(board))],
                          [board[i][len(board)-i-1] for i in range(len(board))]])

    for a_board in [board, np.transpose(board), diagonals]:
        for row in a_board:
            if len(set(row)) == 1 and row[0] != ' ':
                return row[0]

    # check for draw
    if(np.sum(board == ' ') < 1):
        return 'd'

    return 'n'

def actions(board):
    """Return possible actions as a vector of indices."""
    return np.where(np.array(board) == ' ')[0].tolist()

    # randomize the action order
    #actions = np.where(np.array(board) == ' ')[0]
    #np.random.shuffle(actions)
    #return actions.tolist()

def other(player):
    if player == 'x': return 'o'
    else: return 'x'

def result(state, player, action):
    """Add the move to the board."""
    state = state.copy()
    state[action] = player
    return state
```

Return the utility for a state. Terminal states return the utility for Max as +1, -1 or 0. Non-terminal states have no utility and return `None`. Note that I use `utility(s) is not None` to identify terminal states, i.e., it doubles as `is_terminal(s)`.

```python
def utility(state, player = 'x'):
    """Check if a state is terminal and return its utility if it is.
    None means it is not a terminal node."""
    goal = check_win(state)
    if goal == player: return +1
    if goal == 'd': return 0
    if goal == other(player): return -1  # loss is failure
    return None  # continue
```

# Heuristic Alpha-Beta Tree Search

See AIMA page 156ff.

## Heuristic Evaluation Function

```python
def eval_fun(state, player = 'x'):
    """Heuristic for the utility of a state. Returns the score for a node:
    1. For terminal states it returns the utility.
    2. For non-terminal states, it calculates a weighted linear function using features of the state.
    The features we look at are 2 in a row/col/diagonal where the 3rd square is empty. We assume that
    the more of these positions we have, the higher the chance of winning.
    We need to be careful that the value of the heuristic stays within [-1,1].
    Note that the largest possible number of these positions is 2. I weigh the count by 0.4,
    guaranteeing that it is in the needed range.
    Returns: heuristic value, terminal?"""
    # terminal state?
    u = utility(state, player)
    if u is not None: return u, True

    score = 0
    board = np.array(state).reshape((3,3))
    diagonals = np.array([[board[i][i] for i in range(len(board))],
                          [board[i][len(board)-i-1] for i in range(len(board))]])

    for a_board in [board, np.transpose(board), diagonals]:
        for row in a_board:
            if sum(row == player) == 2 and any(row == ' '): score += .4
            if sum(row == other(player)) == 2 and any(row == ' '): score -= .4

    return score, False
```

```python
board = empty_board()
show_board(board)
print(f"eval for x: {eval_fun(board)}")
print(f"eval for o: {eval_fun(board, 'o')}")
```

```python
# Square indices reconstructed (lost in the source): a terminal win for x.
board = empty_board()
board[0] = 'x'
board[1] = 'x'
board[2] = 'x'

show_board(board)
print(f"eval for x: {eval_fun(board)}")
print(f"eval for o: {eval_fun(board, 'o')}")
```

```python
# Square indices reconstructed (lost in the source): a non-terminal midgame position.
board = empty_board()
board[0] = 'x'
board[1] = 'x'
board[4] = 'x'
board[2] = 'o'
board[5] = 'o'

show_board(board)
print(f"eval for x: {eval_fun(board)}")
print(f"eval for o: {eval_fun(board, 'o')}")
```

```python
# Square indices reconstructed (lost in the source).
board = empty_board()
board[0] = 'x'
board[1] = 'o'
board[2] = 'x'
board[4] = 'o'

show_board(board)
print(f"eval for x: {eval_fun(board)}")
print(f"eval for o: {eval_fun(board, 'o')}")
```

## Search with Cutoff

We add a cutoff to the recursive DFS algorithm for minimax search with alpha-beta pruning (see AIMA page 156ff). We use the heuristic evaluation function at the cutoff and back the value up using minimax search with alpha-beta pruning to determine the next move.

```python
# global variables
DEBUG = 1   # 1 ... count nodes, 2 ... debug each node
COUNT = 0

def alpha_beta_search(board, cutoff = None, player = 'x'):
    """Start the search. cutoff = None is minimax search with alpha-beta pruning."""
    global DEBUG, COUNT
    COUNT = 0

    value, move = max_value_ab(board, player, -math.inf, +math.inf, 0, cutoff)
    if DEBUG >= 1: print(f"Number of nodes searched (cutoff = {cutoff}): {COUNT}")

    return {"move": move, "value": value}

def max_value_ab(state, player, alpha, beta, depth, cutoff):
    """Player's best move."""
    global DEBUG, COUNT
    COUNT += 1

    # cutoff and terminal test
    v, terminal = eval_fun(state, player)
    if((cutoff is not None and depth >= cutoff) or terminal):
        if(terminal): alpha, beta = v, v
        if DEBUG >= 2: print(f"stopped at {depth}: {state} term: {terminal} eval: {v} [{alpha}, {beta}]")
        return v, None

    v, move = -math.inf, None

    # check all possible actions in the state, update alpha and return the move with the largest value
    for a in actions(state):
        v2, a2 = min_value_ab(result(state, player, a), player, alpha, beta, depth + 1, cutoff)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)
        if v >= beta: return v, move

    return v, move

def min_value_ab(state, player, alpha, beta, depth, cutoff):
    """Opponent's best response."""
    global DEBUG, COUNT
    COUNT += 1

    # cutoff and terminal test
    v, terminal = eval_fun(state, player)
    if((cutoff is not None and depth >= cutoff) or terminal):
        if(terminal): alpha, beta = v, v
        if DEBUG >= 2: print(f"stopped at {depth}: {state} term: {terminal} eval: {v} [{alpha}, {beta}]")
        return v, None

    v, move = +math.inf, None

    # check all possible actions in the state, update beta and return the move with the smallest value
    for a in actions(state):
        v2, a2 = max_value_ab(result(state, other(player), a), player, alpha, beta, depth + 1, cutoff)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)
        if v <= alpha: return v, move

    return v, move
```

## Some Tests

### x is about to win (play 8)

```python
# Square indices reconstructed (lost in the source) to match the description.
board = empty_board()
board[0] = 'x'
board[1] = 'o'
board[2] = 'o'
board[4] = 'x'

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board, 2))
print()
%time display(alpha_beta_search(board, 4))
print()
%time display(alpha_beta_search(board))
```

### o is about to win

```python
# Square indices reconstructed (lost in the source): o threatens to win
# on square 2 and x has to block.
board = empty_board()
board[0] = 'o'
board[1] = 'o'
board[8] = 'o'
board[4] = 'x'
board[5] = 'x'

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board, 2))
print()
%time display(alpha_beta_search(board))
```

### x can draw if it chooses 7

```python
# Square indices reconstructed (lost in the source).
board = empty_board()
board[0] = 'x'
board[1] = 'o'
board[8] = 'x'
board[4] = 'o'

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board, 2))
print()
%time display(alpha_beta_search(board, 4))
print()
%time display(alpha_beta_search(board))
```

### Empty board: Only a draw can be guaranteed

```python
board = empty_board()

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board, 2))
print()
%time display(alpha_beta_search(board, 4))
print()
%time display(alpha_beta_search(board))
```

### A bad situation

```python
# Square indices reconstructed (lost in the source): o holds two opposite
# corners, a well-known trap position.
board = empty_board()
board[0] = 'o'
board[4] = 'x'
board[8] = 'o'

print("Board:")
show_board(board)
print()

%time display(alpha_beta_search(board, 2))
print()
%time display(alpha_beta_search(board, 4))
print()
%time display(alpha_beta_search(board))
```

## Experiments

### Baseline: Randomized Player

A completely randomized player agent should be a weak baseline.

```python
def random_player(board, player = None):
    """Simple player that chooses a random empty square. player is unused."""
    return np.random.choice(actions(board))
```

### The Environment

Implement the environment that calls the agents. The percept is the board and the action is a move.
```python
DEBUG = 0

def switch_player(player, x, o):
    if player == 'x':
        return 'o', o
    else:
        return 'x', x

def play(x, o, N = 100):
    results = {'x': 0, 'o': 0, 'd': 0}
    for i in range(N):
        board = empty_board()
        player, fun = 'x', x

        while True:
            a = fun(board, player)
            board = result(board, player, a)

            win = check_win(board)
            if win != 'n':
                if DEBUG >= 1: print(f"{board} winner: {win}")
                results[win] += 1
                break

            player, fun = switch_player(player, x, o)

    return results
```

### Random vs. Random

```python
%time display(play(random_player, random_player))
```

### Minimax with Alpha-Beta Pruning vs. Random

```python
def heuristic2_player(board, player = 'x'):
    return alpha_beta_search(board, cutoff = 2, player = player)["move"]

def heuristic4_player(board, player = 'x'):
    return alpha_beta_search(board, cutoff = 4, player = player)["move"]

def alpha_beta_player(board, player = 'x'):
    return alpha_beta_search(board, cutoff = None, player = player)["move"]
```

```python
DEBUG = 1
print("heuristic2 vs. random:")
display(play(heuristic2_player, random_player, N = 3))
```

```python
DEBUG = 0
print("heuristic2 vs. random:")
%time display(play(heuristic2_player, random_player))
print("heuristic4 vs. random:")
%time display(play(heuristic4_player, random_player))
print()
print("random vs. heuristic2")
%time display(play(random_player, heuristic2_player))
print("random vs. heuristic4")
%time display(play(random_player, heuristic4_player))
```

### Heuristic vs. Minimax with Alpha-Beta Pruning

```python
DEBUG = 0

# Note: no randomness -> play only once
print("heuristic2 vs. alpha_beta")
%time display(play(heuristic2_player, alpha_beta_player, N = 1))
print()
print("alpha_beta vs. heuristic2")
%time display(play(alpha_beta_player, heuristic2_player, N = 1))
print()
print("heuristic4 vs. alpha_beta")
%time display(play(heuristic4_player, alpha_beta_player, N = 1))
print()
print("alpha_beta vs. heuristic4")
%time display(play(alpha_beta_player, heuristic4_player, N = 1))
```

### Heuristic vs. Heuristic

```python
DEBUG = 0

# Note: no randomness -> play only once
print("heuristic2 vs. heuristic4")
%time display(play(heuristic2_player, heuristic4_player, N = 1))
print()
print("heuristic4 vs. heuristic2")
%time display(play(heuristic4_player, heuristic2_player, N = 1))
```

__Idea:__ Start the experiments with different boards that already have a few x's and o's randomly placed on them.
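The closing idea can be sketched as follows; the function name `random_start_board` and its parameters are assumptions for illustration:

```python
import numpy as np

def random_start_board(n_pairs = 1, seed = None):
    """Return a 9-square board with n_pairs x/o pairs placed on random
    distinct squares, alternating x and o so the move counts stay legal."""
    rng = np.random.default_rng(seed)
    board = [' '] * 9
    for i, s in enumerate(rng.choice(9, size = 2 * n_pairs, replace = False)):
        board[s] = 'x' if i % 2 == 0 else 'o'
    return board

# Example: feed such boards into the experiments instead of empty_board().
print(random_start_board(2, seed = 0))
```

With one or two pairs, the pre-placed stones cannot already form a winning line, so every generated board is a playable, non-terminal starting position.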