By Drew Dunne (asd222), Jacob Glueck (jng55), Michael Solomentsev (mys29)
ECE 5760: Advanced Microcontroller Design - Spring 2019 Final Project
We built an FPGA-accelerated Monte Carlo Simulation-based Limit Texas Hold'em poker bot.
Given a player's hand, and the community cards, it simulates approximately 2 million hands per second, and determines the probability of that given hand winning. Our system was able achieve significant hardware acceleration. It runs about 4000 times faster than a (unoptimized) C++ version on the ARM Hard Processor System (HPS), and about 10 times faster than the C++ version on an Intel i7-6700HQ laptop processor.
Using this simulator, we implemented a simple game where 1 user can play poker against computers. The computers use the FPGA to determine their probabilities of winning at any given point, and decide to either raise, call, check, or fold based on that information. When the human plays, the game presents them with their hand and currently probability of winning, and asks them to make a choice.
Part of an example game is below:
Our project originated from an interest in designing something both mathematically interesting and enjoyable to use. Two out of the three group members are avid poker players (for the record, Jacob still does not know how to play poker). Our reading indicated that poker bots of various flavors are a fairly active field of research (these can be based on neural networks or other strategies). We determined that the best way of implementing a poker bot on an FPGA would be through Monte Carlo simulation-based decision making.
Monte Carlo simulations involve randomly searching over a given state space over many iterations (in our case, over 10^^6 simulations). Poker has an extremely large state space that is suited for this kind of approach. By our estimation, in a ten player game of Texas Hold'em, there are approximately 10^^29 possible game states. It would be overkill to evaluate all possible hands, but evaluating a randomly distributed subset of these states can produce an indication of how strong a player's hand is.
We set the known cards (a player's hole cards, the community cards) as fixed conditions, then randomly assign hands and community cards (if not already specified) to other players. We then evaluate which player would win for a given hand. This process is repeated at least 10^^6 times, then we calculate the percentage of hands won. If this simulation is for a computer player, a action (raise, call, check, fold) is made based on hard-coded thresholds of the win probability. If the human player needs to make a decision, we give them their win probability. In this way, we can fully simulate a poker game.
We specifically simulated limit Texas Hold'em, a variation where bet size is fixed, because we did not want to deal with calculating raise values.
The C++ program had 2 primary functions: it could either be run in an interactive poker-playing mode (allowing the user to play the game), or it could be run in a testing mode to to test the FPGA. The program also contained an implementation of the Monte Carlo algorithm, and when launched in poker-playing mode, and command line flag allowed the user to select either the FPGA or the CPU for game computations. The following sections discuss the main modules in the program.
The core of the program is
These files contain structs for cards, game states, and all related types.
They also have functions for printing cards and hands to the screen and shuffling decks.
poker_prob.h defines a simple interface for interacting with the FPGA:
It provides 2 methods:
init, which configures everything, and then
num_wins, which given a ProbSetup -- the current state of the game as viewed by a single player and the total number of hands to simulate -- computes the number of hands that win.
FpgaProb, derive from this abstract base class.
This allows us to swap between the FPGA and CPU implementations easily for benchmarking and debugging.
cpu.h form the CPU implementation of the solver, and implement the
This class performs the same computations as the FPGA: given a game setup, it does a Monte Carlo simulation to determine the winner.
The trickiest part of this is given all the players cards, and the pool cards, determining a winner.
This is hard because first the program has to determine the best hand each player has.
Each player has 2 cards, plus the 5 pool cards, for a total of 7.
The program must iterate through all 21 five card combinations from this 7 and determine the best hand for the player.
fpga.h form the FPGA implementation of the solver, and implement the
Essentially, this class is responsible for shuttling data to and from the FPGA, which it does using
using PIO ports.
The main poker-playing game is in
This file contains all the game logic, and the user interface.
When the user starts playing, it firsts asks the user for a name.
For the computers' players, it chooses a random first and last name pairing, from a
list of first names and a list of last names.
The computer players normally end up with names like:
The program then prompts the user through the game.
When it is the user's turn, it asks for a move.
The permitted moves are fold (
f), check (
ch), call (
c), or raise (
Can either type in the full command, or the short version indicated in the parenthesis above.
When the game ends, the program prints out the winners, the hands of all the players
who have not folded, and the amount of money the user won.
Our hardware is made up of multiple modules stitched together on the FPGA. There is no other hardware external to the FPGA. The modules we have control the shuffling (DeckShuffler), the winner detection of a hand of poker (Winner), the best hand combination finder (BestRank), a hand sorter (SortHand), and lastly a ranker which takes a hand and converts it to a hexadecimal number that is easily comparable.
We use a pseudo-random number generator based on cyclic coding. We use a 128-bit generator register, which uses a variety of shifts and XORs to produce a new 16-bit random number every cycle. Our generator was written by Professor Land .
A deck shuffler translates the random numbers generated into a shuffled deck. Each deck shuffler module takes an input of 45 to 47 cards (i.e. all the unknown cards that need to be shuffled). We use an optimal shuffling algorithm, the Fisher-Yates shuffling method :
Because of the optimal nature of this algorithm, we only need to cycle through the deck once. In fact, because we only draw a maximum of 20 cards at a time, we only need to shuffle the first 20 cards. Each shuffler module shuffles until it's gotten through 20 cards, waits for the next stage of the pipeline to pull, then continues to shuffle.
This module went through each player's set of hole cards, and combined them with the five community cards and passed them to the BestRank module to get that player's best possible hand's score. The Winner was pipelined such that each stage computed one player's best hand and then compared this to the current best score of any previous player. Pipelining each player's comparison allowed us to achieve maximum throughput of different hands, despite each stage taking more than one cycle. The index of the player with the best score was also kept as this is what was output for each input. The inputs to this module consisted of all ten players' hole cards and a set of five community cards. The winner module could not handle ties, but we deemed this okay because the player cards that were known would always be in index zero. Thus, if no other player beat the known hole cards, this would be counted as a win. In terms of the final probability the win count is used for, a tie could be counted as a win because the money in the pot would not be lost.
BestRank was responsible for getting the rank for each of the 21 combinations of the 7 cards. The module consists of two iterative loops that swap in the hole cards in different places of the five community cards to create a hand for passing to the Ranker module. The first loop of two swaps in a single card through each position, applying this for each of the hole cards. The best rank for the single card swap is kept and compared each iteration. The second loop places both cards into the community cards and iterating over each possible two card swap. Finally the module compares the two best ranks of the single and double swaps and outputs the best of the two ranks. Because the BestRank module is iterative, it must use val/rdy bundles on its inputs and outputs to interface with the Winner pipeline. It makes use of multiple states as well, shifting between
DONE to handle the val/rdy signals properly.
We tried making BestRank occur completely combinationally, but instantiating 21 Ranker modules, would not meet timing as the muxing and comparisons would not meet timing with the combinational logic of the Ranker and also used up more area, preventing us from creating multiple pipelines. Our final decision was to make this module iterative such that we could later create multiple complete solvers over speeding up a single pipeline.
It is critical that we be able to evaluate the strength of each hand relative to one another. As noted above, the ranker module produces a 24 bit rank value, which can be used to determine which of two hands is better through a simple comparision. The rank is split up into:
x[Type of hand ( bits)][Card Value ( bits)][Card Value][Card Value][Card Value][Card Value]
The type of hand represents the actual poker hand produced (e.g. a straight flush  is higher than a flush  which is higher than a pair ). If two hands have identical types, the compare is then between the value of their most valuable cards. In the case of pairs, the cards involved with the pairs are placed in the 0 and 1 index spots in the above rank number. In this way, a pair of 3s would be greater than a pair of 2s.
We perform a variety of tests in parallel to determine what types of hand each hand actually has:
We then do a variety of tests to identify sets of pairs, three and four of a kinds, and full houses (all of which have their individual ranking rules).
It is a prerequisite for the ranker that the input hand be sorted from greatest to least value. We use an optimal 5-value sorting network to do this. We used a previously developed one .
This module is the wrapper over our two main modules, connecting the DeckShuffler to the Winner pipeline, essentially creating an additional pipeline stage. The module has two counters, a sent counter and a done counter, as well as a total number of hands won. The inputs come straight from the PIO ports, interfacing over the PIO ports using val/rdy bundles (for inputs and outputs). The module takes in the currently known community cards, the two known hole cards, and the remaining cards in the deck (as this is much easier to calculate on the HPS). The module then gives the remaining deck to the DeckShuffler, and then effectively deals the resulting shuffled deck through connections to the Winner module.
Once the input has been set to valid, the state changes to begin calculating. This begins starting shuffles to send down the winner pipeline. When valid indices come out the end of the Winner pipeline, it checks if the index is zero and adds to the number of wins. Once the done and sent counters hit the total number of hands to run (which is an input), it sends the number of wins back on the PIO.
In the end, we were able to instantiate three sets of the DeckShufflers and Winners to triple our counting ability. To make the counting easier on the FPGA end, we would still count to the same total hands count sent over PIO, but would do three times that amount.
We benchmarked the design running on the FPGA, the HPS, and a standard laptop processor:
|Implementation||Time / 100000 hands (ms)||Hands per second (KHz)||Speedup|
The FPGA is extremely fast, about 11.7 times faster than the i7, and 4287 times faster than the HPS. Being able to simulate so many hands so fast allows the poker bot to provide accurate probabilities, but without a noticeable delay to the user.
To ensure the design was accurate, we tested the software solution in isolation, tested the hardware design in modelsim, tested the hardware system against the software system, and finally tested the entire game.
To test the sofware system, we tested components individually.
First, we tested the algorithm which ranked hands and determined winners.
This was done though simple directed test cases.
Second, we tested the probabilistic
To do so, we gave it hands that seemed, based on our intuition,
either good or bad, and made sure the probabilities aligned with our expectations.
To test the hardware system, we used modelsim. In much the same way we tested the software system, we would input crafted inputs into the modelsim testbench, and ensure that the output agreed with our expectations.
To provide additional coverage, we also ran a series of random tests. For these tests, we fed the FPGA and CPU solvers the same games, and compared the output probabilities to make sure they were similar. This random testing provided confidence that our system worked correctly.
Finally, we tested the entire system together by playing entire games of poker. This testing was interactive: we would play a game of poker, and make sure throughout the game the probabilities were reasonable, and that the gameplay was as expected.
The entire system is very easy to use; running the poker program prompts the user for instructions and moves.
We were extremely pleased with our results. The acceleration over the HPS and laptop processor was excellent, and indicated that our approach had real advantages. We found the final poker game to be extremely enjoyable, and more importantly, found that the other players were acceptable and rational poker players.
If given more time, it would be extremely interesting to build a stronger decision making tree on top of the Monte Carlo simulation. This would make the computer players more interesting to play against, and the gameplay more enjoyable.
Texas Hold'em is not intellectual property, and is in the public domain. There were no IP issues with our project.
The group approves this report for inclusion on the course website.
The group approves the video for inclusion on the course youtube channel.
N/A. No external hardware was built. This will now be a poker vocabulary section.
Hole cards: the two cards a player is dealt that are kept secret.
Community cards: The 5 cards that are shared among everyone at the table, visible to all.
The flop: After the first round of betting, three community cards are layed onto the table, this is called the flop.
The turn: After the post-flop betting, a fourth community card is put down, this is the turn.
The river: After the post-turn betting, the fifth and final community card is put down, this is the river.