Building a chess engine: from bit twiddling to search heuristics

# Building a chess engine: from bit twiddling to search heuristics

#### Or how Camel was built

#### Semana de Informática 2024 @ FEUP

---

### Your host

** Bruno Mendes **

Software Engineer @ Kevel  
M.EIC Alumni  
4-year NIAEFEUP Member  
Club Chess Player @ GXP

---

### Let's talk about chess programming

Interrupt me anytime in case I am not being clear!

---

### Context

- Camel is my biggest personal project
    - Amateur chess engine built from scratch to learn Rust and chess programming
    - 2200-Elo on chess bot competitions (CCRL); estimated to be around 2500 in human pools
    - Available on my GitHub
- This talk uses code adapted from Camel, but the concepts are relevant in all of chess programming and game theory

---

### A quick reminder of chess rules

---

### Chess programming is really simple

---

### The big picture

#### What is the use case of our program?

---

### A single user story

"As an operator, I would like to retrieve the best move for a given position, given the time available for both players [so that I may win]"

---

### A clear separation of concerns

---

### Let's use a bottom-up approach

What core abstractions do we need for a chess position?

---

### The `Position` structure: considerations

- Needs to
    - Encode the contents of the 64 squares of the chess board
    - Store metadata such as castling rights, enpassant square
    - Be serialized and unserialized to/from FEN
    - Be flexible enough for very fast move making

*Is an array enough?*

```rust
struct Position {
    mailbox: [Option<Piece>; 64],
    en_passant_square: Option<Square>,
    castling_rights: u8, // bitset
    ...
}
```

---

### FEN encoding

.left-column-33[
- A rather simple position representation
- Very useful for communication protocols and chess analysis tools
- Not suitable for efficient hashing... (why?)

> 1k3R2/1pp4q/p1p5/7r/4P3/2PPQ3/P1P3P1/6K1 b - - 0 27
]
.right-column-66[
    <img src="/assets/slides/sinf-24-chess/fen.png" style="width: 450px;">
]

---

### The mailbox approach

- A regular array with 64 elements
- More efficient than a 2D array due to pointer indirection

- Easy to calculate directions with constant offsets

---

### Calculating movements using mailbox

- Loop over directions as you learn to do in programming 101
    - Stop when the board wraps around or you find an occupied square

```rust
for &offset in directions {
    let mut current_offset = offset; // e.g. -1 (WEST) or 9 (NORTHEAST)

loop {
        let to_index = (square.0 as isize + current_offset) as usize;
        if to_index >= BOARD_SIZE {
            break;
        }

if let Some(to_piece) = position.board[to_index] {
            if to_piece.color() != color {
                moves.push(Move::new(square, to_index, MoveFlags::CAPTURE));
            }
            break;
        } else {
            moves.push(Move::new(square, to_index, MoveFlags::empty()));
            current_offset += offset;
        }
    }
}
```

Unfortunately, not very efficient...

---

### Enter bitboards

You know what else has 64 elements besides a chess board?

---

### Bitboards

- A *piece-centric* manner of representing the board
    - 6 u64 for each piece: pawn, king, knight, bishop, rook, queen
    - 2 u64 for each color: white, black
- Very efficient for counting material and many other operations
    - "Where (and how many) are the black rooks?"
    - "Give me the rooks on the seventh rank."

<img src="/assets/slides/sinf-24-chess/bb_op.png" style="width: 100%">
---

### Calculating pawn movements

Shift and mask!

```rust
let direction = MoveDirection::pawn_direction(position.side_to_move); // -8 or 8

// Single push
let single_push_pawns = our_pawns.shift(direction) & !occupancy; // << 8 or >> 8
for to_square in single_push_pawns {
    let from_square = to_square.shift(-direction);
    moves.push(occupancy, moves, from_square, to_square);
}

// West capture
let west_pawns = (our_pawns & !WEST_FILE).shift(direction + MoveDirection::WEST)
    & occupancy_them;
for to_square in west_pawns {
    let from_square = to_square.shift(-direction - MoveDirection::WEST);
    moves.push(occupancy_them, moves, from_square, to_square);
}
...
```

---

### Calculating leaper movements

- Do it once, then be lazy.
    - You're building a `Map<square, moves>`

```rust
pub type LeaperAttackMap = [Bitboard; 64];

pub static KNIGHT_ATTACKS: LeaperAttackMap = init_leaper_attacks(&[
    MoveDirection::NORTH + 2 * MoveDirection::WEST,
    MoveDirection::NORTH + 2 * MoveDirection::EAST,
    MoveDirection::SOUTH + 2 * MoveDirection::WEST,
    MoveDirection::SOUTH + 2 * MoveDirection::EAST,
    2 * MoveDirection::NORTH + MoveDirection::WEST,
    2 * MoveDirection::NORTH + MoveDirection::EAST,
    2 * MoveDirection::SOUTH + MoveDirection::WEST,
    2 * MoveDirection::SOUTH + MoveDirection::EAST,
]);
```

Works fine because knights simply jump and cannot be blocked.

---

### Calculating slider movements

- Can do it on the fly, but there is a better approach

- *Magic bitboards* is a perfect hashing technique that caches the available movements for a piece, given the occupancy of the board
    - You're trying to build a `Map<(square, occupancy), moves>`
    - But instead you use a custom data structure to speed things up: you find these "magics" that eventually yield the correct moves for all blocker setups

```rust
// We'll have 128 of these. 2 sliders (rook, bishop) and 64 squares.
pub struct SquareMagic {
    pub blockers_mask: Bitboard, // piece ray in an empty board
    pub shift: u8, // the number of ones of the blockers mask
    pub magic: Bitboard, // a number used for hashing; calculated a priori
    pub attacks: Vec<Bitboard>, // all possible move combinations
}

pub fn magic_index(magic: &SquareMagic, occupancy: Bitboard) -> usize {
    // Remove non-relevant pieces from blockers (i.e not in ray)
    let blockers = occupancy & magic.blockers_mask;

// Do the magic to find the index in `square.attacks`   
    ((blockers * magic.magic) >> (64 - magic.shift)) as usize
}
```

---

### Calculating attackers of a square

- Useful e.g. for check detection.
- Put a virtual piece on the square and find movements from here
    - If you're simulating a rook and find rooks or queens in path, logically those will also "see" you and thus are attacking you

```rust
pub fn square_attackers(board: &Board, square: Square, color: Color) -> Bitboard {
    let mut bb = Bitboard::new(0);

// Attacked in file or rank
    let attacker_rooks_queens =
        (board.pieces(Piece::Rook) | board.pieces(Piece::Queen)) & occ_attacker;
    let rook_attacks = piece_attacks(Piece::Rook, square, occupancy, color);
    bb |= rook_attacks & attacker_rooks_queens;

...
}

pub fn king_square_attackers(board: &Board, color: Color) -> Bitboard {
    let checked_king = board.pieces_bb_color(Piece::King, color.opposite());
    square_attackers(board, checked_king, color)
}
```

---

### Move packing

.left-column-33[
- Speed is absolutely key not also in calculating moves but in storing them
- We can minimize memory usage by packing move data and metadata in a very domain-specific 16-bit payload
    - 6 bits for the intial square
    - 6 bits for the target square
    - 4 bits for metadata
]
.right-column-66[
    <img src="/assets/slides/sinf-24-chess/move_metadata.png" style="width: 100%;">
]

---

### Testing it all up

.left-column-33[
- This is all very complex and prone to bugs
    - My king used to like castling with no rook

- Fortunately, people are kind enough to provide huge testing suites for move generation, e.g. from Stockfish
]
.right-column-66[
```rust
fn expect_perft(fen: &str, depth: u8, nodes: u64) {
    let position = Position::from_fen(fen).unwrap();
    assert_eq!(perft(&position, depth, nodes));
}

#[test]
fn perft_case_1() {
    expect_perft("2K2r2/4P3/8/8/8/8/8/3k4 w - - 0 1",
     6, 3821001);
}

pub fn perft(position: &Position, depth: u8) -> u64 {
    if depth == 0 {
        return 1;
    }

let moves = generate_moves(position);
    let mut nodes = 0;

for mov in moves {
        let new_position = make_move(position, mov);
        let count = perft(&new_position, depth - 1);
        nodes += count;
    }

nodes
}
```
]

---

### We know how to generate moves, now what?

How does a chess engine know which positions are best?

---

### Evaluation 101

- Assignment of how much a position is worth, in white's perspective
- We like to use *centipawns* as a relative measure (although Stockfish is switching to being based on win probabilities...)
- Checkmates should be evaluated with scores outside of the reachable bounds for a playable position, e.g -999 for black and 999 for white

- Q.: Why do we need centipawns? Can't we just count material?

---

### Evaluation philosophy

- Unfortunately (or fortunately), we as humans use much more than material to assess a position:
    - Is my king safe?
    - How "developed"/active are my pieces?
    - How strong is my pawn structure?
- This needs to be encoded in the evaluation function
    - If we encode *too much*, we are making the function slower and less generalizable
    - If we encode *too little*, the engine does not know how to play chess at all, and will just repeat moves as long as they don't lose a piece right away

---

### Piece-square tables (PSQTs)

- A very rough, although effective way of providing the engine with positional knowledge
- A fancy name for *heatmaps*

- Should be *tapered* for the engine to play the endgame well, e.g. increase pawns values more as they advance in the board to incentivize promotion and incentivize the king to become active

```rust
const MIDGAME_KNIGHT_PSQT: PieceSquareTable = [
    -50,-30,-30,-30,-30,-30,-30,-50,
    -40,-20,  0,  5,  5,  0,-20,-40,
    -30,  5, 10, 15, 15, 10,  5,-30,
    -30, -5, 15, 20, 20, 15, -5,-30,
    -30,  5, 15, 20, 20, 15,  5,-30,
    -30,  0, 10, 15, 15, 10,  0,-30,
    -40,-20,  0,  0,  0,  0,-20,-40,
    -50,-30,-30,-30,-30,-30,-30,-50,
];
```

*"Don't place knights in the corners"*

---

### Pawn structure considerations

- Subject to discussion, and highly dependant on other factors
    - Give bonus to passed pawns
    - Give penalty to doubled and isolated pawns
- These may not generalize to every position...
    - Sometimes isolated pawns are good, providing extra space

```rust
pub fn evaluate_pawn_structure(position: &Position) -> ValueScore {
    let mut score = 0;

let white_pawns = position.board.pieces_bb_color(Piece::Pawn, Color::White);
    let black_pawns = position.board.pieces_bb_color(Piece::Pawn, Color::Black);

score += doubled_pawns(white_pawns) as ValueScore * DOUBLED_PAWNS_PENALTY;
    score -= doubled_pawns(black_pawns) as ValueScore * DOUBLED_PAWNS_PENALTY;

score += passed_pawns(pawn_direction(Color::White), white_pawns, black_pawns)
            .iter().fold(0, |acc, rank| acc + PASSED_PAWN_BONUS[*rank as usize]);
    score -= passed_pawns(pawn_direction(Color::Black), black_pawns, white_pawns)
            .iter().fold(0, |acc, rank| acc + PASSED_PAWN_BONUS[*rank as usize]);

score
}
```

---

### Now it should be simple to play chess, right?

Maybe not

---

### A first trial

- Make a move and use the one that results in the best position

```rust
pub fn best_move(position: &Position) -> Move {
    let moves = position.moves();
    let best_idx = 0;
    let best_eval = -999;

for (idx, mov) in moves.enumerate() {
        let new_position = position.make_move(&mov);
        let eval = new_position.evaluate();
        if (eval > best_eval) {
            best_eval = eval;
            best_idx = idx;
        }
    }

moves[best_idx]
}
```

Is this enough?

---

### 1-ply search isn't enough

- Conceptually, we'd like to look for positions very far away
    - "Grandmasters calculate 20 moves deep" is a bit misunderstood, but generally calculating deeper does result in better gameplay
- 1-ply immediately misses recaptures, e.g. we'll grab a pawn with a queen missing that it would be hanging the next move or miss mate in 1
    - Is there a solution to this without increasing depth?

---

### Minimax

- A classic algorithm for gameplay in adversarial games with complete information
    - In chess we do see the entire board (not so in poker and others...)
- We simply evaluate at leaves and find the move that yields the best evaluation for us, assuming that the opponent will play the best move

<div class="display: flex; width: 100%;">
    <img src="/assets/shared/chess-engine-position.png" style="width: 250px;" />
    <img src="/assets/slides/sinf-24-chess/chess_engine_game_tree.png" style="width: 300px;" />
</div>
---

### Quiescence search

- Unfortunately, we can't search with very large depths, due to search explosions and limited time to think
    - As discussed before, we need to guarantee that evaluation at leaves does not miss captures to avoid catastrophic blunders
- *Quiescence search* is the search for "quietness", or simply a position without captures deemed safe
    - Simply a constrained search with unlimited depth besides the Minimax leaves

---
class: center, middle, inverse, small-images

### Ok, now we can play miserably

How to speedup search besides depths 4/5?

---

### Alpha-beta pruning

- Mostly a free lunch; this is a lossless technique
    - You lose the ability to get multiple variations, but you can work around that
- Avoids expanding branches that cannot beat the current best
- α and β represent the best scores so far for us and the opponent, and form a *search window*
    - They are used by many other heuristics

---

### Move ordering

- If we search the best move first, alpha-beta will cut all remaining branches
    - Of course, we do not know the best move beforehand, but we might guess (or even better, just in a bit)
- Good heuristics for move ordering are straightforward
    - PSQT-based: if a move improves the rough position of a piece, it's likely better
    - MVV-LVA (Most Valuable Victim - Least Valuable Aggressor): prefer capturing high value pieces with our less valuable pieces

```rust
pub fn evaluate_move(position: &Position, mov: Move) -> ValueScore {
    let mut score = 0;
    let moving_piece = position.board.piece_at(mov.from());
    if mov.flag().is_capture() {
        let captured_piece = position.board.piece_at(mov.to());
        score += captured_piece.value() - moving_piece.value();
    }
    score += psqt_value(moving_piece, mov.to(), position.side_to_move, 0);
    score -= psqt_value(moving_piece, mov.from(), position.side_to_move, 0);
    score
}
```

---

### Null-move pruning

- The most essential lossy pruning technique
- If skipping our turn is still good enough to cause a beta-cutoff, then this is too good to be true

```rust
if !is_check && !twofold_repetition && !may_be_zug {
    position.side_to_move = position.side_to_move.opposite();
    let (score, _) = alphabeta(position, depth - R, -beta, -alpha);,
    position.side_to_move = position.side_to_move.opposite();

if score >= beta {
        return (beta, count);
    }
}
```

- Does not work in late endgames due to *zugzwang*, the most obvious example being the basic checkmates
- Of course, must also be disabled when in check to avoid illegal positions

---

### Standing pat

- Term used in poker variants, where you choose to keep your hand and withdraw from getting more cards
- In chess programming, this is the static evaluation of a position inside a quiescence search
    - Assuming captures cannot make the position worse, one refutation is enough to cancel a capture
    - Can also use it to "delta prune", i.e. fail if even a queen capture can't improve the score

```rust
let static_evaluation = position.value() * position.side_to_move.sign();

alpha = alpha.max(static_evaluation);

// Beta cutoff: position is too good
if static_evaluation >= beta {
    return (beta, 1);
}

// Delta pruning: sequence cannot improve the score
if static_evaluation < alpha.saturating_sub(Piece::Queen.value()) {
    return (alpha, 1);
}

static_evaluation
```

---

### Selectivity

- One can experiment with extension and reductions to further cut down the search tree size
    - *Check extensions* are usually a good idea, since they are usually a hint of a forced, interesting variant
    - *Late move reductions* trust that our move ordering is good enough to see later moves shallower
    - *Passed pawn extensions* help the program in endgames
- This is very unstable and the outcome in overall play strength might change drastically due to synergies

---

### Transposition tables

- A common theme in chess is finding the same positions with different move ordering
    - Of course, saving scores from previously seen positions saves search time
- Must take special care with score semantics
    - A score retrieved from a beta-cutoff is not an exact score, but rather a *lowerbound*, so it should be treated as that

```rust
if let Some((score, score_type)) = table.get_table_score(position, depth, ply) {
    match score_type {
        Exact => return (score, 1), // found a move that improve the score
        LowerBound => alpha = alpha.max(score), // "fail-high", i.e. beta-cutoff
        UpperBound => beta = beta.min(score), // "fail-soft", i.e. alpha unchanged
    }

// Beta cutoff: position is too good
    if alpha >= beta {
        return (alpha, 1);
    }
}
```

- We are storing data about a position in a hash table, but how to actually hash a position?

---

### Zobrist hashing

- Generate random u64 for (piece, square) pairs, and just XOR them into the hash as pieces move in and out of squares
    - This is because a ^ b ^ b = a

```rust
fn xor_hash(&mut self, square: Square, piece: Piece, color: Color) {
    let index = color as usize * 6 * 64 + piece as usize * 64 + square as usize;
    self.hash ^= ZOBRIST_NUMBERS[index];
}

pub fn set_square(&mut self, square: Square, piece: Piece, color: Color) {
    self.clear_square(square);
    self.pieces[piece as usize].set(square);
    self.occupancy[color as usize].set(square);
    self.mailbox[square as usize] = Some(piece);
    self.xor_hash(square, piece, color);
}
```

- Probability of collision is very thin (around 10^-5 for 64-bit hashes), but exists
    - As Sebastian Lague (Youtuber) says, "I guess that if we want speed we'll just have to live in danger"

---

### Iterative deepening

- Due to time limitations we search with progressively incrementing depths
    - Say we assign 10 seconds for searching. We'll search depth=1, depth=2, ... until time is reached and we yield the last completed search result
- Ironically, due to introducing *hash moves* in the transposition table, a search with depth *n* inside an iterative deepening framework is faster that a single one outside of it

```rust
pub fn alphabeta_iter(position: &Position, table: SearchTable) -> Option<Move> {
    let mut current_depth = 1;
    let mut current_best_move = None;

loop {
        let mov = alphabeta(position, current_depth, table);
        if mov.is_none() {
            // The search could not finish in time, or we are mated.
            // alphabeta checks for time termination.
            break;
        }
        current_depth = (current_depth + 1).min(MAX_DEPTH);
        current_best_move = table.get_hash_move(position);
    }

current_best_move
}
```

---

### Time management

- Our orchestrator (e.g. GUI or Lichess) provides us with a position and times, and now we need to assign some time to limit iterative deepening...
    - This is highly subjective; we humans do it very differently based on personality and humor
- A simple heuristic is to spend more time around move 20, when the game is more "tense" and past opening theory
- May also factor estimated remaining moves

---

### Advanced topics

If the organizers are not already impatient with me

---

### Opening books and endgame tablebases

- People like to ask me, what about teaching the engine openings and using precomputed endgame results
- This is done is some engines but is mostly handled by the operator, e.g. a GUI like Chessbase or Lichess
- Opening books are a necessity, since the opening is very theory-based; endgame tablebases are not much of an help, since we can search much deeper in the endgame

---

### LazySMP

- Minimax is a sequential algorithm by nature
    - Specially when we use (and we must) alpha-beta pruning
- *LazySMP* consists in launching a search in various cores with random variations in depth and move ordering
    - The cores will share the hash table, and use each others information
    - Important to avoid contention, as it will significantly slow down the search

```c
// Storing in the table with a lockless hashing schema
index = key % TABLESIZE;
hashtable[index].key  = key ^ data;
hashtable[index].data = data;

// Retrieving from the table with a lockless hashing schema
index = key % TABLESIZE;
if (( hashtable[index].key ^ hashtable[index].data) == key )
{
   /* entry matches key */
}
```

---

### Texel Tuning

- We discussed PSQTs and pawn structure bonuses
    - Of course, there can be many other evaluation parameters, but the common theme is that they are a wild guess
    - Unless, of course, we optimize them in some way
- *Texel Tuning* optimizes the parameters in regards to the outcome of games in a database

where *Ri* is the result of the game (0 for black win, 0.5 for draw and 1 for white win); 
q(i) is the quiescence search result;
N is the number of test positions.

---

### NNUE

- I once read that "Texel Tuning is the NNUE of the poor"
    - That is because we are optimizing the parameters of our own handcrafted function
- *NNUE* is a neural network to evaluate a position, with a catch
    - Since most aspects of a position do not change after a move (only 2 squares generally) we can leave most intermediate matrix multiplications unchanged and just update the affected ones

---

### Integration testing

- Perft is a good test suite for move generation, but a engine is so much more than that
    - The *Strategic Test Suite* tests the program strength in strategic play
    - Other suites test deep tactical patterns, uncovering bugs like the incorrect use of new heuristics
- A good (regression-sort-of) testing is putting the new versions against the latest stable one in a long tournament
    - I have this running in a Github Action

---

### The sky is the limit

- Chess programming is always evolving, and there is so much that we did not cover, in part because of time, but also because I can't fully grasp some advanced on-the-rise topics
- Use the beauty of the Internet to learn and build amazing things yourself
    - The *chessprogrammingwiki* is your friend
- Do it for the plot (and the knowledge), not for the job

---

## Thank you!

Questions?