
Hunter Chessbot — Deep Technical Profile

Table of Contents

  1. Project Overview
  2. Research & Design Rationale
  3. Architecture: The Transfer Learning Pipeline
  4. Board Encoding & Policy Indexing
  5. The Training Backend
  6. Model Export & Inference
  7. Training Configurations & Results
  8. Technical Tradeoffs & Decisions
  9. Development Timeline
  10. Key Files Reference

1. Project Overview

A transfer-learning pipeline for chess neural networks: it takes base models (Maia 1900, Maia 2200, Leela 11258 80x7-SE) and fine-tunes them on Hunter's personal Lichess and Chess.com games (~1,800 training / ~200 validation games) to create chess engines that play like Hunter. The "Maia 2200 Hunter" model used in the play-lc0 project was produced here.

By the numbers:

  • 19 commits, 2 merged PRs, built in 7 days (Feb 3–10, 2026)
  • ~9,700 lines of Python (5,640 in TF backend alone)
  • 6 trained models: 2 production (Maia 1900 v1/v2) + 4 experimental (Leela 11258 at 25k/35k/50k, Maia 2200 at 20k)
  • TensorFlow 2.x transfer learning with configurable layer freezing
  • Exports to LC0 .pb.gz format and ONNX for browser inference
  • Built on top of maia-individual training code

Connection to other projects: The .onnx exports from this pipeline are directly used in play-lc0 as the "Maia 2200 Hunter" network — a personalized chess engine trained on Hunter's own games.


2. Research & Design Rationale

Hunter conducted two research conversations before building the pipeline, progressively refining the approach from "fine-tune Leela Zero" to "fine-tune Maia" to the final multi-base-model strategy.

2.1 Why Maia Over Leela Zero

The initial question was whether to fine-tune Leela Zero directly on personal games. The key insight Hunter arrived at through research:

Leela Zero was trained via self-play reinforcement learning to find the strongest moves. Fine-tuning it on human games would fight against its core objective — it wants to play optimally, not humanly. Maia Chess is specifically designed to predict human moves at various skill levels, trained supervised on millions of Lichess games. Its architecture is inherently better suited for "play like a specific person."

Three approaches were evaluated:

  1. Use Maia directly — quick, human-like at approximate rating, but not personalized
  2. Fine-tune Maia on personal games — best bet for a "plays like me" bot (chosen approach)
  3. Train from scratch — requires tens of thousands of games (infeasible with ~1,800 games)

Hunter identified the CSSLab/maia-individual repository as the purpose-built tooling for this use case. The Maia researchers had achieved up to 65% accuracy at predicting specific players' moves, demonstrating that individual playing style is learnable from observing enough games.

2.2 The Leela Fine-Tuning Pivot

After settling on Maia, Hunter explored a second question: what about fine-tuning Leela in addition to Maia, to create a stronger personalized model? This led to the multi-model strategy.

Key tooling discovered: trainingdata-tool, built specifically for converting PGN files into Lc0's training-data format. It has been used both to train Leela networks that mimic human styles and to distill alpha-beta engines into Leela networks.

The SL (supervised learning) data quality problem: Most PGN files lack "policy" data (probability distributions over candidate moves). Without policy data, SL-trained nets are ~200 Elo weaker. trainingdata-tool addresses this by generating policy distributions from Stockfish evaluations of alternative moves using softmax — reconstructing what the policy would have been.
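The softmax reconstruction described above can be sketched roughly as follows (a hypothetical helper: the temperature knob and names are illustrative, not trainingdata-tool's actual parameters):

```python
import numpy as np

# Hypothetical sketch: turn Stockfish-style centipawn evaluations of the
# candidate moves into a policy distribution via softmax. The temperature
# is an illustrative knob, not trainingdata-tool's actual scaling.
def policy_from_evals(evals_cp, temperature=100.0):
    z = np.asarray(evals_cp, dtype=np.float64) / temperature
    z -= z.max()                 # numerical stability
    p = np.exp(z)
    return p / p.sum()           # probabilities over candidate moves

probs = policy_from_evals([30, 10, -50])  # best move gets the most mass
```

A sharper temperature concentrates mass on the engine's top choice; a softer one spreads it across plausible alternatives, closer to human play.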

Precedents that validated the approach:

  • Stein network (AllieStein): trained via SL on existing engine game data
  • Lc0 team's fine-tuning for odds games: +14 =2 -2 against a GM in knight odds
  • Detlef Kappe's personality nets (BadGyal, GoodGyal, EvilGyal, TinyGyal): trained on Lichess games with a configurable q-ratio. GoodGyal at 0.75 (more Stockfish influence, positional) vs EvilGyal at 0.25 (more human influence, chaotic); the parameter controls how much the net learns raw move choices versus engine evaluations.

2.3 The Multi-Model Strategy

Hunter's final design used three base models with different fine-tuning strategies, each representing a different point on the personalization-vs-strength tradeoff:

  1. Maia 1900 (aggressive fine-tuning): All 6 blocks trainable, higher LR, no weight decay → maximum personalization, lower base strength
  2. Maia 2200 (conservative): Only last 4 blocks trainable, weight decay → better base play while learning personal style
  3. Leela 11258 80x7 (conservative): Larger/stronger model, only last 4 blocks trainable → strongest base play, hardest to personalize (the model's "default style" resists override)

This was a deliberate experiment: train all three, measure policy accuracy (how often the model predicts Hunter's actual move), and see which base model produces the best personalized bot. Results: Maia 2200 achieved 63.28% accuracy, Leela 11258 achieved 53.13% — confirming that human-style base models are better starting points for human-style fine-tuning.

2.4 Data Strategy

Hunter had ~1,800 training games + ~200 validation games from Lichess and Chess.com (blitz, rapid, classical — no bullet). The research established that 1-2K games is "right in the sweet spot" for fine-tuning: enough to capture personal style without overfitting. Key decisions:

  • Down-sample 1/32: Only 1 in every 32 positions used per training pass (~4,500 effective positions per epoch). Prevents overfitting on a small corpus.
  • Color separation: Games split into white/black positions, alternated during training for balanced exposure.
  • 90/10 train/val split: Standard but important for monitoring overfitting.
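The color separation and 90/10 split can be sketched together (hypothetical helper names mirroring the described strategy, not prepare_data.py's actual code):

```python
import random

# Sketch (assumption: hypothetical helper): bucket games by the color
# Hunter played, then hold out 10% of each bucket for validation.
def split_games(games, rng, val_frac=0.1):
    buckets = {"white": [], "black": []}
    for game in games:
        buckets[game["color"]].append(game)
    splits = {}
    for color, gs in buckets.items():
        rng.shuffle(gs)
        n_val = int(len(gs) * val_frac)
        splits[color] = {"val": gs[:n_val], "train": gs[n_val:]}
    return splits

games = [{"color": "white"}] * 1000 + [{"color": "black"}] * 1000
splits = split_games(games, random.Random(0))
```

Splitting per color (rather than over the combined pool) keeps both the train and validation sets balanced between white and black positions.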

3. Architecture: The Transfer Learning Pipeline

Lichess/Chess.com API
         │
         ▼
┌──────────────────┐     ┌─────────────────────┐     ┌───────────────────┐
│ prepare_data.py  │────>│ ChunkParser         │────>│ TFProcess         │
│ PGN → split by   │     │ V4 binary records   │     │ Transfer learning │
│ color, 90/10     │     │ 1/32 down-sample    │     │ w/ stop_gradient  │
│ train/val split  │     │ shuffle buffer(128) │     │ SGD + Nesterov    │
└──────────────────┘     └─────────────────────┘     └─────────┬─────────┘
                                                               │
                                                        TF Checkpoint
                                                               │
                         ┌─────────────────────────────────────┤
                         │                                     │
                         ▼                                     ▼
               ┌───────────────────┐               ┌───────────────────┐
               │ export_model.py   │               │ export_onnx.py    │
               │ → .pb.gz (LC0)    │               │ → .onnx (browser) │
               └───────────────────┘               └───────────────────┘
                         │                                     │
                         ▼                                     ▼
                  LC0 engine play                   play-lc0 / play_onnx.py

4. Board Encoding & Policy Indexing

4.1 The 112-Plane Input Tensor

The Leela Chess Zero / AlphaZero input format: [1, 112, 8, 8] Float32.

Plane layout:

  • Planes 0–103: 13 planes × 8 history positions (12 piece types + 1 repetition flag per position). In this codebase, only the current position is populated.
  • Planes 104–107: Castling rights (our queenside, our kingside, their queenside, their kingside) — full 8×8 planes of 1s or 0s
  • Plane 108: Side-to-move indicator (1.0 if black)
  • Plane 109: Rule50 counter / 99.0 (capped at 1.0)
  • Plane 110: Move count (always 0)
  • Plane 111: All-ones plane

Critical: perspective flipping for Black. When it's black's turn:

  • Board is rotated 180 degrees (both ranks and files reversed)
  • Piece colors are swapped (uppercase ↔ lowercase in FEN)
  • Castling rights are swapped (our/their perspective)
  • The model always sees the position "from the side-to-move's perspective"

This is implemented in fen_to_vec.py (preproc_fen()) and verified across multiple test files (test_no_flip.py, test_output_flip.py, test_onnx.py).
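A simplified string-level sketch of the flip (an illustration only, not the project's preproc_fen(); the en-passant field is dropped for brevity):

```python
def flip_fen_for_black(fen: str) -> str:
    # Sketch: rotate the placement 180 degrees (ranks and files reversed)
    # and swap piece colors, so the side to move always appears as "white".
    placement, side, castling, ep, half, full = fen.split()

    def flip_rank(rank: str) -> str:
        squares = []
        for ch in rank:                        # expand digits into squares
            squares.extend("." * int(ch) if ch.isdigit() else ch)
        squares.reverse()                      # reverse the files
        out, run = [], 0
        for ch in squares:                     # re-compress, swapping colors
            if ch == ".":
                run += 1
            else:
                if run:
                    out.append(str(run))
                    run = 0
                out.append(ch.swapcase())
        if run:
            out.append(str(run))
        return "".join(out)

    flipped = "/".join(flip_rank(r) for r in reversed(placement.split("/")))
    new_castling = "-" if castling == "-" else "".join(
        c.swapcase() for c in castling)        # our/their rights swap
    return " ".join([flipped, "w", new_castling, "-", half, full])
```

For example, a lone black king on e8 (black to move) ends up on d1 as an uppercase "K" after the 180-degree rotation and color swap.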

4.2 The 1858-Element Policy Index

The neural network outputs 1858 policy logits — one per possible move pattern in LC0's compressed encoding. policy_index.py contains the canonical list of 1858 UCI move strings (e.g., "a1b1", "e7e8q").

The mapping between the 80×8×8 convolutional policy output (5120 elements) and the 1858-element vector is built by lc0_az_policy_map.py:

  • 56 planes for queen-like moves (8 directions × 7 distances)
  • 8 planes for knight moves
  • 9 planes for underpromotions (3 directions × 3 piece types)
  • Total: 73 used + 7 unused = 80 planes
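Conceptually, applying the map is a gather from the flattened conv output (a numpy sketch with a random stand-in index table, not the real mapping produced by lc0_az_policy_map.py):

```python
import numpy as np

# Sketch: ApplyPolicyMap as a gather. The real table maps each of the 1858
# moves to its (plane, rank, file) slot in the 80x8x8 output; here a random
# permutation stands in for the true indices (assumption).
QUEEN_PLANES = 8 * 7          # 8 directions x up to 7 squares
KNIGHT_PLANES = 8
UNDERPROMO_PLANES = 3 * 3     # 3 directions x {knight, bishop, rook}
assert QUEEN_PLANES + KNIGHT_PLANES + UNDERPROMO_PLANES == 73

def apply_policy_map(conv_out_flat, index_map):
    """conv_out_flat: (N, 5120); index_map: (1858,) ints into 0..5119."""
    return conv_out_flat[:, index_map]

rng = np.random.default_rng(0)
index_map = rng.choice(80 * 8 * 8, size=1858, replace=False)
policy = apply_policy_map(np.zeros((2, 5120), dtype=np.float32), index_map)
```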

This policy index is the same one used in play-lc0; Hunter directed the AI to use this pre-generated table after its attempt to generate the mapping programmatically in play-lc0 produced wrong results.


5. The Training Backend

5.1 Network Architecture

SE-ResNet (Squeeze-and-Excitation Residual Network):

  • Input: 112 planes of 8×8 → Conv2D to RESIDUAL_FILTERS channels
  • Residual tower: RESIDUAL_BLOCKS blocks, each: Conv2D → BN → ReLU → Conv2D → BN → SE → skip connection → ReLU
  • Policy head (two modes):
    • CONVOLUTION (Maia-style): Conv2D(policy_channels=32) → Conv2D(80, 3×3) → ApplyPolicyMap → 1858 logits
    • CLASSICAL (Leela-style): Conv2D(policy_channels) → Flatten → Dense(1858)
  • Value head (always WDL): Conv2D(32) → Flatten → Dense(128, relu) → Dense(3) — win/draw/loss probabilities

5.2 Transfer Learning: The Stop-Gradient Approach

Instead of Keras's standard layer.trainable = False, the pipeline inserts tf.stop_gradient() Lambda layers at strategic points. This is more flexible because it allows the frozen layers to still participate in the forward pass while blocking gradient flow.

The back_prop_blocks config parameter (counted from the output) determines how many sections receive gradients:

  • back_prop_blocks: 6 with 6 residual blocks → all blocks trainable (aggressive personalization)
  • back_prop_blocks: 4 with 6 blocks → first 2 blocks frozen (conservative, preserves base knowledge)

The value head is ALWAYS frozen — only the policy head is fine-tuned. This makes sense because the goal is to learn the player's move choices, not their position evaluation.
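A toy illustration of the idea (manual backprop through a two-weight linear chain; this mimics what tf.stop_gradient does and is not the project's code):

```python
# Toy sketch: the forward pass flows through the "frozen" weight normally,
# but backprop treats its output as a constant, so no update reaches it.
def forward(x, w_frozen, w_head):
    h = x * w_frozen            # frozen layer (stop_gradient applied to h)
    y = h * w_head              # trainable policy head
    return h, y

def backward(x, h, grad_y, w_head, stop_gradient):
    grad_w_head = grad_y * h
    # With stop_gradient, the chain rule is cut before the frozen weight:
    grad_w_frozen = 0.0 if stop_gradient else grad_y * w_head * x
    return grad_w_frozen, grad_w_head

h, y = forward(x=2.0, w_frozen=3.0, w_head=4.0)
gf, gh = backward(x=2.0, h=h, grad_y=1.0, w_head=4.0, stop_gradient=True)
```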

5.3 Loss Function

total_loss = policy_weight × cross_entropy(policy)
           + value_weight × cross_entropy(WDL)
           + L2_regularization

  • Policy loss: Softmax cross-entropy between target policy and predicted logits
  • Value loss: Softmax cross-entropy between target WDL and predicted WDL
  • Regularization: L2 with weight_decay (0.0001) applied to all trainable layers
  • Gradient clipping: Global norm clipping at 10,000
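A numpy sketch of this loss (parameter names and defaults mirror the description above, not the exact tfprocess.py implementation):

```python
import numpy as np

def softmax_xent(logits, target):
    """Mean softmax cross-entropy between a target distribution and logits."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(target * log_probs).sum(axis=-1).mean()

def total_loss(policy_logits, policy_target, wdl_logits, wdl_target,
               trainable_weights, policy_weight=1.0, value_weight=1.0,
               weight_decay=1e-4):
    # L2 penalty over the trainable tensors only (frozen layers excluded).
    l2 = weight_decay * sum(float((w ** 2).sum()) for w in trainable_weights)
    return (policy_weight * softmax_xent(policy_logits, policy_target)
            + value_weight * softmax_xent(wdl_logits, wdl_target)
            + l2)

# Uniform logits against one-hot targets give log(num_classes) per head.
p_tgt = np.zeros((1, 1858)); p_tgt[0, 0] = 1.0
loss = total_loss(np.zeros((1, 1858)), p_tgt,
                  np.zeros((1, 3)), np.array([[1.0, 0.0, 0.0]]),
                  trainable_weights=[])
```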

5.4 Optimizer & Schedule

SGD with Nesterov momentum (0.9). Piecewise constant learning rate with optional warmup. Float16 mixed precision with loss scale 128.
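The schedule can be sketched as follows (the boundary steps below are illustrative placeholders, not the project's config values; the LR values are the Maia 2200 ones from section 7.1):

```python
def learning_rate(step, boundaries, values, warmup_steps=0):
    """Piecewise-constant LR with optional linear warmup (a sketch)."""
    if warmup_steps and step < warmup_steps:
        return values[0] * (step + 1) / warmup_steps
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]

# Illustrative boundaries (assumption) with the documented LR values:
boundaries = [35_000, 45_000, 48_000]
values = [0.001, 0.0005, 0.0001, 0.00002]
```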

5.5 Data Pipeline

  1. prepare_data.py: Reads combined PGN, splits by color (white/black), 90/10 train/val split
  2. ChunkParser: Multiprocessing workers read V4 binary records from .gz chunks, down-sample 1/32 (only 1 in every 32 positions is used), filter by color, shuffle via 128-element buffer
  3. tf.data.Dataset.from_generator() wraps the parser with .map(parse_function).prefetch(4)
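The 128-element shuffle buffer behaves roughly like this generator (a sketch in the spirit of the ChunkParser buffer, not its code):

```python
import random

def shuffle_buffer(stream, size, rng):
    """Fixed-size shuffle buffer: fill it, then swap a random slot per item."""
    buf = []
    for item in stream:
        if len(buf) < size:
            buf.append(item)
            continue
        i = rng.randrange(size)
        yield buf[i]
        buf[i] = item
    rng.shuffle(buf)       # drain the remainder in random order
    yield from buf

out = list(shuffle_buffer(range(1000), 128, random.Random(0)))
```

A small buffer like 128 gives only local shuffling, which is acceptable here because the chunks are already split by color and down-sampled upstream.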

The V4 binary format (8292 bytes per record) contains: a 4-byte version, 7432 bytes of move probabilities (1858 float32s), 832 bytes of bit-packed board planes, then winner, best_q, and auxiliary fields.
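Slicing one record looks like this (assuming the standard Lc0 V4 layout: 4-byte version, 7432 bytes of probabilities, 832 bytes of packed planes, 24-byte tail; treat the tail contents as approximate):

```python
import struct

V4_RECORD_SIZE = 8292
PROBS_BYTES = 1858 * 4       # move probabilities stored as float32
PLANES_BYTES = 104 * 8       # 104 board planes bit-packed as uint64

def split_v4_record(record: bytes):
    """Slice a V4 record into its major fields (layout per the text above)."""
    assert len(record) == V4_RECORD_SIZE
    version = struct.unpack_from("<i", record, 0)[0]
    probs = record[4:4 + PROBS_BYTES]
    planes = record[4 + PROBS_BYTES:4 + PROBS_BYTES + PLANES_BYTES]
    tail = record[4 + PROBS_BYTES + PLANES_BYTES:]   # winner, best_q, aux
    return version, probs, planes, tail

record = struct.pack("<i", 4) + bytes(V4_RECORD_SIZE - 4)
version, probs, planes, tail = split_v4_record(record)
```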

5.6 Alternate Training Mode

tfprocess_reg_lr_noise.py (not actively used) implements two additional regularization techniques:

  • Gaussian noise injection: N(0, 0.01 × std(weight)) added to each weight during initialization — prevents perfect memorization of base model
  • Per-layer LR scaling: Earlier layers get lower learning rates (conv_block1: 0.05×, res_0: 0.1×, ... res_5: 0.85×, heads: 1.0×) — creates a gradient hierarchy where lower layers change less
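Both techniques are easy to sketch (hypothetical names and scalings; the actual tfprocess_reg_lr_noise.py may differ in detail):

```python
import numpy as np

def add_init_noise(weights, rng, scale=0.01):
    """Gaussian noise sized relative to each tensor's own spread."""
    return [w + rng.normal(0.0, scale * (w.std() + 1e-12), size=w.shape)
            for w in weights]

# Per-layer LR multipliers as described above; heads change the most.
LAYER_LR_SCALE = {"conv_block1": 0.05, "res_0": 0.10, "res_5": 0.85,
                  "policy_head": 1.0, "value_head": 1.0}

def scaled_lr(layer_name, base_lr):
    return base_lr * LAYER_LR_SCALE.get(layer_name, 1.0)

noisy = add_init_noise([np.ones((4, 4))], np.random.default_rng(0))
```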

6. Model Export & Inference

6.1 Export to LC0 .pb.gz

export_model.py extracts weights from a TF checkpoint and packs them into LC0's protobuf format:

  • Conv2D weights: Transposed from TF [H,W,in,out] to LC0 [out,in,H,W]
  • Dense weights: Transposed from TF [in,out] to LC0 [out,in]
  • Batch norm: Standard deviation is squared (LC0 stores variance, not stddev)
  • Head weight permutation: Policy/value head weights are reordered to match LC0's expected ordering
  • Rule50 rescaling: Input weights for plane 109 are divided by 99
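The layout conversions above amount to a few transposes (a sketch, not export_model.py itself):

```python
import numpy as np

def tf_conv_to_lc0(w):
    """TF Conv2D kernel [H, W, in, out] -> LC0 [out, in, H, W]."""
    return np.transpose(w, (3, 2, 0, 1))

def tf_dense_to_lc0(w):
    """TF Dense kernel [in, out] -> LC0 [out, in]."""
    return w.T

def bn_stddev_to_variance(stddev):
    """LC0 stores batch-norm variances where TF checkpoints hold stddevs."""
    return stddev ** 2

conv = tf_conv_to_lc0(np.zeros((3, 3, 112, 64)))    # first conv of the net
dense = tf_dense_to_lc0(np.zeros((5120, 1858)))     # classical policy head
```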

LC0 weight saving is explicitly disabled during training to avoid crashes from weight count mismatches when back_prop_blocks < total_blocks. Export is a separate post-training step.

6.2 Export to ONNX

export_onnx.py rebuilds the model architecture from scratch in Keras, restores the checkpoint, then converts via tf2onnx. This produces the .onnx files used by play-lc0 in the browser.

6.3 Interactive CLI Play

play_onnx.py uses the lczerolens library for ONNX inference. Simple game loop: print board → accept SAN/UCI input → run inference → argmax over legal policy indices → print move. The lczerolens library handles all encoding/decoding internally.
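The move-selection step reduces to an argmax over legal indices (a hypothetical helper for illustration; lczerolens handles the real encoding and decoding internally):

```python
import numpy as np

def pick_move(policy_logits, legal_uci, policy_index):
    """Pick the highest-scoring legal move.
    policy_logits: (1858,) network output; policy_index: canonical UCI list."""
    legal_ids = [policy_index.index(m) for m in legal_uci]
    best = max(legal_ids, key=lambda i: float(policy_logits[i]))
    return policy_index[best]

# Toy 3-move "policy index" standing in for the real 1858-entry table:
toy_index = ["e2e4", "d2d4", "g1f3"]
move = pick_move(np.array([0.1, 2.0, -1.0]), ["e2e4", "g1f3"], toy_index)
```

Masking to legal moves matters: the raw argmax here would pick "d2d4", which is not in the legal list.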


7. Training Configurations & Results

7.1 Configuration Comparison

Parameter          Maia 1900 (v1/v2)             Maia 2200                     Leela 11258
Architecture       64 filters, 6 blocks, SE(8)   64 filters, 6 blocks, SE(8)   80 filters, 7 blocks, SE(4)
Batch size         64                            128                           128
Total steps        100,000                       50,000                        50,000
Back-prop blocks   6 (all trainable)             4 (freeze first 2)            4 (freeze first 3)
Starting LR        0.002                         0.001                         0.001
LR schedule        0.002→0.0005→0.0001→0.00002   0.001→0.0005→0.0001→0.00002   same as Maia 2200
Weight decay       none (default 0.0001)         0.0001                        0.0001
Precision          float16                       float16                       float16

Key design insight: The Maia 1900 config is the most aggressive — all blocks trainable, higher LR, no explicit weight decay — maximizing personalization at the cost of base model knowledge. The larger models use more conservative freezing + weight decay to preserve engine strength while adding personal style.

7.2 Trained Models

Model             Base          Steps   Policy Accuracy          Size     Notes
Hunter Maia v1    Maia 1900     ~50k    Not recorded             1.7 MB   First fine-tune
Hunter Maia v2    Maia 1900     ~100k   Not recorded             1.7 MB   Production model
Leela 11258-25k   Leela 11258   25k     -                        8.1 MB   Experimental
Leela 11258-35k   Leela 11258   35k     53.13% (best)            8.1 MB   Experimental
Leela 11258-50k   Leela 11258   50k     -                        8.1 MB   Experimental
Maia 2200-20k     Maia 2200     20k     63.28% (best at 18.5k)   1.2 MB   Experimental

What "policy accuracy" means: The percentage of positions where the model's top-1 policy prediction matches the move Hunter actually played. Maia 2200 achieving 63.28% means it correctly predicts Hunter's move nearly 2/3 of the time — a strong result for a 64x6 network on ~1,800 games.
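The metric itself is small (a sketch with a toy batch; names are illustrative):

```python
import numpy as np

def top1_policy_accuracy(pred_logits, target_policy):
    """Fraction of positions where argmax(prediction) == argmax(target)."""
    return float((pred_logits.argmax(axis=1) ==
                  target_policy.argmax(axis=1)).mean())

# Toy batch of 3 positions over a 2-move policy: model matches 2 of 3.
preds = np.array([[0.2, 0.9], [0.8, 0.1], [0.3, 0.7]])
targets = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
acc = top1_policy_accuracy(preds, targets)
```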


8. Technical Tradeoffs & Decisions

8.1 Stop-Gradient vs layer.trainable

Using tf.stop_gradient() Lambda layers instead of layer.trainable = False. The stop-gradient approach is more flexible: frozen layers still participate in forward pass and batch norm statistics computation, but receive no gradient updates. This preserves the base model's learned representations more faithfully during fine-tuning.

8.2 Down-Sampling 1/32

Only 1 in every 32 training positions is used (SKIP = 32). With ~1,800 games averaging ~80 positions each = ~144,000 total positions → ~4,500 effective training positions per epoch. This prevents overfitting on a small personal game corpus while still extracting the player's style signal.
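Deterministically, the skip looks like this (a sketch; the actual parser may skip positions probabilistically rather than by stride):

```python
SKIP = 32  # keep 1 position in every 32

def down_sample(positions, skip=SKIP):
    # Stride-based sketch of the 1/32 down-sampling described above.
    return positions[::skip]

# ~1,800 games x ~80 positions = 144,000 -> ~4,500 kept per pass
kept = down_sample(list(range(144_000)))
```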

8.3 Color-Separated Training Data

Games are split into white and black positions. The ChunkParser alternates between white and black data sources (FileDataSrc), ensuring balanced color representation in each batch. This matters because the board is always presented from the side-to-move perspective, so the model needs equal exposure to both perspectives.

8.4 Value Head Always Frozen

The value head (position evaluation) is never fine-tuned — only the policy head (move prediction). This is deliberate: the goal is to learn which moves the player chooses, not to change the model's assessment of how good positions are. The base model's position evaluation is already strong; personalizing it on ~1,800 games would likely degrade it.

8.5 Web UI Built Then Removed

A React + ONNX Runtime WASM web UI was built (PR #1, 213K lines added), encoding bugs were debugged (PR #2), then the entire web UI was deleted (Feb 6-7). The play-lc0 project superseded it as the browser-based chess interface — and inherited the lessons learned here about encoding, policy indexing, and ONNX conversion.

8.6 Multiple Base Models, Different Strategies

Three base models were tried with different fine-tuning strategies:

  • Maia 1900 (aggressive): All blocks trainable, small architecture → maximum personalization, used as production model
  • Maia 2200 (conservative): Only last 4 blocks trainable → better base play, decent accuracy (63.28%)
  • Leela 11258 (conservative): Larger model, only last 4 blocks → strongest base play, lower personal accuracy (53.13%) because the model's "default style" is harder to override

9. Development Timeline

Date      Commits   Key Achievement
Feb 3     2         Initial commit + full training infrastructure + Hunter Maia v1
Feb 3-4   2 PRs     Web UI (PR #1), encoding bug fixes (PR #2)
Feb 5     1         play_onnx.py CLI play script
Feb 5-6   5         Weight decay configs, TF2 fixes, lc0 weight loading, multi-model support
Feb 6     2         Leela 11258 + Maia 2200 training runs + checkpoints + exports
Feb 6-7   2         Deleted web UI, cleaned up WASM artifacts
Feb 10    1         Final README revision

Total: 19 commits, 7 days, from nothing to 6 trained models + full export pipeline.


10. Key Files Reference

Training

File                                         Lines   Purpose
src/backend/tf_transfer/tfprocess.py         ~1200   Main TF training loop: model construction, weight loading, loss, optimizer
scripts/train_transfer.py                    ~100    Training orchestrator: config loading, data pipeline, loop execution
src/backend/tf_transfer/chunkparser.py       ~300    Multiprocessing V4 binary record parser with shuffle buffer
src/backend/tf_transfer/training_shared.py   ~100    Chunk discovery, white/black data source alternation

Encoding

File                                           Lines   Purpose
src/backend/fen_to_vec.py                      ~200    FEN → 17/112-plane tensor with black-side flip
src/backend/tf_transfer/policy_index.py        ~1900   1858 UCI move strings (canonical LC0 ordering)
src/backend/tf_transfer/lc0_az_policy_map.py   ~200    80×8×8 → 1858 mapping matrix

Export

File                      Lines   Purpose
scripts/export_model.py   ~300    TF checkpoint → LC0 .pb.gz (with weight transposition)
scripts/export_onnx.py    ~500    TF checkpoint → ONNX (rebuilds model, uses tf2onnx)

Data

File                        Lines   Purpose
scripts/prepare_data.py     ~80     PGN split by color, 90/10 train/val
src/backend/pgn_to_csv.py   ~400    PGN → per-move CSV with eval/clock/material features

Configs

File                                Base Model    Blocks Trainable   Total Steps
configs/training_config.yaml        Maia 1900     All 6              100k
configs/training_maia_2200.yaml     Maia 2200     Last 4             50k
configs/training_leela_11258.yaml   Leela 11258   Last 4             50k