Hunter Chessbot — Deep Technical Profile
Table of Contents
- Project Overview
- Research & Design Rationale
- Architecture: The Transfer Learning Pipeline
- Board Encoding & Policy Indexing
- The Training Backend
- Model Export & Inference
- Training Configurations & Results
- Technical Tradeoffs & Decisions
- Development Timeline
- Key Files Reference
1. Project Overview
A fine-tuning pipeline for chess neural networks that takes base models (Maia 1900, Maia 2200, Leela 11258 80x7-SE) and fine-tunes them on Hunter's personal Lichess and Chess.com games (~1,800 training / ~200 validation games) to create chess engines that play like Hunter. The "Maia 2200 Hunter" model used in the play-lc0 project was produced here.
By the numbers:
- 19 commits, 2 merged PRs, built in 7 days (Feb 3–10, 2026)
- ~9,700 lines of Python (5,640 in TF backend alone)
- 6 trained models: 2 production (Maia 1900 v1/v2) + 4 experimental (Leela 11258 at 25k/35k/50k, Maia 2200 at 20k)
- TensorFlow 2.x transfer learning with configurable layer freezing
- Exports to LC0 .pb.gz format and ONNX for browser inference
- Built on top of maia-individual training code
Connection to other projects: The .onnx exports from this pipeline are directly used in play-lc0 as the "Maia 2200 Hunter" network — a personalized chess engine trained on Hunter's own games.
2. Research & Design Rationale
Hunter conducted two research conversations before building the pipeline, progressively refining the approach from "fine-tune Leela Zero" to "fine-tune Maia" to the final multi-base-model strategy.
2.1 Why Maia Over Leela Zero
The initial question was whether to fine-tune Leela Zero directly on personal games. The key insight Hunter arrived at through research:
Leela Zero was trained via self-play reinforcement learning to find the strongest moves. Fine-tuning it on human games would fight against its core objective — it wants to play optimally, not humanly. Maia Chess is specifically designed to predict human moves at various skill levels, trained supervised on millions of Lichess games. Its architecture is inherently better suited for "play like a specific person."
Three approaches were evaluated:
- Use Maia directly — quick, human-like at approximate rating, but not personalized
- Fine-tune Maia on personal games — best bet for a "plays like me" bot (chosen approach)
- Train from scratch — requires tens of thousands of games (infeasible with ~1,800 games)
Hunter identified the CSSLab/maia-individual repository as the purpose-built tooling for this use case. The Maia researchers had achieved up to 65% accuracy at predicting specific players' moves, demonstrating that individual playing style is learnable from observing enough games.
2.2 The Leela Fine-Tuning Pivot
After settling on Maia, Hunter explored a second question: what about fine-tuning Leela in addition to Maia, to create a stronger personalized model? This led to the multi-model strategy.
Key tooling discovered: trainingdata-tool — built specifically for converting PGN files into Lc0's training data format. Used to train Leela networks mimicking human styles and to distill alpha-beta engines into Leela networks.
The SL (supervised learning) data quality problem: Most PGN files lack "policy" data (probability distributions over candidate moves). Without policy data, SL-trained nets are ~200 Elo weaker. trainingdata-tool addresses this by generating policy distributions from Stockfish evaluations of alternative moves using softmax — reconstructing what the policy would have been.
Precedents that validated the approach:
- Stein network (AllieStein): trained via SL on existing engine game data
- Lc0 team's fine-tuning for odds games: +14 =2 -2 against a GM in knight odds
- Detlef Kappe's personality nets (BadGyal, GoodGyal, TinyGyal): Trained on Lichess games with a configurable q-ratio. Good Gyal at 0.75 (more Stockfish influence, positional) vs Evil Gyal at 0.25 (more human influence, chaotic). This parameter controls how much the net learns raw move choices versus engine evaluations.
2.3 The Multi-Model Strategy
Hunter's final design used three base models with different fine-tuning strategies, each representing a different point on the personalization-vs-strength tradeoff:
- Maia 1900 (aggressive fine-tuning): All 6 blocks trainable, higher LR, no weight decay → maximum personalization, lower base strength
- Maia 2200 (conservative): Only last 4 blocks trainable, weight decay → better base play while learning personal style
- Leela 11258 80x7 (conservative): Larger/stronger model, only last 4 blocks trainable → strongest base play, hardest to personalize (the model's "default style" resists override)
This was a deliberate experiment: train all three, measure policy accuracy (how often the model predicts Hunter's actual move), and see which base model produces the best personalized bot. Results: Maia 2200 achieved 63.28% accuracy, Leela 11258 achieved 53.13% — confirming that human-style base models are better starting points for human-style fine-tuning.
2.4 Data Strategy
Hunter had ~1,800 training games + ~200 validation games from Lichess and Chess.com (blitz, rapid, classical — no bullet). The research established that 1-2K games is "right in the sweet spot" for fine-tuning: enough to capture personal style without overfitting. Key decisions:
- Down-sample 1/32: Only 1 in every 32 positions used per training pass (~4,500 effective positions per epoch). Prevents overfitting on a small corpus.
- Color separation: Games split into white/black positions, alternated during training for balanced exposure.
- 90/10 train/val split: Standard but important for monitoring overfitting.
3. Architecture: The Transfer Learning Pipeline
Lichess/Chess.com API
│
▼
┌─────────────────┐ ┌─────────────────────┐ ┌──────────────────┐
│ prepare_data.py │────>│ ChunkParser │────>│ TFProcess │
│ PGN → split by │ │ V4 binary records │ │ Transfer learning│
│ color, 90/10 │ │ 1/32 down-sample │ │ w/ stop_gradient │
│ train/val split│ │ shuffle buffer(128) │ │ SGD + Nesterov │
└─────────────────┘ └─────────────────────┘ └────────┬─────────┘
│
TF Checkpoint
│
┌────────────────────────────────────┤
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ export_model.py │ │ export_onnx.py │
│ → .pb.gz (LC0) │ │ → .onnx (browser)│
└──────────────────┘ └──────────────────┘
│ │
▼ ▼
LC0 engine play          play-lc0 / play_onnx.py
4. Board Encoding & Policy Indexing
4.1 The 112-Plane Input Tensor
The Leela Chess Zero / AlphaZero input format: [1, 112, 8, 8] Float32.
Plane layout:
- Planes 0–103: 13 planes × 8 history positions (12 piece types + 1 repetition flag per position). In this codebase, only the current position is populated.
- Planes 104–107: Castling rights (our queenside, our kingside, their queenside, their kingside) — full 8×8 planes of 1s or 0s
- Plane 108: Side-to-move indicator (1.0 if black)
- Plane 109: Rule50 counter / 99.0 (capped at 1.0)
- Plane 110: Move count (always 0)
- Plane 111: All-ones plane
Critical: perspective flipping for Black. When it's black's turn:
- Board is rotated 180 degrees (both ranks and files reversed)
- Piece colors are swapped (uppercase ↔ lowercase in FEN)
- Castling rights are swapped (our/their perspective)
- The model always sees the position "from the side-to-move's perspective"
This is implemented in fen_to_vec.py (preproc_fen()) and verified across multiple test files (test_no_flip.py, test_output_flip.py, test_onnx.py).
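As a minimal sketch of the described flip (the function name below is hypothetical; the real logic lives in preproc_fen() in fen_to_vec.py):

```python
def flip_fen_board(board_fen: str) -> str:
    """Rotate the board 180 degrees (ranks and files reversed) and swap
    piece colors, so the side to move always 'looks up the board'.
    Illustrative helper, not the repo's implementation."""
    rows = board_fen.split("/")
    rotated = "/".join(row[::-1] for row in reversed(rows))
    return rotated.swapcase()
```

Note that the 180-degree rotation mirrors files as well as ranks, so for example the king and queen swap columns relative to a simple rank flip.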
4.2 The 1858-Element Policy Index
The neural network outputs 1858 policy logits — one per possible move pattern in LC0's compressed encoding. policy_index.py contains the canonical list of 1858 UCI move strings (e.g., "a1b1", "e7e8q").
The mapping between the 80×8×8 convolutional policy output (5120 elements) and the 1858-element vector is built by lc0_az_policy_map.py:
- 56 planes for queen-like moves (8 directions × 7 distances)
- 8 planes for knight moves
- 9 planes for underpromotions (3 directions × 3 piece types)
- Total: 73 used + 7 unused = 80 planes
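The plane arithmetic, and the gather it feeds, can be sketched as follows. gather_idx here is a placeholder array; the real index table is the one built by lc0_az_policy_map.py:

```python
import numpy as np

# Plane arithmetic from the breakdown above:
# 56 queen-like (8 directions x 7 distances) + 8 knight + 9 underpromotion
assert 8 * 7 + 8 + 3 * 3 == 73 and 73 + 7 == 80

# ApplyPolicyMap viewed as a gather over the flattened 80x8x8 conv output.
conv_out = np.arange(80 * 8 * 8, dtype=np.float32).reshape(80, 8, 8)
gather_idx = np.arange(1858)            # placeholder indices for illustration
policy_logits = conv_out.reshape(-1)[gather_idx]
assert policy_logits.shape == (1858,)   # 5120 raw slots -> 1858 real moves
```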
This policy index is the same one used in play-lc0 — Hunter directed using this pre-generated table when the AI's programmatic generation in play-lc0 produced wrong results.
5. The Training Backend
5.1 Network Architecture
SE-ResNet (Squeeze-and-Excitation Residual Network):
- Input: 112 planes of 8×8 → Conv2D to RESIDUAL_FILTERS channels
- Residual tower: RESIDUAL_BLOCKS blocks, each: Conv2D → BN → ReLU → Conv2D → BN → SE → skip connection → ReLU
- Policy head (two modes):
- CONVOLUTION (Maia-style): Conv2D(policy_channels=32) → Conv2D(80, 3×3) → ApplyPolicyMap → 1858 logits
- CLASSICAL (Leela-style): Conv2D(policy_channels) → Flatten → Dense(1858)
- Value head (always WDL): Conv2D(32) → Flatten → Dense(128, relu) → Dense(3) — win/draw/loss probabilities
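One residual block as described can be sketched in Keras; this is illustrative, not the repo's code (which builds the tower inside TFProcess):

```python
import tensorflow as tf

def se_residual_block(x, filters=64, se_ratio=8):
    """Conv -> BN -> ReLU -> Conv -> BN -> SE -> skip -> ReLU, matching the
    block layout described above. Layer choices are a sketch."""
    inp = x
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    # Squeeze-and-Excitation: global pool, bottleneck, per-channel sigmoid gates
    se = tf.keras.layers.GlobalAveragePooling2D()(x)
    se = tf.keras.layers.Dense(filters // se_ratio, activation="relu")(se)
    se = tf.keras.layers.Dense(filters, activation="sigmoid")(se)
    se = tf.keras.layers.Reshape((1, 1, filters))(se)
    x = tf.keras.layers.Multiply()([x, se])
    x = tf.keras.layers.Add()([x, inp])
    return tf.keras.layers.ReLU()(x)
```

With filters=64 and se_ratio=8 this matches the Maia-sized 64-filter, SE(8) configuration from the tables below.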
5.2 Transfer Learning: The Stop-Gradient Approach
Instead of Keras's standard layer.trainable = False, the pipeline inserts tf.stop_gradient() Lambda layers at strategic points. This is more flexible because it allows the frozen layers to still participate in the forward pass while blocking gradient flow.
The back_prop_blocks config parameter (counted from the output) determines how many sections receive gradients:
- back_prop_blocks: 6 with 6 residual blocks → all blocks trainable (aggressive personalization)
- back_prop_blocks: 4 with 6 blocks → first 2 blocks frozen (conservative, preserves base knowledge)
The value head is ALWAYS frozen — only the policy head is fine-tuned. This makes sense because the goal is to learn the player's move choices, not their position evaluation.
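A minimal sketch of the mechanism (the helper below is hypothetical; tfprocess.py wires the boundary into the residual tower):

```python
import tensorflow as tf

def freeze_boundary(x, block_idx, total_blocks=6, back_prop_blocks=4):
    """Blocks below (total_blocks - back_prop_blocks) still run forward,
    but tf.stop_gradient cuts any gradient flowing back into them."""
    if block_idx < total_blocks - back_prop_blocks:
        return tf.stop_gradient(x)
    return x

# The forward value is unchanged while the gradient is cut:
v = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = freeze_boundary(v * 2.0, block_idx=0) * 5.0  # inside the frozen region
assert float(y) == 30.0                 # forward pass unaffected
assert tape.gradient(y, v) is None      # no gradient reaches v
```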
5.3 Loss Function
total_loss = policy_weight × cross_entropy(policy)
           + value_weight × cross_entropy(WDL)
           + L2_regularization
- Policy loss: Softmax cross-entropy between target policy and predicted logits
- Value loss: Softmax cross-entropy between target WDL and predicted WDL
- Regularization: L2 with weight_decay (0.0001) applied to all trainable layers
- Gradient clipping: Global norm clipping at 10,000
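A sketch of the combined objective; L2 weight decay and global-norm gradient clipping happen in the training step and are omitted here:

```python
import tensorflow as tf

def total_loss(policy_target, policy_logits, wdl_target, wdl_logits,
               policy_weight=1.0, value_weight=1.0):
    """Weighted sum of policy and WDL softmax cross-entropies, as stated
    above. Illustrative sketch, not the repo's exact code."""
    policy_ce = tf.nn.softmax_cross_entropy_with_logits(
        labels=policy_target, logits=policy_logits)
    value_ce = tf.nn.softmax_cross_entropy_with_logits(
        labels=wdl_target, logits=wdl_logits)
    return tf.reduce_mean(policy_weight * policy_ce + value_weight * value_ce)
```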
5.4 Optimizer & Schedule
SGD with Nesterov momentum (0.9). Piecewise constant learning rate with optional warmup. Float16 mixed precision with loss scale 128.
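A sketch of this setup; the step boundaries below are assumptions for illustration, since the real ones come from the YAML configs:

```python
import tensorflow as tf

# Piecewise-constant schedule over the Maia 1900 LR ladder (0.002 -> 0.00002);
# the boundary steps here are illustrative placeholders.
lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[20_000, 40_000, 80_000],
    values=[0.002, 0.0005, 0.0001, 0.00002])
opt = tf.keras.optimizers.SGD(learning_rate=lr, momentum=0.9, nesterov=True)
```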
5.5 Data Pipeline
- prepare_data.py: Reads combined PGN, splits by color (white/black), 90/10 train/val split
- ChunkParser: Multiprocessing workers read V4 binary records from .gz chunks, down-sample 1/32 (only 1 in every 32 positions is used), filter by color, shuffle via 128-element buffer
- tf.data.Dataset.from_generator() wraps the parser with .map(parse_function).prefetch(4)
The V4 binary format (8292 bytes per record) contains: version, 1858 float32 move probabilities (7432 bytes), packed board planes, winner, best_q, and auxiliary data.
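A struct-based sketch of reading such a record; only the version word and the 1858 policy floats are unpacked, and chunkparser.py remains the authority on the full field layout:

```python
import struct

V4_BYTES = 8292  # record size stated above

def parse_v4_header(record: bytes):
    """Unpack the leading fields of a V4 record; the remaining fields
    (planes, winner, best_q, ...) are returned as raw bytes."""
    assert len(record) == V4_BYTES
    (version,) = struct.unpack_from("<i", record, 0)
    probs = struct.unpack_from("<1858f", record, 4)
    rest = record[4 + 1858 * 4:]
    return version, probs, rest
```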
5.6 Alternate Training Mode
tfprocess_reg_lr_noise.py (not actively used) implements two additional regularization techniques:
- Gaussian noise injection: N(0, 0.01 × std(weight)) added to each weight during initialization — prevents perfect memorization of base model
- Per-layer LR scaling: Earlier layers get lower learning rates (conv_block1: 0.05×, res_0: 0.1×, ... res_5: 0.85×, heads: 1.0×) — creates a gradient hierarchy where lower layers change less
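A numpy sketch of the noise-injection idea (the helper name is hypothetical; the LR multipliers are the values quoted above):

```python
import numpy as np

def noisy_init(weights, scale=0.01, seed=0):
    """Add N(0, scale * std(w)) noise to each weight tensor at
    initialization, so the fine-tune cannot perfectly memorize the base
    model's parameters. Illustrative, not the repo's code."""
    rng = np.random.default_rng(seed)
    return [w + rng.normal(0.0, scale * w.std(), size=w.shape) for w in weights]

# Per-layer LR multipliers forming the gradient hierarchy described above:
LR_SCALE = {"conv_block1": 0.05, "res_0": 0.10, "res_5": 0.85, "heads": 1.00}
```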
6. Model Export & Inference
6.1 Export to LC0 .pb.gz
export_model.py extracts weights from a TF checkpoint and packs them into LC0's protobuf format:
- Conv2D weights: Transposed from TF [H,W,in,out] to LC0 [out,in,H,W]
- Dense weights: Transposed from TF [in,out] to LC0 [out,in]
- Batch norm: Standard deviation is squared (LC0 stores variance, not stddev)
- Head weight permutation: Policy/value head weights are reordered to match LC0's expected ordering
- Rule50 rescaling: Input weights for plane 109 are divided by 99
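The layout conversions can be sketched in numpy (helper names are hypothetical; export_model.py is the real implementation):

```python
import numpy as np

def tf_conv_to_lc0(w):
    """TF conv kernel [H, W, in, out] -> LC0 order [out, in, H, W]."""
    return np.transpose(w, (3, 2, 0, 1))

def tf_dense_to_lc0(w):
    """TF dense [in, out] -> LC0 [out, in]."""
    return w.T

def rule50_rescale(input_conv_tf, plane=109):
    """Divide input weights feeding plane 109 by 99, matching the
    training-time rule50/99 normalization (sketch; axis layout assumed
    to be TF's [H, W, in, out])."""
    w = input_conv_tf.copy()
    w[:, :, plane, :] /= 99.0
    return w
```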
LC0 weight saving is explicitly disabled during training to avoid crashes from weight count mismatches when back_prop_blocks < total_blocks. Export is a separate post-training step.
6.2 Export to ONNX
export_onnx.py rebuilds the model architecture from scratch in Keras, restores the checkpoint, then converts via tf2onnx. This produces the .onnx files used by play-lc0 in the browser.
6.3 Interactive CLI Play
play_onnx.py uses the lczerolens library for ONNX inference. Simple game loop: print board → accept SAN/UCI input → run inference → argmax over legal policy indices → print move. The lczerolens library handles all encoding/decoding internally.
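The masked-argmax step can be sketched as follows; legal_idx and idx_to_uci are hypothetical lookups built from policy_index.py, since lczerolens handles this internally in the real script:

```python
import numpy as np

def pick_move(policy_logits, legal_idx, idx_to_uci):
    """Argmax restricted to legal policy slots: illegal moves are masked
    to -inf so they can never be selected."""
    masked = np.full_like(policy_logits, -np.inf)
    masked[legal_idx] = policy_logits[legal_idx]
    return idx_to_uci[int(np.argmax(masked))]
```

For example, with logits [0.1, 5.0, 0.3, 2.0] and only slots 0 and 3 legal, the illegal-but-higher slot 1 is ignored and slot 3 wins.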
7. Training Configurations & Results
7.1 Configuration Comparison
| Parameter | Maia 1900 (v1/v2) | Maia 2200 | Leela 11258 |
|---|---|---|---|
| Architecture | 64 filters, 6 blocks, SE(8) | 64 filters, 6 blocks, SE(8) | 80 filters, 7 blocks, SE(4) |
| Batch size | 64 | 128 | 128 |
| Total steps | 100,000 | 50,000 | 50,000 |
| Back-prop blocks | 6 (all trainable) | 4 (freeze first 2) | 4 (freeze first 3) |
| Starting LR | 0.002 | 0.001 | 0.001 |
| LR schedule | 0.002→0.0005→0.0001→0.00002 | 0.001→0.0005→0.0001→0.00002 | same as Maia 2200 |
| Weight decay | none (default 0.0001) | 0.0001 | 0.0001 |
| Precision | float16 | float16 | float16 |
Key design insight: The Maia 1900 config is the most aggressive — all blocks trainable, higher LR, no explicit weight decay — maximizing personalization at the cost of base model knowledge. The larger models use more conservative freezing + weight decay to preserve engine strength while adding personal style.
7.2 Trained Models
| Model | Base | Steps | Policy Accuracy | Size | Notes |
|---|---|---|---|---|---|
| Hunter Maia v1 | Maia 1900 | ~50k | Not recorded | 1.7 MB | First fine-tune |
| Hunter Maia v2 | Maia 1900 | ~100k | Not recorded | 1.7 MB | Production model |
| Leela 11258-25k | Leela 11258 | 25k | — | 8.1 MB | Experimental |
| Leela 11258-35k | Leela 11258 | 35k | 53.13% (best) | 8.1 MB | Experimental |
| Leela 11258-50k | Leela 11258 | 50k | — | 8.1 MB | Experimental |
| Maia 2200-20k | Maia 2200 | 20k | 63.28% (best at 18.5k) | 1.2 MB | Experimental |
What "policy accuracy" means: The percentage of positions where the model's top-1 policy prediction matches the move Hunter actually played. Maia 2200 achieving 63.28% means it correctly predicts Hunter's move nearly 2/3 of the time — a strong result for a 64x6 network on ~1,800 games.
8. Technical Tradeoffs & Decisions
8.1 Stop-Gradient vs layer.trainable
Using tf.stop_gradient() Lambda layers instead of layer.trainable = False. The stop-gradient approach is more flexible: frozen layers still participate in forward pass and batch norm statistics computation, but receive no gradient updates. This preserves the base model's learned representations more faithfully during fine-tuning.
8.2 Down-Sampling 1/32
Only 1 in every 32 training positions is used (SKIP = 32). With ~1,800 games averaging ~80 positions each = ~144,000 total positions → ~4,500 effective training positions per epoch. This prevents overfitting on a small personal game corpus while still extracting the player's style signal.
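The arithmetic above can be sketched with random thinning (assumed here for illustration; the exact sampling scheme lives in chunkparser.py):

```python
import random

SKIP = 32  # 1 in every 32 positions per training pass

def downsample(n_positions, rng):
    """Keep each position independently with probability 1/SKIP."""
    return [i for i in range(n_positions) if rng.random() < 1.0 / SKIP]

kept = downsample(144_000, random.Random(0))
# len(kept) lands near the ~4,500 effective positions quoted above
```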
8.3 Color-Separated Training Data
Games are split into white and black positions. The ChunkParser alternates between white and black data sources (FileDataSrc), ensuring balanced color representation in each batch. This matters because the board is always presented from the side-to-move perspective, so the model needs equal exposure to both perspectives.
8.4 Value Head Always Frozen
The value head (position evaluation) is never fine-tuned — only the policy head (move prediction). This is deliberate: the goal is to learn which moves the player chooses, not to change the model's assessment of how good positions are. The base model's position evaluation is already strong; personalizing it on ~1,800 games would likely degrade it.
8.5 Web UI Built Then Removed
A React + ONNX Runtime WASM web UI was built (PR #1, 213K lines added), encoding bugs were debugged (PR #2), then the entire web UI was deleted (Feb 6-7). The play-lc0 project superseded it as the browser-based chess interface — and inherited the lessons learned here about encoding, policy indexing, and ONNX conversion.
8.6 Multiple Base Models, Different Strategies
Three base models were tried with different fine-tuning strategies:
- Maia 1900 (aggressive): All blocks trainable, small architecture → maximum personalization, used as production model
- Maia 2200 (conservative): Only last 4 blocks trainable → better base play, decent accuracy (63.28%)
- Leela 11258 (conservative): Larger model, only last 4 blocks → strongest base play, lower personal accuracy (53.13%) because the model's "default style" is harder to override
9. Development Timeline
| Date | Commits | Key Achievement |
|---|---|---|
| Feb 3 | 2 | Initial commit + full training infrastructure + Hunter Maia v1 |
| Feb 3-4 | 2 PRs | Web UI (PR #1), encoding bug fixes (PR #2) |
| Feb 5 | 1 | play_onnx.py CLI play script |
| Feb 5-6 | 5 | Weight decay configs, TF2 fixes, lc0 weight loading, multi-model support |
| Feb 6 | 2 | Leela 11258 + Maia 2200 training runs + checkpoints + exports |
| Feb 6-7 | 2 | Deleted web UI, cleaned up WASM artifacts |
| Feb 10 | 1 | Final README revision |
Total: 19 commits, 7 days, from nothing to 6 trained models + full export pipeline.
10. Key Files Reference
Training
| File | Lines | Purpose |
|---|---|---|
| src/backend/tf_transfer/tfprocess.py | ~1200 | Main TF training loop: model construction, weight loading, loss, optimizer |
| scripts/train_transfer.py | ~100 | Training orchestrator: config loading, data pipeline, loop execution |
| src/backend/tf_transfer/chunkparser.py | ~300 | Multiprocessing V4 binary record parser with shuffle buffer |
| src/backend/tf_transfer/training_shared.py | ~100 | Chunk discovery, white/black data source alternation |
Encoding
| File | Lines | Purpose |
|---|---|---|
| src/backend/fen_to_vec.py | ~200 | FEN → 17/112-plane tensor with black-side flip |
| src/backend/tf_transfer/policy_index.py | ~1900 | 1858 UCI move strings (canonical LC0 ordering) |
| src/backend/tf_transfer/lc0_az_policy_map.py | ~200 | 80×8×8 → 1858 mapping matrix |
Export
| File | Lines | Purpose |
|---|---|---|
| scripts/export_model.py | ~300 | TF checkpoint → LC0 .pb.gz (with weight transposition) |
| scripts/export_onnx.py | ~500 | TF checkpoint → ONNX (rebuilds model, uses tf2onnx) |
Data
| File | Lines | Purpose |
|---|---|---|
| scripts/prepare_data.py | ~80 | PGN split by color, 90/10 train/val |
| src/backend/pgn_to_csv.py | ~400 | PGN → per-move CSV with eval/clock/material features |
Configs
| File | Base Model | Blocks Trainable | Total Steps |
|---|---|---|---|
| configs/training_config.yaml | Maia 1900 | All 6 | 100k |
| configs/training_maia_2200.yaml | Maia 2200 | Last 4 | 50k |
| configs/training_leela_11258.yaml | Leela 11258 | Last 4 | 50k |