
Hunter Chessbot — Deep Technical Profile

Table of Contents

  1. Project Overview
  2. Research & Design Rationale
  3. Architecture: The Transfer Learning Pipeline
  4. Board Encoding & Policy Indexing
  5. The Training Backend
  6. Model Export & Inference
  7. Training Configurations & Results
  8. Technical Tradeoffs & Decisions
  9. Development Timeline
  10. Key Files Reference

1. Project Overview

A transfer-learning pipeline for chess neural networks: it takes base models (Maia 1900, Maia 2200, Leela 11258 80x7-SE) and fine-tunes them on Hunter's personal Lichess and Chess.com games (~1,800 training / ~200 validation games) to create chess engines that play like Hunter. The "Maia 2200 Hunter" model used in the play-lc0 project was produced here.

By the numbers:

  • 19 commits, 2 merged PRs, built in 7 days (Feb 3–10, 2026)
  • ~9,700 lines of Python (5,640 in TF backend alone)
  • 6 trained models: 2 production (Maia 1900 v1/v2) + 4 experimental (Leela 11258 at 25k/35k/50k, Maia 2200 at 20k)
  • TensorFlow 2.x transfer learning with configurable layer freezing
  • Exports to LC0 .pb.gz format and ONNX for browser inference
  • Built on top of maia-individual training code

Connection to other projects: The .onnx exports from this pipeline are directly used in play-lc0 as the "Maia 2200 Hunter" network — a personalized chess engine trained on Hunter's own games.


2. Research & Design Rationale

Hunter conducted two research conversations before building the pipeline, progressively refining the approach from "fine-tune Leela Zero" to "fine-tune Maia" to the final multi-base-model strategy.

2.1 Why Maia Over Leela Zero

The initial question was whether to fine-tune Leela Zero directly on personal games. The key insight Hunter arrived at through research:

Leela Zero was trained via self-play reinforcement learning to find the strongest moves. Fine-tuning it on human games would fight against its core objective — it wants to play optimally, not humanly. Maia Chess is specifically designed to predict human moves at various skill levels, trained supervised on millions of Lichess games. Its architecture is inherently better suited for "play like a specific person."

Three approaches were evaluated:

  1. Use Maia directly — quick, human-like at approximate rating, but not personalized
  2. Fine-tune Maia on personal games — best bet for a "plays like me" bot (chosen approach)
  3. Train from scratch — requires tens of thousands of games (infeasible with ~1,800 games)

Hunter identified the CSSLab/maia-individual repository as the purpose-built tooling for this use case. The Maia researchers had achieved up to 65% accuracy at predicting specific players' moves, demonstrating that individual playing style is learnable from observing enough games.

2.2 The Leela Fine-Tuning Pivot

After settling on Maia, Hunter explored a second question: what about fine-tuning Leela in addition to Maia, to create a stronger personalized model? This led to the multi-model strategy.

Key tooling discovered: trainingdata-tool, built specifically for converting PGN files into Lc0's training-data format. It has been used both to train Leela networks that mimic human styles and to distill alpha-beta engines into Leela networks.

The SL (supervised learning) data quality problem: Most PGN files lack "policy" data (probability distributions over candidate moves). Without policy data, SL-trained nets are ~200 Elo weaker. trainingdata-tool addresses this by generating policy distributions from Stockfish evaluations of alternative moves using softmax — reconstructing what the policy would have been.
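The softmax reconstruction described above can be sketched roughly as follows (a hypothetical helper: the temperature knob and names are illustrative, not trainingdata-tool's actual parameters):

```python
import numpy as np

# Hypothetical sketch: turn Stockfish-style centipawn evaluations of the
# candidate moves into a policy distribution via softmax. The temperature
# is an illustrative knob, not trainingdata-tool's actual scaling.
def policy_from_evals(evals_cp, temperature=100.0):
    z = np.asarray(evals_cp, dtype=np.float64) / temperature
    z -= z.max()                 # numerical stability
    p = np.exp(z)
    return p / p.sum()           # probabilities over candidate moves

probs = policy_from_evals([30, 10, -50])  # best move gets the most mass
```

A sharper temperature concentrates mass on the engine's top choice; a softer one spreads it across plausible alternatives, closer to human play.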

Precedents that validated the approach:

  • Stein network (AllieStein): trained via SL on existing engine game data
  • Lc0 team's fine-tuning for odds games: +14 =2 -2 against a GM in knight odds
  • Detlef Kappe's personality nets (BadGyal, GoodGyal, EvilGyal, TinyGyal): trained on Lichess games with a configurable q-ratio. GoodGyal at 0.75 (more Stockfish influence, positional) vs EvilGyal at 0.25 (more human influence, chaotic); the parameter controls how much the net learns raw move choices versus engine evaluations.

2.3 The Multi-Model Strategy

Hunter's final design used three base models with different fine-tuning strategies, each representing a different point on the personalization-vs-strength tradeoff:

  1. Maia 1900 (aggressive fine-tuning): All 6 blocks trainable, higher LR, no weight decay → maximum personalization, lower base strength
  2. Maia 2200 (conservative): Only last 4 blocks trainable, weight decay → better base play while learning personal style
  3. Leela 11258 80x7 (conservative): Larger/stronger model, only last 4 blocks trainable → strongest base play, hardest to personalize (the model's "default style" resists override)

This was a deliberate experiment: train all three, measure policy accuracy (how often the model predicts Hunter's actual move), and see which base model produces the best personalized bot. Results: Maia 2200 achieved 63.28% accuracy, Leela 11258 achieved 53.13% — confirming that human-style base models are better starting points for human-style fine-tuning.

2.4 Data Strategy

Hunter had ~1,800 training games + ~200 validation games from Lichess and Chess.com (blitz, rapid, classical — no bullet). The research established that 1-2K games is "right in the sweet spot" for fine-tuning: enough to capture personal style without overfitting. Key decisions:

  • Down-sample 1/32: Only 1 in every 32 positions used per training pass (~4,500 effective positions per epoch). Prevents overfitting on a small corpus.
  • Color separation: Games split into white/black positions, alternated during training for balanced exposure.
  • 90/10 train/val split: Standard but important for monitoring overfitting.
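The color separation and 90/10 split can be sketched together (hypothetical helper names mirroring the described strategy, not prepare_data.py's actual code):

```python
import random

# Sketch (assumption: hypothetical helper): bucket games by the color
# Hunter played, then hold out 10% of each bucket for validation.
def split_games(games, rng, val_frac=0.1):
    buckets = {"white": [], "black": []}
    for game in games:
        buckets[game["color"]].append(game)
    splits = {}
    for color, gs in buckets.items():
        rng.shuffle(gs)
        n_val = int(len(gs) * val_frac)
        splits[color] = {"val": gs[:n_val], "train": gs[n_val:]}
    return splits

games = [{"color": "white"}] * 1000 + [{"color": "black"}] * 1000
splits = split_games(games, random.Random(0))
```

Splitting per color (rather than over the combined pool) keeps both the train and validation sets balanced between white and black positions.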

3. Architecture: The Transfer Learning Pipeline

Lichess/Chess.com API
         │
         ▼
┌──────────────────┐     ┌─────────────────────┐     ┌───────────────────┐
│ prepare_data.py  │────>│ ChunkParser         │────>│ TFProcess         │
│ PGN → split by   │     │ V4 binary records   │     │ Transfer learning │
│ color, 90/10     │     │ 1/32 down-sample    │     │ w/ stop_gradient  │
│ train/val split  │     │ shuffle buffer(128) │     │ SGD + Nesterov    │
└──────────────────┘     └─────────────────────┘     └─────────┬─────────┘
                                                               │
                                                        TF Checkpoint
                                                               │
                         ┌─────────────────────────────────────┤
                         │                                     │
                         ▼                                     ▼
               ┌───────────────────┐               ┌───────────────────┐
               │ export_model.py   │               │ export_onnx.py    │
               │ → .pb.gz (LC0)    │               │ → .onnx (browser) │
               └───────────────────┘               └───────────────────┘
                         │                                     │
                         ▼                                     ▼
                  LC0 engine play                   play-lc0 / play_onnx.py

4. Board Encoding & Policy Indexing

4.1 The 112-Plane Input Tensor

The Leela Chess Zero / AlphaZero input format: [1, 112, 8, 8] Float32.

Plane layout:

  • Planes 0–103: 13 planes × 8 history positions (12 piece types + 1 repetition flag per position). In this codebase, only the current position is populated.
  • Planes 104–107: Castling rights (our queenside, our kingside, their queenside, their kingside) — full 8×8 planes of 1s or 0s
  • Plane 108: Side-to-move indicator (1.0 if black)
  • Plane 109: Rule50 counter / 99.0 (capped at 1.0)
  • Plane 110: Move count (always 0)
  • Plane 111: All-ones plane

Critical: perspective flipping for Black. When it's black's turn:

  • Board is rotated 180 degrees (both ranks and files reversed)
  • Piece colors are swapped (uppercase ↔ lowercase in FEN)
  • Castling rights are swapped (our/their perspective)
  • The model always sees the position "from the side-to-move's perspective"

This is implemented in fen_to_vec.py (preproc_fen()) and verified across multiple test files (test_no_flip.py, test_output_flip.py, test_onnx.py).
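A simplified string-level sketch of the flip (an illustration only, not the project's preproc_fen(); the en-passant field is dropped for brevity):

```python
def flip_fen_for_black(fen: str) -> str:
    # Sketch: rotate the placement 180 degrees (ranks and files reversed)
    # and swap piece colors, so the side to move always appears as "white".
    placement, side, castling, ep, half, full = fen.split()

    def flip_rank(rank: str) -> str:
        squares = []
        for ch in rank:                        # expand digits into squares
            squares.extend("." * int(ch) if ch.isdigit() else ch)
        squares.reverse()                      # reverse the files
        out, run = [], 0
        for ch in squares:                     # re-compress, swapping colors
            if ch == ".":
                run += 1
            else:
                if run:
                    out.append(str(run))
                    run = 0
                out.append(ch.swapcase())
        if run:
            out.append(str(run))
        return "".join(out)

    flipped = "/".join(flip_rank(r) for r in reversed(placement.split("/")))
    new_castling = "-" if castling == "-" else "".join(
        c.swapcase() for c in castling)        # our/their rights swap
    return " ".join([flipped, "w", new_castling, "-", half, full])
```

For example, a lone black king on e8 (black to move) ends up on d1 as an uppercase "K" after the 180-degree rotation and color swap.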

4.2 The 1858-Element Policy Index

The neural network outputs 1858 policy logits — one per possible move pattern in LC0's compressed encoding. policy_index.py contains the canonical list of 1858 UCI move strings (e.g., "a1b1", "e7e8q").

The mapping between the 80×8×8 convolutional policy output (5120 elements) and the 1858-element vector is built by lc0_az_policy_map.py:

  • 56 planes for queen-like moves (8 directions × 7 distances)
  • 8 planes for knight moves
  • 9 planes for underpromotions (3 directions × 3 piece types)
  • Total: 73 used + 7 unused = 80 planes
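Conceptually, applying the map is a gather from the flattened conv output (a numpy sketch with a random stand-in index table, not the real mapping produced by lc0_az_policy_map.py):

```python
import numpy as np

# Sketch: ApplyPolicyMap as a gather. The real table maps each of the 1858
# moves to its (plane, rank, file) slot in the 80x8x8 output; here a random
# permutation stands in for the true indices (assumption).
QUEEN_PLANES = 8 * 7          # 8 directions x up to 7 squares
KNIGHT_PLANES = 8
UNDERPROMO_PLANES = 3 * 3     # 3 directions x {knight, bishop, rook}
assert QUEEN_PLANES + KNIGHT_PLANES + UNDERPROMO_PLANES == 73

def apply_policy_map(conv_out_flat, index_map):
    """conv_out_flat: (N, 5120); index_map: (1858,) ints into 0..5119."""
    return conv_out_flat[:, index_map]

rng = np.random.default_rng(0)
index_map = rng.choice(80 * 8 * 8, size=1858, replace=False)
policy = apply_policy_map(np.zeros((2, 5120), dtype=np.float32), index_map)
```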

This policy index is the same one used in play-lc0; Hunter directed the AI to use this pre-generated table after its attempt to generate the mapping programmatically in play-lc0 produced wrong results.


5. The Training Backend

5.1 Network Architecture

SE-ResNet (Squeeze-and-Excitation Residual Network):

  • Input: 112 planes of 8×8 → Conv2D to RESIDUAL_FILTERS channels
  • Residual tower: RESIDUAL_BLOCKS blocks, each: Conv2D → BN → ReLU → Conv2D → BN → SE → skip connection → ReLU
  • Policy head (two modes):
    • CONVOLUTION (Maia-style): Conv2D(policy_channels=32) → Conv2D(80, 3×3) → ApplyPolicyMap → 1858 logits
    • CLASSICAL (Leela-style): Conv2D(policy_channels) → Flatten → Dense(1858)
  • Value head (always WDL): Conv2D(32) → Flatten → Dense(128, relu) → Dense(3) — win/draw/loss probabilities

5.2 Transfer Learning: The Stop-Gradient Approach

Instead of Keras's standard layer.trainable = False, the pipeline inserts tf.stop_gradient() Lambda layers at strategic points. This is more flexible because it allows the frozen layers to still participate in the forward pass while blocking gradient flow.

The back_prop_blocks config parameter (counted from the output) determines how many sections receive gradients:

  • back_prop_blocks: 6 with 6 residual blocks → all blocks trainable (aggressive personalization)
  • back_prop_blocks: 4 with 6 blocks → first 2 blocks frozen (conservative, preserves base knowledge)

The value head is ALWAYS frozen — only the policy head is fine-tuned. This makes sense because the goal is to learn the player's move choices, not their position evaluation.
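A toy illustration of the idea (manual backprop through a two-weight linear chain; this mimics what tf.stop_gradient does and is not the project's code):

```python
# Toy sketch: the forward pass flows through the "frozen" weight normally,
# but backprop treats its output as a constant, so no update reaches it.
def forward(x, w_frozen, w_head):
    h = x * w_frozen            # frozen layer (stop_gradient applied to h)
    y = h * w_head              # trainable policy head
    return h, y

def backward(x, h, grad_y, w_head, stop_gradient):
    grad_w_head = grad_y * h
    # With stop_gradient, the chain rule is cut before the frozen weight:
    grad_w_frozen = 0.0 if stop_gradient else grad_y * w_head * x
    return grad_w_frozen, grad_w_head

h, y = forward(x=2.0, w_frozen=3.0, w_head=4.0)
gf, gh = backward(x=2.0, h=h, grad_y=1.0, w_head=4.0, stop_gradient=True)
```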

5.3 Loss Function

total_loss = policy_weight × cross_entropy(policy)
           + value_weight × cross_entropy(WDL)
           + L2_regularization

  • Policy loss: Softmax cross-entropy between target policy and predicted logits
  • Value loss: Softmax cross-entropy between target WDL and predicted WDL
  • Regularization: L2 with weight_decay (0.0001) applied to all trainable layers
  • Gradient clipping: Global norm clipping at 10,000
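A numpy sketch of this loss (parameter names and defaults mirror the description above, not the exact tfprocess.py implementation):

```python
import numpy as np

def softmax_xent(logits, target):
    """Mean softmax cross-entropy between a target distribution and logits."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(target * log_probs).sum(axis=-1).mean()

def total_loss(policy_logits, policy_target, wdl_logits, wdl_target,
               trainable_weights, policy_weight=1.0, value_weight=1.0,
               weight_decay=1e-4):
    # L2 penalty over the trainable tensors only (frozen layers excluded).
    l2 = weight_decay * sum(float((w ** 2).sum()) for w in trainable_weights)
    return (policy_weight * softmax_xent(policy_logits, policy_target)
            + value_weight * softmax_xent(wdl_logits, wdl_target)
            + l2)

# Uniform logits against one-hot targets give log(num_classes) per head.
p_tgt = np.zeros((1, 1858)); p_tgt[0, 0] = 1.0
loss = total_loss(np.zeros((1, 1858)), p_tgt,
                  np.zeros((1, 3)), np.array([[1.0, 0.0, 0.0]]),
                  trainable_weights=[])
```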

5.4 Optimizer & Schedule

SGD with Nesterov momentum (0.9). Piecewise constant learning rate with optional warmup. Float16 mixed precision with loss scale 128.
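The schedule can be sketched as follows (the boundary steps below are illustrative placeholders, not the project's config values; the LR values are the Maia 2200 ones from section 7.1):

```python
def learning_rate(step, boundaries, values, warmup_steps=0):
    """Piecewise-constant LR with optional linear warmup (a sketch)."""
    if warmup_steps and step < warmup_steps:
        return values[0] * (step + 1) / warmup_steps
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]

# Illustrative boundaries (assumption) with the documented LR values:
boundaries = [35_000, 45_000, 48_000]
values = [0.001, 0.0005, 0.0001, 0.00002]
```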

5.5 Data Pipeline

  1. prepare_data.py: Reads combined PGN, splits by color (white/black), 90/10 train/val split
  2. ChunkParser: Multiprocessing workers read V4 binary records from .gz chunks, down-sample 1/32 (only 1 in every 32 positions is used), filter by color, shuffle via 128-element buffer
  3. tf.data.Dataset.from_generator() wraps the parser with .map(parse_function).prefetch(4)
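The 128-element shuffle buffer behaves roughly like this generator (a sketch in the spirit of the ChunkParser buffer, not its code):

```python
import random

def shuffle_buffer(stream, size, rng):
    """Fixed-size shuffle buffer: fill it, then swap a random slot per item."""
    buf = []
    for item in stream:
        if len(buf) < size:
            buf.append(item)
            continue
        i = rng.randrange(size)
        yield buf[i]
        buf[i] = item
    rng.shuffle(buf)       # drain the remainder in random order
    yield from buf

out = list(shuffle_buffer(range(1000), 128, random.Random(0)))
```

A small buffer like 128 gives only local shuffling, which is acceptable here because the chunks are already split by color and down-sampled upstream.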

The V4 binary format (8292 bytes per record) contains: a 4-byte version, 7432 bytes of move probabilities (1858 float32s), 832 bytes of bit-packed board planes, then winner, best_q, and auxiliary fields.
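Slicing one record looks like this (assuming the standard Lc0 V4 layout: 4-byte version, 7432 bytes of probabilities, 832 bytes of packed planes, 24-byte tail; treat the tail contents as approximate):

```python
import struct

V4_RECORD_SIZE = 8292
PROBS_BYTES = 1858 * 4       # move probabilities stored as float32
PLANES_BYTES = 104 * 8       # 104 board planes bit-packed as uint64

def split_v4_record(record: bytes):
    """Slice a V4 record into its major fields (layout per the text above)."""
    assert len(record) == V4_RECORD_SIZE
    version = struct.unpack_from("<i", record, 0)[0]
    probs = record[4:4 + PROBS_BYTES]
    planes = record[4 + PROBS_BYTES:4 + PROBS_BYTES + PLANES_BYTES]
    tail = record[4 + PROBS_BYTES + PLANES_BYTES:]   # winner, best_q, aux
    return version, probs, planes, tail

record = struct.pack("<i", 4) + bytes(V4_RECORD_SIZE - 4)
version, probs, planes, tail = split_v4_record(record)
```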

5.6 Alternate Training Mode

tfprocess_reg_lr_noise.py (not actively used) implements two additional regularization techniques:

  • Gaussian noise injection: N(0, 0.01 × std(weight)) added to each weight during initialization — prevents perfect memorization of base model
  • Per-layer LR scaling: Earlier layers get lower learning rates (conv_block1: 0.05×, res_0: 0.1×, ... res_5: 0.85×, heads: 1.0×) — creates a gradient hierarchy where lower layers change less
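Both techniques are easy to sketch (hypothetical names and scalings; the actual tfprocess_reg_lr_noise.py may differ in detail):

```python
import numpy as np

def add_init_noise(weights, rng, scale=0.01):
    """Gaussian noise sized relative to each tensor's own spread."""
    return [w + rng.normal(0.0, scale * (w.std() + 1e-12), size=w.shape)
            for w in weights]

# Per-layer LR multipliers as described above; heads change the most.
LAYER_LR_SCALE = {"conv_block1": 0.05, "res_0": 0.10, "res_5": 0.85,
                  "policy_head": 1.0, "value_head": 1.0}

def scaled_lr(layer_name, base_lr):
    return base_lr * LAYER_LR_SCALE.get(layer_name, 1.0)

noisy = add_init_noise([np.ones((4, 4))], np.random.default_rng(0))
```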

6. Model Export & Inference

6.1 Export to LC0 .pb.gz

export_model.py extracts weights from a TF checkpoint and packs them into LC0's protobuf format:

  • Conv2D weights: Transposed from TF [H,W,in,out] to LC0 [out,in,H,W]
  • Dense weights: Transposed from TF [in,out] to LC0 [out,in]
  • Batch norm: Standard deviation is squared (LC0 stores variance, not stddev)
  • Head weight permutation: Policy/value head weights are reordered to match LC0's expected ordering
  • Rule50 rescaling: Input weights for plane 109 are divided by 99
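The layout conversions above amount to a few transposes (a sketch, not export_model.py itself):

```python
import numpy as np

def tf_conv_to_lc0(w):
    """TF Conv2D kernel [H, W, in, out] -> LC0 [out, in, H, W]."""
    return np.transpose(w, (3, 2, 0, 1))

def tf_dense_to_lc0(w):
    """TF Dense kernel [in, out] -> LC0 [out, in]."""
    return w.T

def bn_stddev_to_variance(stddev):
    """LC0 stores batch-norm variances where TF checkpoints hold stddevs."""
    return stddev ** 2

conv = tf_conv_to_lc0(np.zeros((3, 3, 112, 64)))    # first conv of the net
dense = tf_dense_to_lc0(np.zeros((5120, 1858)))     # classical policy head
```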

LC0 weight saving is explicitly disabled during training to avoid crashes from weight count mismatches when back_prop_blocks < total_blocks. Export is a separate post-training step.

6.2 Export to ONNX

export_onnx.py rebuilds the model architecture from scratch in Keras, restores the checkpoint, then converts via tf2onnx. This produces the .onnx files used by play-lc0 in the browser.

6.3 Interactive CLI Play

play_onnx.py uses the lczerolens library for ONNX inference. Simple game loop: print board → accept SAN/UCI input → run inference → argmax over legal policy indices → print move. The lczerolens library handles all encoding/decoding internally.
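The move-selection step reduces to an argmax over legal indices (a hypothetical helper for illustration; lczerolens handles the real encoding and decoding internally):

```python
import numpy as np

def pick_move(policy_logits, legal_uci, policy_index):
    """Pick the highest-scoring legal move.
    policy_logits: (1858,) network output; policy_index: canonical UCI list."""
    legal_ids = [policy_index.index(m) for m in legal_uci]
    best = max(legal_ids, key=lambda i: float(policy_logits[i]))
    return policy_index[best]

# Toy 3-move "policy index" standing in for the real 1858-entry table:
toy_index = ["e2e4", "d2d4", "g1f3"]
move = pick_move(np.array([0.1, 2.0, -1.0]), ["e2e4", "g1f3"], toy_index)
```

Masking to legal moves matters: the raw argmax here would pick "d2d4", which is not in the legal list.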


7. Training Configurations & Results

7.1 Configuration Comparison

Parameter          Maia 1900 (v1/v2)             Maia 2200                     Leela 11258
Architecture       64 filters, 6 blocks, SE(8)   64 filters, 6 blocks, SE(8)   80 filters, 7 blocks, SE(4)
Batch size         64                            128                           128
Total steps        100,000                       50,000                        50,000
Back-prop blocks   6 (all trainable)             4 (freeze first 2)            4 (freeze first 3)
Starting LR        0.002                         0.001                         0.001
LR schedule        0.002→0.0005→0.0001→0.00002   0.001→0.0005→0.0001→0.00002   same as Maia 2200
Weight decay       none (default 0.0001)         0.0001                        0.0001
Precision          float16                       float16                       float16

Key design insight: The Maia 1900 config is the most aggressive — all blocks trainable, higher LR, no explicit weight decay — maximizing personalization at the cost of base model knowledge. The larger models use more conservative freezing + weight decay to preserve engine strength while adding personal style.

7.2 Trained Models

Model             Base          Steps   Policy Accuracy          Size     Notes
Hunter Maia v1    Maia 1900     ~50k    Not recorded             1.7 MB   First fine-tune
Hunter Maia v2    Maia 1900     ~100k   Not recorded             1.7 MB   Production model
Leela 11258-25k   Leela 11258   25k     -                        8.1 MB   Experimental
Leela 11258-35k   Leela 11258   35k     53.13% (best)            8.1 MB   Experimental
Leela 11258-50k   Leela 11258   50k     -                        8.1 MB   Experimental
Maia 2200-20k     Maia 2200     20k     63.28% (best at 18.5k)   1.2 MB   Experimental

What "policy accuracy" means: The percentage of positions where the model's top-1 policy prediction matches the move Hunter actually played. Maia 2200 achieving 63.28% means it correctly predicts Hunter's move nearly 2/3 of the time — a strong result for a 64x6 network on ~1,800 games.
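The metric itself is small (a sketch with a toy batch; names are illustrative):

```python
import numpy as np

def top1_policy_accuracy(pred_logits, target_policy):
    """Fraction of positions where argmax(prediction) == argmax(target)."""
    return float((pred_logits.argmax(axis=1) ==
                  target_policy.argmax(axis=1)).mean())

# Toy batch of 3 positions over a 2-move policy: model matches 2 of 3.
preds = np.array([[0.2, 0.9], [0.8, 0.1], [0.3, 0.7]])
targets = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
acc = top1_policy_accuracy(preds, targets)
```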


8. Technical Tradeoffs & Decisions

8.1 Stop-Gradient vs layer.trainable

Using tf.stop_gradient() Lambda layers instead of layer.trainable = False. The stop-gradient approach is more flexible: frozen layers still participate in forward pass and batch norm statistics computation, but receive no gradient updates. This preserves the base model's learned representations more faithfully during fine-tuning.

8.2 Down-Sampling 1/32

Only 1 in every 32 training positions is used (SKIP = 32). With ~1,800 games averaging ~80 positions each = ~144,000 total positions → ~4,500 effective training positions per epoch. This prevents overfitting on a small personal game corpus while still extracting the player's style signal.
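Deterministically, the skip looks like this (a sketch; the actual parser may skip positions probabilistically rather than by stride):

```python
SKIP = 32  # keep 1 position in every 32

def down_sample(positions, skip=SKIP):
    # Stride-based sketch of the 1/32 down-sampling described above.
    return positions[::skip]

# ~1,800 games x ~80 positions = 144,000 -> ~4,500 kept per pass
kept = down_sample(list(range(144_000)))
```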

8.3 Color-Separated Training Data

Games are split into white and black positions. The ChunkParser alternates between white and black data sources (FileDataSrc), ensuring balanced color representation in each batch. This matters because the board is always presented from the side-to-move perspective, so the model needs equal exposure to both perspectives.

8.4 Value Head Always Frozen

The value head (position evaluation) is never fine-tuned — only the policy head (move prediction). This is deliberate: the goal is to learn which moves the player chooses, not to change the model's assessment of how good positions are. The base model's position evaluation is already strong; personalizing it on ~1,800 games would likely degrade it.

8.5 Web UI Built Then Removed

A React + ONNX Runtime WASM web UI was built (PR #1, 213K lines added), encoding bugs were debugged (PR #2), then the entire web UI was deleted (Feb 6-7). The play-lc0 project superseded it as the browser-based chess interface — and inherited the lessons learned here about encoding, policy indexing, and ONNX conversion.

8.6 Multiple Base Models, Different Strategies

Three base models were tried with different fine-tuning strategies:

  • Maia 1900 (aggressive): All blocks trainable, small architecture → maximum personalization, used as production model
  • Maia 2200 (conservative): Only last 4 blocks trainable → better base play, decent accuracy (63.28%)
  • Leela 11258 (conservative): Larger model, only last 4 blocks → strongest base play, lower personal accuracy (53.13%) because the model's "default style" is harder to override

9. Development Timeline

Date      Commits   Key Achievement
Feb 3     2         Initial commit + full training infrastructure + Hunter Maia v1
Feb 3-4   2 PRs     Web UI (PR #1), encoding bug fixes (PR #2)
Feb 5     1         play_onnx.py CLI play script
Feb 5-6   5         Weight decay configs, TF2 fixes, lc0 weight loading, multi-model support
Feb 6     2         Leela 11258 + Maia 2200 training runs + checkpoints + exports
Feb 6-7   2         Deleted web UI, cleaned up WASM artifacts
Feb 10    1         Final README revision

Total: 19 commits, 7 days, from nothing to 6 trained models + full export pipeline.


10. Key Files Reference

Training

File                                         Lines   Purpose
src/backend/tf_transfer/tfprocess.py         ~1200   Main TF training loop: model construction, weight loading, loss, optimizer
scripts/train_transfer.py                    ~100    Training orchestrator: config loading, data pipeline, loop execution
src/backend/tf_transfer/chunkparser.py       ~300    Multiprocessing V4 binary record parser with shuffle buffer
src/backend/tf_transfer/training_shared.py   ~100    Chunk discovery, white/black data source alternation

Encoding

File                                           Lines   Purpose
src/backend/fen_to_vec.py                      ~200    FEN → 17/112-plane tensor with black-side flip
src/backend/tf_transfer/policy_index.py        ~1900   1858 UCI move strings (canonical LC0 ordering)
src/backend/tf_transfer/lc0_az_policy_map.py   ~200    80×8×8 → 1858 mapping matrix

Export

File                      Lines   Purpose
scripts/export_model.py   ~300    TF checkpoint → LC0 .pb.gz (with weight transposition)
scripts/export_onnx.py    ~500    TF checkpoint → ONNX (rebuilds model, uses tf2onnx)

Data

File                        Lines   Purpose
scripts/prepare_data.py     ~80     PGN split by color, 90/10 train/val
src/backend/pgn_to_csv.py   ~400    PGN → per-move CSV with eval/clock/material features

Configs

File                                Base Model    Blocks Trainable   Total Steps
configs/training_config.yaml        Maia 1900     All 6              100k
configs/training_maia_2200.yaml     Maia 2200     Last 4             50k
configs/training_leela_11258.yaml   Leela 11258   Last 4             50k