Hunter Chessbot — Deep Technical Profile
Build timeline — ~7 active days across 3 phases (Feb 3 – Feb 10, 2026, 8 calendar days)
- Initial Maia fine-tune + web UI (Feb 3, 1 day) — fine-tuned Maia 1900, ONNX export tooling, web UI, encoding fixes
- v2 iteration + UX polish (Feb 3–4, 2 days) — ONNX play script, v2 model, promotion dialog, color switching, web fixes
- TF2 training pipeline + Maia 2200 (Feb 5–10, 5 days) — fixed TF2 weight handling, lc0 direct weight loading, disabled lc0 saves during training, weight-decay config fix, checkpoint export, Maia 2200 best-accuracy checkpoint (the model used in play-lc0)
Table of Contents
- Project Overview
- Research & Design Rationale
- Architecture: The Transfer Learning Pipeline
- Board Encoding & Policy Indexing
- The Training Backend
- Model Export & Inference
- Training Configurations & Results
- Technical Tradeoffs & Decisions
- Development Timeline
- Key Files Reference
1. Project Overview
A fine-tuning pipeline for chess neural networks that takes base models (Maia 1900, Maia 2200, Leela 11258 80x7-SE) and fine-tunes them on my personal Lichess and Chess.com games (~1,800 training / ~200 validation games) to create chess engines that play in my style. The "Maia 2200 Hunter" model used in the play-lc0 project was produced here.
By the numbers:
- 19 commits, 2 merged PRs, built in 7 days (Feb 3–10, 2026)
- ~9,700 lines of Python (5,640 in TF backend alone)
- 6 trained models: 2 production (Maia 1900 v1/v2) + 4 experimental (Leela 11258 at 25k/35k/50k, Maia 2200 at 20k)
- TensorFlow 2.x transfer learning with configurable layer freezing
- Exports to LC0 .pb.gz format and ONNX for browser inference
- Built on top of maia-individual training code
Connection to other projects: The .onnx exports from this pipeline are directly used in play-lc0 as the "Maia 2200 Hunter" network — a personalized chess engine trained on my own games.
Libraries & Frameworks
Training backend (src/backend/tf_transfer/, Python)
- TensorFlow — model construction, transfer-learning loss, optimizer, checkpointing in tfprocess.py.
- TensorBoard — scalar/histogram logging during training runs.
- NumPy — board-to-tensor encoding, policy/value array ops, weight inspection.
- PyYAML — reads training configs (configs/training_*.yaml) that select the base model and which layers to unfreeze.
- h5py — HDF5 weight serialization (secondary path; primary is protobuf).
- SciPy, pandas, tqdm, natsort — misc utilities (data processing, progress bars, checkpoint sorting).
Chess & model formats
- python-chess 1.999 — legality checking, board state, game replay for both test_maia.py and data-prep scripts.
- lczerolens — LC0 wrapper exposing LczeroModel/LczeroBoard; used in play_onnx.py for board-to-tensor encoding and move decoding.
- lc0 (binary) — actual Leela engine used over UCI in test_maia.py.
- Protocol Buffers (backend.proto.net_pb2, from src/backend/proto/net.proto) — LC0 network/weight serialization, including SE-residual block definitions.
- ONNX Runtime (onnxruntime) — runs exported ONNX models for inference tests in test_onnx.py.
Environment
- conda — the env manifest pins CUDA 11.8 and lists PyTorch alongside TensorFlow (PyTorch is listed but the training path is TF-only).
2. Research & Design Rationale
I conducted two research conversations before building the pipeline, progressively refining the approach from "fine-tune Leela Zero" to "fine-tune Maia" to the final multi-base-model strategy.
2.1 Why Maia Over Leela Zero
The initial question was whether to fine-tune Leela Zero directly on personal games. The key insight I arrived at through research:
Leela Zero was trained via self-play reinforcement learning to find the strongest moves. Fine-tuning it on human games would fight against its core objective — it wants to play optimally, not humanly. Maia Chess is specifically designed to predict human moves at various skill levels, trained supervised on millions of Lichess games. Its architecture is inherently better suited for "play like a specific person."
Three approaches were evaluated:
- Use Maia directly — quick, human-like at approximate rating, but not personalized
- Fine-tune Maia on personal games — best bet for a "plays like me" bot (chosen approach)
- Train from scratch — requires tens of thousands of games (infeasible with ~1,800 games)
I identified the CSSLab/maia-individual repository as the purpose-built tooling for this use case. The Maia researchers had achieved up to 65% accuracy at predicting specific players' moves, demonstrating that individual playing style is learnable from observing enough games.
2.2 The Leela Fine-Tuning Pivot
After settling on Maia, I explored a second question: what about fine-tuning Leela in addition to Maia, to create a stronger personalized model? This led to the multi-model strategy.
Key tooling discovered: trainingdata-tool — built specifically for converting PGN files into Lc0's training data format. Used to train Leela networks mimicking human styles and to distill alpha-beta engines into Leela networks.
The SL (supervised learning) data quality problem: Most PGN files lack "policy" data (probability distributions over candidate moves). Without policy data, SL-trained nets are ~200 Elo weaker. trainingdata-tool addresses this by generating policy distributions from Stockfish evaluations of alternative moves using softmax — reconstructing what the policy would have been.
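One way to picture this reconstruction is a softmax over per-move engine scores. The sketch below is only an illustration: the function name, the centipawn inputs, and the 100 cp temperature are my assumptions, not trainingdata-tool's actual parameters or code.

```python
import math

def policy_from_evals(evals_cp, temperature_cp=100.0):
    """Turn per-move centipawn scores into a probability distribution.

    evals_cp: dict mapping UCI move -> engine score in centipawns (hypothetical input)
    temperature_cp: assumed softness knob; larger values give a flatter policy
    """
    best = max(evals_cp.values())
    # Subtract the best score before exponentiating for numerical stability.
    weights = {m: math.exp((cp - best) / temperature_cp) for m, cp in evals_cp.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

policy = policy_from_evals({"e2e4": 30, "d2d4": 25, "a2a3": -40})
```

The played move alone would give a one-hot target; a distribution like this spreads some probability onto near-equal alternatives.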
Precedents that validated the approach:
- Stein network (AllieStein): trained via SL on existing engine game data
- Lc0 team's fine-tuning for odds games: +14 =2 -2 against a GM in knight odds
- Dietrich Kappe's personality nets (BadGyal, GoodGyal, TinyGyal): Trained on Lichess games with a configurable q-ratio: Good Gyal at 0.75 (more Stockfish influence, positional) vs Evil Gyal at 0.25 (more human influence, chaotic). This parameter controls how much the net learns raw move choices versus engine evaluations.
2.3 The Multi-Model Strategy
My final design used three base models with different fine-tuning strategies, each representing a different point on the personalization-vs-strength tradeoff:
- Maia 1900 (aggressive fine-tuning): All 6 blocks trainable, higher LR, no weight decay → maximum personalization, lower base strength
- Maia 2200 (conservative): Only last 4 blocks trainable, weight decay → better base play while learning personal style
- Leela 11258 80x7 (conservative): Larger/stronger model, only last 4 blocks trainable → strongest base play, hardest to personalize (the model's "default style" resists override)
This was a deliberate experiment: train all three, measure policy accuracy (how often the model predicts my actual move), and see which base model produces the best personalized bot. Results: Maia 2200 achieved 63.28% accuracy, Leela 11258 achieved 53.13% — confirming that human-style base models are better starting points for human-style fine-tuning.
2.4 Data Strategy
I had ~1,800 training games + ~200 validation games from Lichess and Chess.com (blitz, rapid, classical — no bullet). The research established that 1-2K games is "right in the sweet spot" for fine-tuning: enough to capture personal style without overfitting. Key decisions:
- Down-sample 1/32: Only 1 in every 32 positions used per training pass (~4,500 effective positions per epoch). Prevents overfitting on a small corpus.
- Color separation: Games split into white/black positions, alternated during training for balanced exposure.
- 90/10 train/val split: Standard but important for monitoring overfitting.
3. Architecture: The Transfer Learning Pipeline
Lichess/Chess.com API
         │
         ▼
┌─────────────────┐     ┌─────────────────────┐     ┌──────────────────┐
│ prepare_data.py │────>│ ChunkParser         │────>│ TFProcess        │
│ PGN → split by  │     │ V4 binary records   │     │ Transfer learning│
│ color, 90/10    │     │ 1/32 down-sample    │     │ w/ stop_gradient │
│ train/val split │     │ shuffle buffer(128) │     │ SGD + Nesterov   │
└─────────────────┘     └─────────────────────┘     └────────┬─────────┘
                                                             │
                                                       TF Checkpoint
                                                             │
                        ┌────────────────────────────────────┤
                        │                                    │
                        ▼                                    ▼
               ┌──────────────────┐                 ┌──────────────────┐
               │ export_model.py  │                 │ export_onnx.py   │
               │ → .pb.gz (LC0)   │                 │ → .onnx (browser)│
               └──────────────────┘                 └──────────────────┘
                        │                                    │
                        ▼                                    ▼
                 LC0 engine play                  play-lc0 / play_onnx.py
4. Board Encoding & Policy Indexing
4.1 The 112-Plane Input Tensor
The Leela Chess Zero / AlphaZero input format: [1, 112, 8, 8] Float32.
Plane layout:
- Planes 0–103: 13 planes × 8 history positions (12 piece types + 1 repetition flag per position). In this codebase, only the current position is populated.
- Planes 104–107: Castling rights (our queenside, our kingside, their queenside, their kingside) — full 8×8 planes of 1s or 0s
- Plane 108: Side-to-move indicator (1.0 if black)
- Plane 109: Rule50 counter / 99.0 (capped at 1.0)
- Plane 110: Move count (always 0)
- Plane 111: All-ones plane
Critical: perspective flipping for Black. When it's black's turn:
- Board is rotated 180 degrees (both ranks and files reversed)
- Piece colors are swapped (uppercase ↔ lowercase in FEN)
- Castling rights are swapped (our/their perspective)
- The model always sees the position "from the side-to-move's perspective"
This is implemented in fen_to_vec.py (preproc_fen()) and verified across multiple test files (test_no_flip.py, test_output_flip.py, test_onnx.py).
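To make the plane layout concrete, here is a minimal NumPy sketch that fills only the auxiliary planes 104–111. The function name and arguments are hypothetical, it is not the code in fen_to_vec.py, and it assumes the castling flags have already been put in "our/their" order.

```python
import numpy as np

def aux_planes(us_ooo, us_oo, them_ooo, them_oo, black_to_move, rule50):
    """Fill planes 104-111 of a [112, 8, 8] tensor; planes 0-103 stay zero here."""
    x = np.zeros((112, 8, 8), dtype=np.float32)
    # Castling rights are broadcast over the whole 8x8 plane.
    for plane, flag in zip(range(104, 108), (us_ooo, us_oo, them_ooo, them_oo)):
        x[plane] = 1.0 if flag else 0.0
    x[108] = 1.0 if black_to_move else 0.0   # side-to-move indicator
    x[109] = min(rule50 / 99.0, 1.0)         # rule50 counter, capped at 1.0
    x[110] = 0.0                             # move count, always 0
    x[111] = 1.0                             # all-ones plane
    return x[np.newaxis]                     # add batch dim -> [1, 112, 8, 8]

x = aux_planes(True, True, False, True, black_to_move=True, rule50=12)
```

The piece planes (0–103) and the black-side flip are deliberately left out, since those are where the encoding bugs described in this document tend to hide.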
4.2 The 1858-Element Policy Index
The neural network outputs 1858 policy logits — one per possible move pattern in LC0's compressed encoding. policy_index.py contains the canonical list of 1858 UCI move strings (e.g., "a1b1", "e7e8q").
The mapping between the 80×8×8 convolutional policy output (5120 elements) and the 1858-element vector is built by lc0_az_policy_map.py:
- 56 planes for queen-like moves (8 directions × 7 distances)
- 8 planes for knight moves
- 9 planes for underpromotions (3 directions × 3 piece types)
- Total: 73 used + 7 unused = 80 planes
This policy index is the same one used in play-lc0: I had that project reuse this pre-generated table after the AI's programmatic generation of the index there produced wrong results.
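The 5120 → 1858 mapping can be pictured as a gather: each of the 1858 moves reads exactly one cell of the flattened 80×8×8 output, which is equivalent to multiplying by a 0/1 mapping matrix. The sketch below uses a random placeholder table purely so that it runs; the real table comes from lc0_az_policy_map.py, and the names here are hypothetical.

```python
import random

N_PLANES, N_SQUARES, N_MOVES = 80, 64, 1858

# Placeholder table: for each of the 1858 moves, a flat index into the
# 80*8*8 conv output. NOT the real LC0 mapping, just distinct random indices.
rng = random.Random(0)
move_to_flat = rng.sample(range(N_PLANES * N_SQUARES), N_MOVES)

def apply_policy_map(conv_out_flat, table):
    """Pick the 1858 move logits out of the 5120 conv outputs."""
    return [conv_out_flat[i] for i in table]

conv_out = [rng.gauss(0.0, 1.0) for _ in range(N_PLANES * N_SQUARES)]
logits = apply_policy_map(conv_out, move_to_flat)
```

Cells that no move points at (the 7 unused planes and off-board targets) are simply never read.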
5. The Training Backend
5.1 Network Architecture
SE-ResNet (Squeeze-and-Excitation Residual Network):
- Input: 112 planes of 8×8 → Conv2D to RESIDUAL_FILTERS channels
- Residual tower: RESIDUAL_BLOCKS blocks, each: Conv2D → BN → ReLU → Conv2D → BN → SE → skip connection → ReLU
- Policy head (two modes):
  - CONVOLUTION (Maia-style): Conv2D(policy_channels=32) → Conv2D(80, 3×3) → ApplyPolicyMap → 1858 logits
  - CLASSICAL (Leela-style): Conv2D(policy_channels) → Flatten → Dense(1858)
- Value head (always WDL): Conv2D(32) → Flatten → Dense(128, relu) → Dense(3) — win/draw/loss probabilities
5.2 Transfer Learning: The Stop-Gradient Approach
Instead of Keras's standard layer.trainable = False, the pipeline inserts tf.stop_gradient() Lambda layers at strategic points. The frozen layers still run in the forward pass, but a single cut blocks gradient flow to everything upstream of it, so the freeze boundary can be moved with one config value rather than by toggling per-layer flags.
The back_prop_blocks config parameter (counted from the output) determines how many sections receive gradients:
- back_prop_blocks: 6 with 6 residual blocks → all blocks trainable (aggressive personalization)
- back_prop_blocks: 4 with 6 blocks → first 2 blocks frozen (conservative, preserves base knowledge)
The value head is ALWAYS frozen; of the two heads, only the policy head is fine-tuned. This makes sense because the goal is to learn the player's move choices, not their position evaluation.
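The counting rule for back_prop_blocks can be sketched in a few lines. The helper below is hypothetical (the real pipeline places a tf.stop_gradient() cut at the boundary rather than returning a list), and it assumes the count refers to residual blocks, as the configs in this document suggest.

```python
def frozen_blocks(total_blocks, back_prop_blocks):
    """Return indices of residual blocks that sit behind the gradient cut.

    Blocks are numbered from the input; back_prop_blocks counts from the output.
    """
    if not 0 <= back_prop_blocks <= total_blocks:
        raise ValueError("back_prop_blocks must be between 0 and total_blocks")
    return list(range(total_blocks - back_prop_blocks))

maia_1900 = frozen_blocks(6, 6)   # all six blocks receive gradients
maia_2200 = frozen_blocks(6, 4)   # first two blocks frozen
leela = frozen_blocks(7, 4)       # first three blocks frozen
```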
5.3 Loss Function
total_loss = policy_weight × cross_entropy(policy)
           + value_weight × cross_entropy(WDL)
           + L2_regularization
- Policy loss: Softmax cross-entropy between target policy and predicted logits
- Value loss: Softmax cross-entropy between target WDL and predicted WDL
- Regularization: L2 with weight_decay (0.0001) applied to all trainable layers
- Gradient clipping: Global norm clipping at 10,000
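The loss above can be written out in plain Python for a single position. This is a sketch under assumptions: the function names are mine, the head weights default to 1.0, and the L2 term is taken as a plain sum of squares scaled by weight_decay, which may differ from how tfprocess.py scales it.

```python
import math

def softmax_cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for t, v in zip(target, logits))

def total_loss(policy_t, policy_logits, wdl_t, wdl_logits, weights,
               policy_weight=1.0, value_weight=1.0, weight_decay=0.0001):
    policy_loss = softmax_cross_entropy(policy_t, policy_logits)
    value_loss = softmax_cross_entropy(wdl_t, wdl_logits)
    # L2 penalty over trainable weights (assumed form: decay * sum of squares).
    l2 = weight_decay * sum(w * w for w in weights)
    return policy_weight * policy_loss + value_weight * value_loss + l2

loss = total_loss([0, 1, 0], [0.1, 2.0, -1.0], [1, 0, 0], [0.5, 0.2, 0.1], [0.3, -0.4])
```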
5.4 Optimizer & Schedule
SGD with Nesterov momentum (0.9). Piecewise constant learning rate with optional warmup. Float16 mixed precision with loss scale 128.
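A piecewise-constant schedule with warmup is easy to sketch. The LR values below are the Maia 1900 ones from the configuration table; the step boundaries and the linear warmup shape are placeholders I chose for illustration, not values read from the configs.

```python
import bisect

def learning_rate(step, boundaries, values, warmup_steps=0):
    """Piecewise-constant LR with optional linear warmup.

    len(values) must be len(boundaries) + 1.
    """
    lr = values[bisect.bisect_right(boundaries, step)]
    if warmup_steps and step < warmup_steps:
        lr *= (step + 1) / warmup_steps   # ramp up linearly over the first steps
    return lr

BOUNDARIES = [25_000, 50_000, 75_000]        # hypothetical step boundaries
VALUES = [0.002, 0.0005, 0.0001, 0.00002]    # LR values from the Maia 1900 schedule
```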
5.5 Data Pipeline
- prepare_data.py: Reads combined PGN, splits by color (white/black), 90/10 train/val split
- ChunkParser: Multiprocessing workers read V4 binary records from .gz chunks, down-sample 1/32 (only 1 in every 32 positions is used), filter by color, shuffle via 128-element buffer
- tf.data.Dataset.from_generator() wraps the parser with .map(parse_function).prefetch(4)
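The 128-element shuffle step can be approximated with a small generator. This is a generic fixed-size shuffle buffer written for illustration, not the ChunkParser implementation; the function name and the final drain behaviour are my assumptions.

```python
import random

def shuffle_buffer(items, size=128, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buf = []
    for item in items:
        if len(buf) < size:
            buf.append(item)          # fill phase: nothing is emitted yet
            continue
        i = rng.randrange(size)
        buf[i], item = item, buf[i]   # swap the new item in, emit the evicted one
        yield item
    rng.shuffle(buf)                  # drain whatever is left at the end
    yield from buf

out = list(shuffle_buffer(range(1000), size=128, seed=0))
```

A buffer this small only decorrelates neighbouring positions; records from the same game can still land close together in a batch.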
The V4 binary format (8292 bytes per record) contains: version, 7432 bytes of policy probabilities (1858 float32 values), 832 bytes of bit-packed planes (104 planes × 8 bytes), winner, best_q, and auxiliary data.
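The record size can be checked directly against the V4_STRUCT_STRING that chunkparser.py uses (quoted in section 8.9). The two byte-count constants are my own arithmetic, added to show where 7432 and 832 come from.

```python
import struct

V4_STRUCT_STRING = "4s7432s832sBBBBBBBbffff"
v4 = struct.Struct(V4_STRUCT_STRING)

POLICY_BYTES = 1858 * 4   # one float32 per policy index
PLANE_BYTES = 104 * 8     # one 64-bit bitboard per history plane

record_size = v4.size     # total bytes per V4 record
```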
5.6 Alternate Training Mode
tfprocess_reg_lr_noise.py (not actively used) implements two additional regularization techniques:
- Gaussian noise injection: N(0, 0.01 × std(weight)) added to each weight during initialization — prevents perfect memorization of base model
- Per-layer LR scaling: Earlier layers get lower learning rates (conv_block1: 0.05×, res_0: 0.1×, ... res_5: 0.85×, heads: 1.0×) — creates a gradient hierarchy where lower layers change less
6. Model Export & Inference
6.1 Export to LC0 .pb.gz
export_model.py extracts weights from a TF checkpoint and packs them into LC0's protobuf format:
- Conv2D weights: Transposed from TF [H,W,in,out] to LC0 [out,in,H,W]
- Dense weights: Transposed from TF [in,out] to LC0 [out,in]
- Batch norm: Standard deviation is squared (LC0 stores variance, not stddev)
- Head weight permutation: Policy/value head weights are reordered to match LC0's expected ordering
- Rule50 rescaling: Input weights for plane 109 are divided by 99
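The two transpositions in the list above look like this in NumPy. The helper names and the toy shapes are illustrative; this is not the code in export_model.py, and it leaves out the batch-norm, head-permutation, and rule50 steps.

```python
import numpy as np

def conv_tf_to_lc0(w):
    """Reorder a conv kernel from TF [H, W, in, out] to LC0 [out, in, H, W]."""
    return np.transpose(w, (3, 2, 0, 1))

def dense_tf_to_lc0(w):
    """Reorder a dense kernel from TF [in, out] to LC0 [out, in]."""
    return np.transpose(w, (1, 0))

# Toy kernels with distinct values so the index mapping can be checked.
conv = np.arange(3 * 3 * 2 * 4, dtype=np.float32).reshape(3, 3, 2, 4)
dense = np.arange(128 * 3, dtype=np.float32).reshape(128, 3)

conv_lc0 = conv_tf_to_lc0(conv)
dense_lc0 = dense_tf_to_lc0(dense)
```

Getting an axis order wrong here still produces a file of the right size, so the only symptom tends to be a net that plays nonsense.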
LC0 weight saving is explicitly disabled during training to avoid crashes from weight count mismatches when back_prop_blocks < total_blocks. Export is a separate post-training step.
6.2 Export to ONNX
export_onnx.py rebuilds the model architecture from scratch in Keras, restores the checkpoint, then converts via tf2onnx. This produces the .onnx files used by play-lc0 in the browser.
6.3 Interactive CLI Play
play_onnx.py uses the lczerolens library for ONNX inference. Simple game loop: print board → accept SAN/UCI input → run inference → argmax over legal policy indices → print move. The lczerolens library handles all encoding/decoding internally.
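The "argmax over legal policy indices" step is simple enough to sketch without any library. This is not lczerolens's API: the function name is hypothetical, and the four-entry index stands in for the real 1858-entry table.

```python
def pick_move(logits, policy_index, legal_uci):
    """Return the legal move with the highest policy logit, or None.

    logits: floats from the network, one per policy index entry
    policy_index: UCI strings in the same order as logits
    legal_uci: set of legal moves, already in the model's side-to-move frame
    """
    best_move, best_logit = None, float("-inf")
    for move, logit in zip(policy_index, logits):
        if move in legal_uci and logit > best_logit:
            best_move, best_logit = move, logit
    return best_move

# Toy 4-entry index standing in for the real 1858-entry table.
toy_index = ["a1b1", "e2e4", "d2d4", "e7e8q"]
move = pick_move([9.0, 1.5, 2.5, 0.1], toy_index, {"e2e4", "d2d4"})
```

Masking to legal moves matters: in the toy call the highest raw logit belongs to a1b1, which is not legal, so the pick falls to d2d4.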
7. Training Configurations & Results
7.1 Configuration Comparison
| Parameter | Maia 1900 (v1/v2) | Maia 2200 | Leela 11258 |
|---|---|---|---|
| Architecture | 64 filters, 6 blocks, SE(8) | 64 filters, 6 blocks, SE(8) | 80 filters, 7 blocks, SE(4) |
| Batch size | 64 | 128 | 128 |
| Total steps | 100,000 | 50,000 | 50,000 |
| Back-prop blocks | 6 (all trainable) | 4 (freeze first 2) | 4 (freeze first 3) |
| Starting LR | 0.002 | 0.001 | 0.001 |
| LR schedule | 0.002→0.0005→0.0001→0.00002 | 0.001→0.0005→0.0001→0.00002 | same as Maia 2200 |
| Weight decay | none (default 0.0001) | 0.0001 | 0.0001 |
| Precision | float16 | float16 | float16 |
Key design insight: The Maia 1900 config is the most aggressive — all blocks trainable, higher LR, no explicit weight decay — maximizing personalization at the cost of base model knowledge. The larger models use more conservative freezing + weight decay to preserve engine strength while adding personal style.
7.2 Trained Models
| Model | Base | Steps | Policy Accuracy | Size | Notes |
|---|---|---|---|---|---|
| Hunter Maia v1 | Maia 1900 | ~50k | Not recorded | 1.7 MB | First fine-tune |
| Hunter Maia v2 | Maia 1900 | ~100k | Not recorded | 1.7 MB | Production model |
| Leela 11258-25k | Leela 11258 | 25k | — | 8.1 MB | Experimental |
| Leela 11258-35k | Leela 11258 | 35k | 53.13% (best) | 8.1 MB | Experimental |
| Leela 11258-50k | Leela 11258 | 50k | — | 8.1 MB | Experimental |
| Maia 2200-20k | Maia 2200 | 20k | 63.28% (best at 18.5k) | 1.2 MB | Experimental |
What "policy accuracy" means: The percentage of positions where the model's top-1 policy prediction matches the move I actually played. Maia 2200 achieving 63.28% means it correctly predicts my move nearly 2/3 of the time — a strong result for a 64x6 network on ~1,800 games.
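As a sketch, the metric reduces to an exact-match rate over positions. The function name and the four-position sample are illustrative, not taken from the training code.

```python
def top1_accuracy(predicted_moves, played_moves):
    """Fraction of positions where the model's top move equals the played move."""
    if len(predicted_moves) != len(played_moves):
        raise ValueError("need one prediction per position")
    if not played_moves:
        return 0.0
    hits = sum(p == q for p, q in zip(predicted_moves, played_moves))
    return hits / len(played_moves)

acc = top1_accuracy(["e2e4", "g1f3", "d2d4", "c2c4"],
                    ["e2e4", "g1f3", "d2d4", "b1c3"])
```

Note that the last position counts as a full miss even if c2c4 was the model's close second choice to a move I might equally have played.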
8. Technical Tradeoffs & Decisions
8.1 Stop-Gradient vs layer.trainable
The pipeline uses tf.stop_gradient() Lambda layers instead of layer.trainable = False. The stop-gradient approach is more flexible: frozen layers still participate in the forward pass and in batch norm statistics computation, but receive no gradient updates. This preserves the base model's learned representations more faithfully during fine-tuning.
8.2 Down-Sampling 1/32
Only 1 in every 32 training positions is used (SKIP = 32). With ~1,800 games averaging ~80 positions each = ~144,000 total positions → ~4,500 effective training positions per epoch. This prevents overfitting on a small personal game corpus while still extracting the player's style signal.
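The arithmetic and the sampling rule can be sketched together. I am assuming the parser keeps each record independently with probability 1/SKIP; the helper name is hypothetical and the exact mechanism in chunkparser.py may differ.

```python
import random

SKIP = 32

def down_sample(records, skip=SKIP, seed=None):
    """Keep each record with probability 1/skip (assumed sampling rule)."""
    rng = random.Random(seed)
    return [r for r in records if rng.randrange(skip) == 0]

total_positions = 1_800 * 80             # ~144,000 positions in the corpus
expected_kept = total_positions / SKIP   # 4,500 per pass on average
kept = down_sample(range(total_positions), seed=0)
```

Because the draw is random, a different subset is seen on every pass, so over many epochs most positions are eventually used.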
8.3 Color-Separated Training Data
Games are split into white and black positions. The ChunkParser alternates between white and black data sources (FileDataSrc), ensuring balanced color representation in each batch. This matters because the board is always presented from the side-to-move perspective, so the model needs equal exposure to both perspectives.
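Strict alternation between two sources can be sketched with a small generator. This only illustrates the idea; the names are hypothetical and the real FileDataSrc handling in training_shared.py may cycle or reshuffle its files instead of stopping.

```python
from itertools import cycle

def alternate(white_records, black_records):
    """Interleave two record streams: white, black, white, black, ..."""
    sources = cycle((iter(white_records), iter(black_records)))
    for src in sources:
        try:
            yield next(src)
        except StopIteration:
            return   # stop when the source whose turn it is has run out

mixed = list(alternate(["w1", "w2", "w3"], ["b1", "b2"]))
```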
8.4 Value Head Always Frozen
The value head (position evaluation) is never fine-tuned — only the policy head (move prediction). This is deliberate: the goal is to learn which moves the player chooses, not to change the model's assessment of how good positions are. The base model's position evaluation is already strong; personalizing it on ~1,800 games would likely degrade it.
8.5 Web UI Built Then Removed
A React + ONNX Runtime WASM web UI was built (PR #1, 213K lines added), encoding bugs were debugged (PR #2), then the entire web UI was deleted (Feb 6-7). The play-lc0 project superseded it as the browser-based chess interface — and inherited the lessons learned here about encoding, policy indexing, and ONNX conversion.
8.6 Multiple Base Models, Different Strategies
Three base models were tried with different fine-tuning strategies:
- Maia 1900 (aggressive): All blocks trainable, small architecture → maximum personalization, used as production model
- Maia 2200 (conservative): Only last 4 blocks trainable → better base play, decent accuracy (63.28%)
- Leela 11258 (conservative): Larger model, only last 4 blocks → strongest base play, lower personal accuracy (53.13%) because the model's "default style" is harder to override
8.7 Additional architectural tradeoffs worth naming
- Supervised imitation, not DPO / RL preference learning — target is cross-entropy on the played move. DPO between the played move and legal alternatives would model my preference over the ~35% of positions where top-1 accuracy plateaus — exactly where imitation learning fails. SL was the path of least resistance since Maia-individual is already pure CE and ~1.8k games is too little to bootstrap a reward signal. Inferred.
- TensorFlow 2, not PyTorch — inherited from the Maia-individual scaffolding (~5,600 LOC of TF). PyTorch has cleaner ONNX export and is the dominant research framework in 2026; rewriting would have cost more than the 7-day project. The cost is visible in the commit log — multiple days on TF2 compatibility bugs ("fixed TF2 weight handling," "lc0 direct weight loading," "disabled lc0 saves during training"). Evidenced in timeline.
- Value head frozen; not fine-tuned on my WDL outcomes — with 1.8k blitz games the outcome signal is extremely noisy (binary W/D/L, high variance). Freezing protects lc0's eval strength. But value loss is still computed in training for shared-trunk gradient balance, not because V is updating — an interviewer should probe: "then why compute it at all?" Inferred.
- Top-1 policy accuracy as the evaluation metric, not head-to-head Elo — cheap to compute, matches Maia's paper metric, but has a classic imitation-learning pitfall: penalizes the model for choosing a move I would also like. No Elo ladder, no move-agreement rate on held-out games, no KL to my empirical distribution. Inferred.
- No KD from base-Maia's full policy distribution; flat one-move target — q-ratio blending (exactly what Kappe's BadGyal/GoodGyal did, referenced in §2.2) would mix my played-move target with base-Maia's policy to preserve strength while shifting style. This was considered and deferred; implementing KD needed caching base-model logits. Inferred.
- Inherited SKIP=32 down-sample, not re-examined for small-data regime — gives ~4.5k effective positions/epoch, 50k–100k steps. The factor comes from Maia-individual's billion-game pipeline where it's a scale necessity — an unexamined inherited default for a 1.8k-game dataset where the right answer is probably "use all positions, fewer epochs, rely on weight decay." Inferred.
8.8 Strategic / "why do this at all" tradeoffs
- "Play as ME" as product thesis vs. Maia 1900 as-is — Maia 1900 already plays indistinguishably from an average 1900 human. Shipping it unmodified would have been zero effort. The fine-tune bets that the interesting product is identity-mimicry, not rating-mimicry — a bot friends recognize as me by opening repertoire, time-trouble blunders, and pet tactics. That reframes success from "human-like" (already solved) to "distinctively mine in ways distinguishable from a generic 1900," which no off-the-shelf model offers and which top-1 accuracy only weakly captures. Inferred.
- Three base models trained in parallel as a de-risking hedge, not an ablation study — with a 7-day budget and unknown answers to "does 1.8k games move the needle?" and "does a stronger base resist personalization?", training Maia 1900 / 2200 / Leela in parallel was a portfolio bet: at least one would produce a shippable artifact. Cost: none got the care a single-model study would — no seeds, no held-out Elo, inconsistent step counts — so the 63% vs 53% gap is suggestive not conclusive. Right call under time pressure, not defensible as research. Inferred.
- Fork Maia-individual vs. clean PyTorch rewrite — a clean rewrite would have been ~2 weeks and produced better ONNX ergonomics in the framework I actually use. Forking bought a working V4 chunk parser, SE-ResNet definition, lc0 weight format, and policy-map matrix on day one — at the cost of 3+ days debugging inherited TF2 bugs (visible in commit log) and carrying unexamined defaults. Correct for a 7-day project; the tax is the codebase isn't reusable for the next idea. Evidenced in TF2 bug commits.
- Standalone training project vs. folding into play-lc0 — deliberate separation of concerns: training needs CUDA + TF2 + Python + chunk files; play-lc0 needs a browser and an .onnx. Co-locating would couple a weekly-iteration research repo to a user-facing app's release cadence. The 3.3 MB .onnx is the clean contract — play-lc0 can add any lc0-compatible net later without touching training code. Inferred.
- Personalized-bot generator vs. one-off personal model — nothing in the repo hardcodes me: prepare_data.py takes any PGN, configs are YAML, export produces a standard .onnx. The real deliverable is a pipeline that turns anyone's Lichess archive into a personal bot — a product ("Maia-as-a-Service for your Lichess account") not a trophy model. The interview story shifts from "I trained a model" to "I built the factory." Inferred.
- Surfacing q-ratio research as the honest deferral — Kappe's BadGyal/GoodGyal precedent (mixed played-move + base-Maia policy to preserve strength while shifting style) is exactly the right prior art for the personalization-vs-strength dial. I found it, evaluated it, and made a conscious deferral for scope. "I know what I didn't build, and why" is a stronger signal than shipping a half-baked KD implementation. Evidenced in §2.2.
8.9 Code-level tradeoffs visible in the source
- Config-per-experiment with near-duplication, not hierarchical inheritance — configs/ has four standalone YAMLs. training_config_wd.yaml is byte-for-byte training_config.yaml plus one line (weight_decay: 0.0001). Architecture constants (filters/blocks/se_ratio) are duplicated into every config even though they're dictated by the base .pb.gz. Hydra or OmegaConf would enable defaults: inheritance. train_transfer.py derives collection_name from the config's directory and name from the filename — a flat file-per-run layout is load-bearing for output naming. For a 4-config repo, inheritance adds ceremony for zero savings. Evidenced.
- prepare_data.py stops at PGN split; chunk generation is delegated to upstream Maia tooling — the script only splits combined PGN by player color and does a 90/10 slice. Fusing PGN → v4 chunk generation into one script would produce data/processed/<name>/{train,validate} end-to-end. Keeping the boundary at PGN means users reuse the Maia-style chunk pipeline unmodified, and avoids reimplementing the V4_STRUCT_STRING = '4s7432s832sBBBBBBBbffff' format that TF can't efficiently unpack in-graph anyway. Evidenced in chunkparser.py comment.
- Shell script for GPU run serialization, not a queue system — scripts/queue_wd_training.sh polls pgrep -f train_transfer.py every 60s, renames runs/hunter → runs/hunter-v2-no-wd between runs, then launches the next one. A Python orchestrator or just cmd1 && cmd2 would also work. The rename has to land between runs because runs/hunter is the default TensorBoard output path; && skips on failure but doesn't handle the rename. 20 lines of bash beats any proper scheduler for single-GPU workstation serial runs. WSL hardcode (/mnt/c/Users/chenh/...) is a smell consistent with a solo-dev tool. Evidenced.
- V4 training chunk format inherited verbatim from Leela/Maia, not Parquet/Arrow/TFRecord — chunkparser.py retains the GPLv3 header from Leela Chess and supports V3/V4 raw-byte packed chunks (7432-byte policy + 832-byte bit-packed planes). Modern columnar format would be TF-native and avoid the "TensorFlow doesn't have a fast way to unpack bit vectors" comment that drives the current unpack-in-Python-workers approach. Staying on V4 means binary compatibility with upstream Maia checkpoints and lc0 — forking the format would break the whole point of the project (producing lc0-compatible personalized nets). Evidenced.
9. Development Timeline
| Date | Commits | Key Achievement |
|---|---|---|
| Feb 3 | 2 | Initial commit + full training infrastructure + Hunter Maia v1 |
| Feb 3-4 | 2 PRs | Web UI (PR #1), encoding bug fixes (PR #2) |
| Feb 5 | 1 | play_onnx.py CLI play script |
| Feb 5-6 | 5 | Weight decay configs, TF2 fixes, lc0 weight loading, multi-model support |
| Feb 6 | 2 | Leela 11258 + Maia 2200 training runs + checkpoints + exports |
| Feb 6-7 | 2 | Deleted web UI, cleaned up WASM artifacts |
| Feb 10 | 1 | Final README revision |
Total: 19 commits, 7 days, from nothing to 6 trained models + full export pipeline.
10. Key Files Reference
Training
| File | Lines | Purpose |
|---|---|---|
| src/backend/tf_transfer/tfprocess.py | ~1200 | Main TF training loop: model construction, weight loading, loss, optimizer |
| scripts/train_transfer.py | ~100 | Training orchestrator: config loading, data pipeline, loop execution |
| src/backend/tf_transfer/chunkparser.py | ~300 | Multiprocessing V4 binary record parser with shuffle buffer |
| src/backend/tf_transfer/training_shared.py | ~100 | Chunk discovery, white/black data source alternation |
Encoding
| File | Lines | Purpose |
|---|---|---|
| src/backend/fen_to_vec.py | ~200 | FEN → 17/112-plane tensor with black-side flip |
| src/backend/tf_transfer/policy_index.py | ~1900 | 1858 UCI move strings (canonical LC0 ordering) |
| src/backend/tf_transfer/lc0_az_policy_map.py | ~200 | 80×8×8 → 1858 mapping matrix |
Export
| File | Lines | Purpose |
|---|---|---|
| scripts/export_model.py | ~300 | TF checkpoint → LC0 .pb.gz (with weight transposition) |
| scripts/export_onnx.py | ~500 | TF checkpoint → ONNX (rebuilds model, uses tf2onnx) |
Data
| File | Lines | Purpose |
|---|---|---|
| scripts/prepare_data.py | ~80 | PGN split by color, 90/10 train/val |
| src/backend/pgn_to_csv.py | ~400 | PGN → per-move CSV with eval/clock/material features |
Configs
| File | Base Model | Blocks Trainable | Total Steps |
|---|---|---|---|
| configs/training_config.yaml | Maia 1900 | All 6 | 100k |
| configs/training_maia_2200.yaml | Maia 2200 | Last 4 | 50k |
| configs/training_leela_11258.yaml | Leela 11258 | Last 4 | 50k |