Thorin

Hunter Chessbot — Deep Technical Profile

Build timeline — ~7 active days across 3 phases (Feb 3 – Feb 10, 2026, 8 calendar days)

  1. Initial Maia fine-tune + web UI (Feb 3, 1 day) — fine-tuned Maia 1900, ONNX export tooling, web UI, encoding fixes
  2. v2 iteration + UX polish (Feb 3–4, 2 days) — ONNX play script, v2 model, promotion dialog, color switching, web fixes
  3. TF2 training pipeline + Maia 2200 (Feb 5–10, 5 days) — fixed TF2 weight handling, lc0 direct weight loading, disabled lc0 saves during training, weight-decay config fix, checkpoint export, Maia 2200 best-accuracy checkpoint (the model used in play-lc0)

Table of Contents

  1. Project Overview
  2. Research & Design Rationale
  3. Architecture: The Transfer Learning Pipeline
  4. Board Encoding & Policy Indexing
  5. The Training Backend
  6. Model Export & Inference
  7. Training Configurations & Results
  8. Technical Tradeoffs & Decisions
  9. Development Timeline
  10. Key Files Reference

1. Project Overview

A fine-tuning pipeline for chess neural networks that takes base models (Maia 1900, Maia 2200, Leela 11258 80x7-SE) and fine-tunes them on my personal Lichess and Chess.com games (~1,800 training / ~200 validation games) to create chess engines that play in my style. The "Maia 2200 Hunter" model used in the play-lc0 project was produced here.

By the numbers:

  • 19 commits, 2 merged PRs, built in 7 days (Feb 3–10, 2026)
  • ~9,700 lines of Python (5,640 in TF backend alone)
  • 6 trained models: 2 production (Maia 1900 v1/v2) + 4 experimental (Leela 11258 at 25k/35k/50k, Maia 2200 at 20k)
  • TensorFlow 2.x transfer learning with configurable layer freezing
  • Exports to LC0 .pb.gz format and ONNX for browser inference
  • Built on top of maia-individual training code

Connection to other projects: The .onnx exports from this pipeline are directly used in play-lc0 as the "Maia 2200 Hunter" network — a personalized chess engine trained on my own games.


Libraries & Frameworks

Training backend (src/backend/tf_transfer/, Python)

  • TensorFlow — model construction, transfer-learning loss, optimizer, checkpointing in tfprocess.py.
  • TensorBoard — scalar/histogram logging during training runs.
  • NumPy — board-to-tensor encoding, policy/value array ops, weight inspection.
  • PyYAML — reads training configs (configs/training_*.yaml) that select the base model and which layers to unfreeze.
  • h5py — HDF5 weight serialization (secondary path; primary is protobuf).
  • SciPy, pandas, tqdm, natsort — misc utilities (data processing, progress bars, checkpoint sorting).

Chess & model formats

  • python-chess 1.999 — legality checking, board state, game replay for both test_maia.py and data-prep scripts.
  • lczerolens — LC0 wrapper exposing LczeroModel / LczeroBoard; used in play_onnx.py for board-to-tensor encoding and move decoding.
  • lc0 (binary) — actual Leela engine used over UCI in test_maia.py.
  • Protocol Buffers (backend.proto.net_pb2, from src/backend/proto/net.proto) — LC0 network/weight serialization, including SE-residual block definitions.
  • ONNX Runtime (onnxruntime) — runs exported ONNX models for inference tests in test_onnx.py.

Environment

  • conda — the env manifest pins CUDA 11.8 and lists PyTorch alongside TensorFlow (PyTorch is listed but the training path is TF-only).

2. Research & Design Rationale

I conducted two research conversations before building the pipeline, progressively refining the approach from "fine-tune Leela Zero" to "fine-tune Maia" to the final multi-base-model strategy.

2.1 Why Maia Over Leela Zero

The initial question was whether to fine-tune Leela Zero directly on personal games. The key insight I arrived at through research:

Leela Zero was trained via self-play reinforcement learning to find the strongest moves. Fine-tuning it on human games would fight against its core objective — it wants to play optimally, not humanly. Maia Chess is specifically designed to predict human moves at various skill levels, trained supervised on millions of Lichess games. Its architecture is inherently better suited for "play like a specific person."

Three approaches were evaluated:

  1. Use Maia directly — quick, human-like at approximate rating, but not personalized
  2. Fine-tune Maia on personal games — best bet for a "plays like me" bot (chosen approach)
  3. Train from scratch — requires tens of thousands of games (infeasible with ~1,800 games)

I identified the CSSLab/maia-individual repository as the purpose-built tooling for this use case. The Maia researchers had achieved up to 65% accuracy at predicting specific players' moves, demonstrating that individual playing style is learnable from observing enough games.

2.2 The Leela Fine-Tuning Pivot

After settling on Maia, I explored a second question: what about fine-tuning Leela in addition to Maia, to create a stronger personalized model? This led to the multi-model strategy.

Key tooling discovered: trainingdata-tool — built specifically for converting PGN files into Lc0's training data format. Used to train Leela networks mimicking human styles and to distill alpha-beta engines into Leela networks.

The SL (supervised learning) data quality problem: Most PGN files lack "policy" data (probability distributions over candidate moves). Without policy data, SL-trained nets are ~200 Elo weaker. trainingdata-tool addresses this by generating policy distributions from Stockfish evaluations of alternative moves using softmax — reconstructing what the policy would have been.

Precedents that validated the approach:

  • Stein network (AllieStein): trained via SL on existing engine game data
  • Lc0 team's fine-tuning for odds games: +14 =2 -2 against a GM in knight odds
  • Dietrich Kappe's personality nets (BadGyal, GoodGyal, TinyGyal): trained on Lichess games with a configurable q-ratio, e.g. Good Gyal at 0.75 (more Stockfish influence, positional) vs Evil Gyal at 0.25 (more human influence, chaotic). This parameter controls how much the net learns raw move choices versus engine evaluations.

2.3 The Multi-Model Strategy

My final design used three base models with different fine-tuning strategies, each representing a different point on the personalization-vs-strength tradeoff:

  1. Maia 1900 (aggressive fine-tuning): All 6 blocks trainable, higher LR, no weight decay → maximum personalization, lower base strength
  2. Maia 2200 (conservative): Only last 4 blocks trainable, weight decay → better base play while learning personal style
  3. Leela 11258 80x7 (conservative): Larger/stronger model, only last 4 blocks trainable → strongest base play, hardest to personalize (the model's "default style" resists override)

This was a deliberate experiment: train all three, measure policy accuracy (how often the model predicts my actual move), and see which base model produces the best personalized bot. Results: Maia 2200 achieved 63.28% accuracy, Leela 11258 achieved 53.13% — confirming that human-style base models are better starting points for human-style fine-tuning.

2.4 Data Strategy

I had ~1,800 training games + ~200 validation games from Lichess and Chess.com (blitz, rapid, classical — no bullet). The research established that 1-2K games is "right in the sweet spot" for fine-tuning: enough to capture personal style without overfitting. Key decisions:

  • Down-sample 1/32: Only 1 in every 32 positions used per training pass (~4,500 effective positions per epoch). Prevents overfitting on a small corpus.
  • Color separation: Games split into white/black positions, alternated during training for balanced exposure.
  • 90/10 train/val split: Standard but important for monitoring overfitting.

3. Architecture: The Transfer Learning Pipeline

Lichess/Chess.com API


┌─────────────────┐     ┌─────────────────────┐     ┌──────────────────┐
│  prepare_data.py │────>│  ChunkParser         │────>│  TFProcess        │
│  PGN → split by │     │  V4 binary records   │     │  Transfer learning│
│  color, 90/10   │     │  1/32 down-sample    │     │  w/ stop_gradient │
│  train/val split│     │  shuffle buffer(128) │     │  SGD + Nesterov   │
└─────────────────┘     └─────────────────────┘     └────────┬─────────┘

                                                    TF Checkpoint

                         ┌────────────────────────────────────┤
                         │                                    │
                         ▼                                    ▼
               ┌──────────────────┐              ┌──────────────────┐
               │  export_model.py  │              │  export_onnx.py   │
               │  → .pb.gz (LC0)   │              │  → .onnx (browser)│
               └──────────────────┘              └──────────────────┘
                         │                                    │
                         ▼                                    ▼
                  LC0 engine play                    play-lc0 / play_onnx.py

4. Board Encoding & Policy Indexing

4.1 The 112-Plane Input Tensor

The Leela Chess Zero / AlphaZero input format: [1, 112, 8, 8] Float32.

Plane layout:

  • Planes 0–103: 13 planes × 8 history positions (12 piece planes, i.e. 6 piece types × 2 colors, + 1 repetition flag per position). In this codebase, only the current position is populated.
  • Planes 104–107: Castling rights (our queenside, our kingside, their queenside, their kingside) — full 8×8 planes of 1s or 0s
  • Plane 108: Side-to-move indicator (1.0 if black)
  • Plane 109: Rule50 counter / 99.0 (capped at 1.0)
  • Plane 110: Move count (always 0)
  • Plane 111: All-ones plane
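
As a rough illustration of the layout above, the eight auxiliary planes (104–111) could be filled in with NumPy as follows. The function name `fill_aux_planes` and its arguments are hypothetical and not taken from fen_to_vec.py; the piece planes 0–103 are simply left at zero here.

```python
import numpy as np

def fill_aux_planes(castling, black_to_move, rule50):
    """Return a [112, 8, 8] float32 tensor with only planes 104-111 populated.

    castling: 4 booleans in the order (our queenside, our kingside,
              their queenside, their kingside), matching the list above.
    """
    planes = np.zeros((112, 8, 8), dtype=np.float32)
    for offset, allowed in enumerate(castling):
        planes[104 + offset] = 1.0 if allowed else 0.0  # full 8x8 plane of 1s or 0s
    planes[108] = 1.0 if black_to_move else 0.0         # side-to-move indicator
    planes[109] = min(rule50 / 99.0, 1.0)               # rule50 counter, capped at 1.0
    planes[110] = 0.0                                   # move count, always 0
    planes[111] = 1.0                                   # all-ones plane
    return planes

x = fill_aux_planes((True, True, False, True), black_to_move=True, rule50=12)
batch = x[np.newaxis]  # shape (1, 112, 8, 8), the model's input shape
```

A real encoder would also write the 13 per-position planes and apply the perspective flip described next; this sketch only covers the scalar-valued planes.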

Critical: perspective flipping for Black. When it's black's turn:

  • Board is rotated 180 degrees (both ranks and files reversed)
  • Piece colors are swapped (uppercase ↔ lowercase in FEN)
  • Castling rights are swapped (our/their perspective)
  • The model always sees the position "from the side-to-move's perspective"

This is implemented in fen_to_vec.py (preproc_fen()) and verified across multiple test files (test_no_flip.py, test_output_flip.py, test_onnx.py).

4.2 The 1858-Element Policy Index

The neural network outputs 1858 policy logits — one per possible move pattern in LC0's compressed encoding. policy_index.py contains the canonical list of 1858 UCI move strings (e.g., "a1b1", "e7e8q").

The mapping between the 80×8×8 convolutional policy output (5120 elements) and the 1858-element vector is built by lc0_az_policy_map.py:

  • 56 planes for queen-like moves (8 directions × 7 distances)
  • 8 planes for knight moves
  • 9 planes for underpromotions (3 directions × 3 piece types)
  • Total: 73 used + 7 unused = 80 planes
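
The plane counts above can be checked with a few lines of arithmetic. This is only a sanity check of the numbers quoted in this section; the variable names are illustrative and do not come from lc0_az_policy_map.py.

```python
QUEEN_DIRECTIONS, MAX_DISTANCE = 8, 7
queen_planes = QUEEN_DIRECTIONS * MAX_DISTANCE   # queen-like moves
knight_planes = 8                                # one per knight jump
underpromo_planes = 3 * 3                        # 3 directions x 3 piece types

used_planes = queen_planes + knight_planes + underpromo_planes
total_planes = 80
unused_planes = total_planes - used_planes
conv_outputs = total_planes * 8 * 8              # flattened conv policy output

print(used_planes, unused_planes, conv_outputs)  # 73 7 5120
```

Only 1858 of those 5120 conv outputs correspond to an entry in the policy index, which is why a separate mapping step is needed.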

This policy index is the same one used in play-lc0: I had that project switch to this pre-generated table after the AI's programmatic generation of the index there produced wrong results.


5. The Training Backend

5.1 Network Architecture

SE-ResNet (Squeeze-and-Excitation Residual Network):

  • Input: 112 planes of 8×8 → Conv2D to RESIDUAL_FILTERS channels
  • Residual tower: RESIDUAL_BLOCKS blocks, each: Conv2D → BN → ReLU → Conv2D → BN → SE → skip connection → ReLU
  • Policy head (two modes):
    • CONVOLUTION (Maia-style): Conv2D(policy_channels=32) → Conv2D(80, 3×3) → ApplyPolicyMap → 1858 logits
    • CLASSICAL (Leela-style): Conv2D(policy_channels) → Flatten → Dense(1858)
  • Value head (always WDL): Conv2D(32) → Flatten → Dense(128, relu) → Dense(3) — win/draw/loss probabilities

5.2 Transfer Learning: The Stop-Gradient Approach

Instead of Keras's standard layer.trainable = False, the pipeline inserts tf.stop_gradient() Lambda layers at strategic points. The cut point can then be placed anywhere in the graph: layers behind it still run in the forward pass (including batch norm statistics computation, see §8.1), but no gradient flows back into them.

The back_prop_blocks config parameter (counted from the output) determines how many sections receive gradients:

  • back_prop_blocks: 6 with 6 residual blocks → all blocks trainable (aggressive personalization)
  • back_prop_blocks: 4 with 6 blocks → first 2 blocks frozen (conservative, preserves base knowledge)
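
A small helper makes the two cases above concrete. It assumes back_prop_blocks counts residual blocks from the output end, which is how the examples read; `split_blocks` is a hypothetical name, and the real tfprocess.py may count "sections" differently (for instance by including the heads).

```python
def split_blocks(total_blocks, back_prop_blocks):
    """Return (frozen, trainable) residual block indices, input side first.

    Assumption: back_prop_blocks counts residual blocks from the output end.
    """
    if not 0 <= back_prop_blocks <= total_blocks:
        raise ValueError("back_prop_blocks must be between 0 and total_blocks")
    first_trainable = total_blocks - back_prop_blocks
    frozen = list(range(first_trainable))
    trainable = list(range(first_trainable, total_blocks))
    return frozen, trainable

print(split_blocks(6, 6))  # ([], [0, 1, 2, 3, 4, 5])
print(split_blocks(6, 4))  # ([0, 1], [2, 3, 4, 5])
```

Under this reading, the stop-gradient would sit on the tensor feeding the first trainable block, so everything before it keeps the base model's weights.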

The value head is always frozen; of the two heads, only the policy head is fine-tuned (together with whichever residual blocks back_prop_blocks leaves trainable). This makes sense because the goal is to learn the player's move choices, not their position evaluation.

5.3 Loss Function

total_loss = policy_weight × cross_entropy(policy)
           + value_weight × cross_entropy(WDL)
           + L2_regularization

  • Policy loss: Softmax cross-entropy between target policy and predicted logits
  • Value loss: Softmax cross-entropy between target WDL and predicted WDL
  • Regularization: L2 with weight_decay (0.0001) applied to all trainable layers
  • Gradient clipping: Global norm clipping at 10,000
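
The loss terms listed above can be written out for a single position in plain Python. This is a minimal sketch, not the TensorFlow code in tfprocess.py: the default policy_weight and value_weight of 1.0 are assumptions, and the L2 term is taken as weight_decay times the sum of squared weights without any 0.5 factor.

```python
import math

def softmax_cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return -sum(t * (z - log_norm) for t, z in zip(target, logits))

def total_loss(policy_target, policy_logits, wdl_target, wdl_logits,
               trainable_weights, policy_weight=1.0, value_weight=1.0,
               weight_decay=0.0001):
    policy_loss = softmax_cross_entropy(policy_target, policy_logits)
    value_loss = softmax_cross_entropy(wdl_target, wdl_logits)
    # L2 penalty over every trainable weight (assumed form, see lead-in).
    l2 = weight_decay * sum(w * w for w in trainable_weights)
    return policy_weight * policy_loss + value_weight * value_loss + l2

# Toy 3-move policy and a WDL target of "win", both against uniform logits.
loss = total_loss([0.0, 1.0, 0.0], [0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                  trainable_weights=[0.5, -0.5])
```

With uniform logits each cross-entropy term equals ln(3), so the toy loss is 2·ln(3) plus the small L2 penalty.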

5.4 Optimizer & Schedule

SGD with Nesterov momentum (0.9). Piecewise constant learning rate with optional warmup. Float16 mixed precision with loss scale 128.
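
A piecewise-constant schedule with linear warmup can be sketched as below. The learning-rate values are the Maia 2200 ones from the table in section 7.1; the step boundaries are hypothetical placeholders (this document does not list them), and `piecewise_lr` is an illustrative name rather than a function from the repo.

```python
from bisect import bisect_right

def piecewise_lr(step, boundaries, values, warmup_steps=0):
    """Piecewise-constant learning rate with optional linear warmup."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more value than boundaries")
    lr = values[bisect_right(boundaries, step)]
    if warmup_steps and step < warmup_steps:
        lr *= (step + 1) / warmup_steps  # ramp up linearly to the first value
    return lr

# Hypothetical boundaries; only the LR values come from the config table.
BOUNDARIES = [20_000, 35_000, 45_000]
VALUES = [0.001, 0.0005, 0.0001, 0.00002]

for step in (0, 20_000, 40_000, 49_999):
    print(step, piecewise_lr(step, BOUNDARIES, VALUES))
```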

5.5 Data Pipeline

  1. prepare_data.py: Reads combined PGN, splits by color (white/black), 90/10 train/val split
  2. ChunkParser: Multiprocessing workers read V4 binary records from .gz chunks, down-sample 1/32 (only 1 in every 32 positions is used), filter by color, shuffle via 128-element buffer
  3. tf.data.Dataset.from_generator() wraps the parser with .map(parse_function).prefetch(4)
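
The down-sample and shuffle steps in item 2 can be sketched as a plain generator. This assumes the 1/32 sampling is random per position (it could equally be a fixed stride; the text above does not say), and `sample_and_shuffle` is a hypothetical name, not the ChunkParser API.

```python
import random

SKIP = 32            # keep roughly 1 position in 32
SHUFFLE_SIZE = 128   # shuffle-buffer capacity

def sample_and_shuffle(records, skip=SKIP, buffer_size=SHUFFLE_SIZE, rng=None):
    """Yield a down-sampled, locally shuffled stream of records."""
    rng = rng or random.Random()
    buffer = []
    for record in records:
        if rng.randrange(skip) != 0:       # drop ~(skip-1)/skip of the positions
            continue
        if len(buffer) < buffer_size:
            buffer.append(record)          # still filling the buffer
            continue
        slot = rng.randrange(buffer_size)  # swap a random buffered record out
        buffer[slot], record = record, buffer[slot]
        yield record
    rng.shuffle(buffer)                    # drain what is left at end of stream
    yield from buffer

kept = list(sample_and_shuffle(range(144_000), rng=random.Random(0)))
print(len(kept))  # close to 144_000 / 32
```

A 128-element buffer only decorrelates nearby positions, so with a small corpus most of the randomness comes from the sampling step rather than the shuffle.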

The V4 binary format (8292 bytes per record) contains: a 4-byte version, 7432 bytes of policy probabilities (1858 float32 values), 832 bytes of bit-packed planes (104 planes × 8 bytes), winner, best_q, and auxiliary data.
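
The record size can be verified with Python's struct module, using the V4_STRUCT_STRING that section 8.9 quotes from chunkparser.py. The field names below are assumptions based on the lc0 V4 layout, not copied from the repo.

```python
import struct

# Format string as quoted in section 8.9.
V4_STRUCT_STRING = '4s7432s832sBBBBBBBbffff'
v4_struct = struct.Struct(V4_STRUCT_STRING)

print(v4_struct.size)  # 8292

# Assumed field names (lc0 V4 layout); labels only, the sizes come from the format.
FIELDS = ("version", "probs", "planes",
          "us_ooo", "us_oo", "them_ooo", "them_oo", "stm", "rule50", "move_count",
          "winner", "root_q", "best_q", "root_d", "best_d")

# Unpack an all-zero record just to show the shape of the result.
record = dict(zip(FIELDS, v4_struct.unpack(bytes(v4_struct.size))))
print(len(record["probs"]), len(record["planes"]))  # 7432 832
```

The two large byte fields line up with the rest of this document: 1858 policy entries × 4 bytes = 7432, and 104 history planes × 8 bytes = 832.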

5.6 Alternate Training Mode

tfprocess_reg_lr_noise.py (not actively used) implements two additional regularization techniques:

  • Gaussian noise injection: N(0, 0.01 × std(weight)) added to each weight during initialization — prevents perfect memorization of base model
  • Per-layer LR scaling: Earlier layers get lower learning rates (conv_block1: 0.05×, res_0: 0.1×, ... res_5: 0.85×, heads: 1.0×) — creates a gradient hierarchy where lower layers change less

6. Model Export & Inference

6.1 Export to LC0 .pb.gz

export_model.py extracts weights from a TF checkpoint and packs them into LC0's protobuf format:

  • Conv2D weights: Transposed from TF [H,W,in,out] to LC0 [out,in,H,W]
  • Dense weights: Transposed from TF [in,out] to LC0 [out,in]
  • Batch norm: Standard deviation is squared (LC0 stores variance, not stddev)
  • Head weight permutation: Policy/value head weights are reordered to match LC0's expected ordering
  • Rule50 rescaling: Input weights for plane 109 are divided by 99
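
The layout conversions in the first three bullets come down to axis permutations and a square. The sketch below uses NumPy with hypothetical helper names; the 3×3 kernel size for the input convolution is an assumption, and the head permutation and rule50 rescaling are not shown.

```python
import numpy as np

def conv_tf_to_lc0(kernel):
    """[H, W, in, out] (TF) -> [out, in, H, W] (LC0)."""
    return np.transpose(kernel, (3, 2, 0, 1))

def dense_tf_to_lc0(kernel):
    """[in, out] (TF) -> [out, in] (LC0)."""
    return np.transpose(kernel, (1, 0))

def bn_stddev_to_variance(stddev):
    """LC0 stores variance, so square the standard deviation."""
    return np.square(stddev)

conv = np.zeros((3, 3, 112, 64), dtype=np.float32)  # assumed 3x3 input conv, 64 filters
dense = np.zeros((128, 3), dtype=np.float32)        # final WDL dense layer
print(conv_tf_to_lc0(conv).shape)    # (64, 112, 3, 3)
print(dense_tf_to_lc0(dense).shape)  # (3, 128)
```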

LC0 weight saving is explicitly disabled during training to avoid crashes from weight count mismatches when back_prop_blocks < total_blocks. Export is a separate post-training step.

6.2 Export to ONNX

export_onnx.py rebuilds the model architecture from scratch in Keras, restores the checkpoint, then converts via tf2onnx. This produces the .onnx files used by play-lc0 in the browser.

6.3 Interactive CLI Play

play_onnx.py uses the lczerolens library for ONNX inference. Simple game loop: print board → accept SAN/UCI input → run inference → argmax over legal policy indices → print move. The lczerolens library handles all encoding/decoding internally.
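
The "argmax over legal policy indices" step amounts to the following. This is a toy sketch with a three-entry stand-in for the 1858-entry policy index; it does not use or mirror the lczerolens API, and `pick_move` is a hypothetical name.

```python
def pick_move(policy_logits, legal_moves, move_to_index):
    """Return the legal move whose policy logit is highest."""
    if not legal_moves:
        raise ValueError("no legal moves in this position")
    return max(legal_moves, key=lambda move: policy_logits[move_to_index[move]])

# Toy three-entry policy index; the real table has 1858 entries.
move_to_index = {"e2e4": 0, "d2d4": 1, "a1b1": 2}
logits = [0.3, 1.2, 5.0]

# a1b1 has the highest logit but is not legal here, so it is never considered.
print(pick_move(logits, ["e2e4", "d2d4"], move_to_index))  # d2d4
```

Restricting the argmax to legal moves matters because the raw policy vector also assigns logits to move patterns that are impossible in the current position.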


7. Training Configurations & Results

7.1 Configuration Comparison

| Parameter | Maia 1900 (v1/v2) | Maia 2200 | Leela 11258 |
|---|---|---|---|
| Architecture | 64 filters, 6 blocks, SE(8) | 64 filters, 6 blocks, SE(8) | 80 filters, 7 blocks, SE(4) |
| Batch size | 64 | 128 | 128 |
| Total steps | 100,000 | 50,000 | 50,000 |
| Back-prop blocks | 6 (all trainable) | 4 (freeze first 2) | 4 (freeze first 3) |
| Starting LR | 0.002 | 0.001 | 0.001 |
| LR schedule | 0.002→0.0005→0.0001→0.00002 | 0.001→0.0005→0.0001→0.00002 | same as Maia 2200 |
| Weight decay | none (default 0.0001) | 0.0001 | 0.0001 |
| Precision | float16 | float16 | float16 |

Key design insight: The Maia 1900 config is the most aggressive — all blocks trainable, higher LR, no explicit weight decay — maximizing personalization at the cost of base model knowledge. The larger models use more conservative freezing + weight decay to preserve engine strength while adding personal style.

7.2 Trained Models

| Model | Base | Steps | Policy Accuracy | Size | Notes |
|---|---|---|---|---|---|
| I Maia v1 | Maia 1900 | ~50k | Not recorded | 1.7 MB | First fine-tune |
| I Maia v2 | Maia 1900 | ~100k | Not recorded | 1.7 MB | Production model |
| Leela 11258-25k | Leela 11258 | 25k | | 8.1 MB | Experimental |
| Leela 11258-35k | Leela 11258 | 35k | 53.13% (best) | 8.1 MB | Experimental |
| Leela 11258-50k | Leela 11258 | 50k | | 8.1 MB | Experimental |
| Maia 2200-20k | Maia 2200 | 20k | 63.28% (best at 18.5k) | 1.2 MB | Experimental |

What "policy accuracy" means: The percentage of positions where the model's top-1 policy prediction matches the move I actually played. Maia 2200 achieving 63.28% means it correctly predicts my move nearly 2/3 of the time — a strong result for a 64x6 network on ~1,800 games.
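
The metric defined above is straightforward to compute. The sketch below uses toy logits and move indices; `top1_accuracy` is an illustrative name and not a function from the training code.

```python
def top1_accuracy(predicted_logits, played_indices):
    """Fraction of positions where argmax(logits) equals the played move's index."""
    if len(predicted_logits) != len(played_indices):
        raise ValueError("need one played move per position")
    hits = sum(
        max(range(len(logits)), key=logits.__getitem__) == played
        for logits, played in zip(predicted_logits, played_indices)
    )
    return hits / len(played_indices)

# Four toy positions with three candidate moves each.
logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
played = [1, 0, 2, 1]
print(top1_accuracy(logits, played))  # 0.75
```

In the last toy position the model prefers move 0 while move 1 was played, which counts as a miss even if both moves are reasonable; that is the limitation section 8.7 raises.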


8. Technical Tradeoffs & Decisions

8.1 Stop-Gradient vs layer.trainable

The pipeline uses tf.stop_gradient() Lambda layers instead of layer.trainable = False. With stop-gradient, frozen layers still participate in the forward pass and in batch norm statistics computation but receive no gradient updates. This preserves the base model's learned representations more faithfully during fine-tuning.

8.2 Down-Sampling 1/32

Only 1 in every 32 training positions is used (SKIP = 32). With ~1,800 games averaging ~80 positions each = ~144,000 total positions → ~4,500 effective training positions per epoch. This prevents overfitting on a small personal game corpus while still extracting the player's style signal.
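
The arithmetic above, written out. All inputs are the approximate figures quoted in this section; `approx_passes` is a hypothetical helper giving only a rough ratio of positions consumed to positions available per pass.

```python
GAMES = 1_800             # approximate training games
POSITIONS_PER_GAME = 80   # approximate average
SKIP = 32                 # keep 1 position in 32

total_positions = GAMES * POSITIONS_PER_GAME
effective_positions = total_positions // SKIP
print(total_positions, effective_positions)  # 144000 4500

def approx_passes(steps, batch_size):
    """Rough number of passes over the sampled positions for a training run."""
    return steps * batch_size / effective_positions

# Maia 2200 config from section 7.1: 50,000 steps at batch size 128.
print(round(approx_passes(50_000, 128)))
```

The second figure is only an order-of-magnitude guide, but it shows that each sampled position is revisited many times over a full run.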

8.3 Color-Separated Training Data

Games are split into white and black positions. The ChunkParser alternates between white and black data sources (FileDataSrc), ensuring balanced color representation in each batch. This matters because the board is always presented from the side-to-move perspective, so the model needs equal exposure to both perspectives.

8.4 Value Head Always Frozen

The value head (position evaluation) is never fine-tuned — only the policy head (move prediction). This is deliberate: the goal is to learn which moves the player chooses, not to change the model's assessment of how good positions are. The base model's position evaluation is already strong; personalizing it on ~1,800 games would likely degrade it.

8.5 Web UI Built Then Removed

A React + ONNX Runtime WASM web UI was built (PR #1, 213K lines added), encoding bugs were debugged (PR #2), then the entire web UI was deleted (Feb 6-7). The play-lc0 project superseded it as the browser-based chess interface — and inherited the lessons learned here about encoding, policy indexing, and ONNX conversion.

8.6 Multiple Base Models, Different Strategies

Three base models were tried with different fine-tuning strategies:

  • Maia 1900 (aggressive): All blocks trainable, small architecture → maximum personalization, used as production model
  • Maia 2200 (conservative): Only last 4 blocks trainable → better base play, decent accuracy (63.28%)
  • Leela 11258 (conservative): Larger model, only last 4 blocks → strongest base play, lower personal accuracy (53.13%) because the model's "default style" is harder to override

8.7 Additional architectural tradeoffs worth naming

  • Supervised imitation, not DPO / RL preference learning — target is cross-entropy on the played move. DPO between the played move and legal alternatives would model my preference over the ~35% of positions where top-1 accuracy plateaus — exactly where imitation learning fails. SL was the path of least resistance since Maia-individual is already pure CE and ~1.8k games is too little to bootstrap a reward signal. Inferred.
  • TensorFlow 2, not PyTorch — inherited from the Maia-individual scaffolding (~5,600 LOC of TF). PyTorch has cleaner ONNX export and is the dominant research framework in 2026; rewriting would have cost more than the 7-day project. The cost is visible in the commit log — multiple days on TF2 compatibility bugs ("fixed TF2 weight handling," "lc0 direct weight loading," "disabled lc0 saves during training"). Evidenced in timeline.
  • Value head frozen; not fine-tuned on my WDL outcomes — with 1.8k blitz games the outcome signal is extremely noisy (binary W/D/L, high variance). Freezing protects lc0's eval strength. But value loss is still computed in training for shared-trunk gradient balance, not because V is updating — an interviewer should probe: "then why compute it at all?" Inferred.
  • Top-1 policy accuracy as the evaluation metric, not head-to-head Elo — cheap to compute, matches Maia's paper metric, but has a classic imitation-learning pitfall: penalizes the model for choosing a move I would also like. No Elo ladder, no move-agreement rate on held-out games, no KL to my empirical distribution. Inferred.
  • No KD from base-Maia's full policy distribution; flat one-move target — q-ratio blending (exactly what Kappe's BadGyal/GoodGyal did, referenced in §2.2) would mix my played-move target with base-Maia's policy to preserve strength while shifting style. This was considered and deferred; implementing KD needed caching base-model logits. Inferred.
  • Inherited SKIP=32 down-sample, not re-examined for small-data regime — gives ~4.5k effective positions/epoch, 50k–100k steps. The factor comes from Maia-individual's billion-game pipeline where it's a scale necessity — an unexamined inherited default for a 1.8k-game dataset where the right answer is probably "use all positions, fewer epochs, rely on weight decay." Inferred.

8.8 Strategic / "why do this at all" tradeoffs

  • "Play as ME" as product thesis vs. Maia 1900 as-is — Maia 1900 already plays indistinguishably from an average 1900 human. Shipping it unmodified would have been zero effort. The fine-tune bets that the interesting product is identity-mimicry, not rating-mimicry — a bot friends recognize as me by opening repertoire, time-trouble blunders, and pet tactics. That reframes success from "human-like" (already solved) to "distinctively mine in ways distinguishable from a generic 1900," which no off-the-shelf model offers and which top-1 accuracy only weakly captures. Inferred.
  • Three base models trained in parallel as a de-risking hedge, not an ablation study — with a 7-day budget and unknown answers to "does 1.8k games move the needle?" and "does a stronger base resist personalization?", training Maia 1900 / 2200 / Leela in parallel was a portfolio bet: at least one would produce a shippable artifact. Cost: none got the care a single-model study would — no seeds, no held-out Elo, inconsistent step counts — so the 63% vs 53% gap is suggestive not conclusive. Right call under time pressure, not defensible as research. Inferred.
  • Fork Maia-individual vs. clean PyTorch rewrite — a clean rewrite would have been ~2 weeks and produced better ONNX ergonomics in the framework I actually use. Forking bought a working V4 chunk parser, SE-ResNet definition, lc0 weight format, and policy-map matrix on day one — at the cost of 3+ days debugging inherited TF2 bugs (visible in commit log) and carrying unexamined defaults. Correct for a 7-day project; the tax is the codebase isn't reusable for the next idea. Evidenced in TF2 bug commits.
  • Standalone training project vs. folding into play-lc0 — deliberate separation of concerns: training needs CUDA + TF2 + Python + chunk files; play-lc0 needs a browser and an .onnx. Co-locating would couple a weekly-iteration research repo to a user-facing app's release cadence. The 3.3 MB .onnx is the clean contract — play-lc0 can add any lc0-compatible net later without touching training code. Inferred.
  • Personalized-bot generator vs. one-off personal model — nothing in the repo hardcodes me: prepare_data.py takes any PGN, configs are YAML, export produces a standard .onnx. The real deliverable is a pipeline that turns anyone's Lichess archive into a personal bot — a product ("Maia-as-a-Service for your Lichess account") not a trophy model. The interview story shifts from "I trained a model" to "I built the factory." Inferred.
  • Surfacing q-ratio research as the honest deferral — Kappe's BadGyal/GoodGyal precedent (mixed played-move + base-Maia policy to preserve strength while shifting style) is exactly the right prior art for the personalization-vs-strength dial. I found it, evaluated it, and made a conscious deferral for scope. "I know what I didn't build, and why" is a stronger signal than shipping a half-baked KD implementation. Evidenced in §2.2.

8.9 Code-level tradeoffs visible in the source

  • Config-per-experiment with near-duplication, not hierarchical inheritance: configs/ has four standalone YAMLs. training_config_wd.yaml is byte-for-byte training_config.yaml plus one line (weight_decay: 0.0001). Architecture constants (filters/blocks/se_ratio) are duplicated into every config even though they're dictated by the base .pb.gz. Hydra or OmegaConf would enable defaults: inheritance. train_transfer.py derives collection_name from the config's directory and name from the filename, so the flat file-per-run layout is load-bearing for output naming. For a 4-config repo, inheritance adds ceremony for zero savings. Evidenced.
  • prepare_data.py stops at PGN split; chunk generation is delegated to upstream Maia tooling — the script only splits combined PGN by player color and does a 90/10 slice. Fusing PGN → v4 chunk generation into one script would produce data/processed/<name>/{train,validate} end-to-end. Keeping the boundary at PGN means users reuse the Maia-style chunk pipeline unmodified, and avoids reimplementing the V4_STRUCT_STRING = '4s7432s832sBBBBBBBbffff' format that TF can't efficiently unpack in-graph anyway. Evidenced in chunkparser.py comment.
  • Shell script for GPU run serialization, not a queue system: scripts/queue_wd_training.sh polls pgrep -f train_transfer.py every 60s, renames runs/hunter → runs/hunter-v2-no-wd between runs, then launches the next one. A Python orchestrator or just cmd1 && cmd2 would also work. The rename has to land between runs because runs/hunter is the default TensorBoard output path; && skips on failure but doesn't handle the rename. 20 lines of bash beats any proper scheduler for single-GPU workstation serial runs. The hardcoded WSL path (/mnt/c/Users/chenh/...) is a smell consistent with a solo-dev tool. Evidenced.
  • V4 training chunk format inherited verbatim from Leela/Maia, not Parquet/Arrow/TFRecord: chunkparser.py retains the GPLv3 header from Leela Chess and supports V3/V4 raw-byte packed chunks (7432-byte policy + 832-byte bit-packed planes). A modern columnar format would be TF-native and avoid the "TensorFlow doesn't have a fast way to unpack bit vectors" comment that drives the current unpack-in-Python-workers approach. Staying on V4 means binary compatibility with upstream Maia checkpoints and lc0; forking the format would break the whole point of the project (producing lc0-compatible personalized nets). Evidenced.

9. Development Timeline

| Date | Commits | Key Achievement |
|---|---|---|
| Feb 3 | 2 | Initial commit + full training infrastructure + I Maia v1 |
| Feb 3-4 | 2 PRs | Web UI (PR #1), encoding bug fixes (PR #2) |
| Feb 5 | 1 | play_onnx.py CLI play script |
| Feb 5-6 | 5 | Weight decay configs, TF2 fixes, lc0 weight loading, multi-model support |
| Feb 6 | 2 | Leela 11258 + Maia 2200 training runs + checkpoints + exports |
| Feb 6-7 | 2 | Deleted web UI, cleaned up WASM artifacts |
| Feb 10 | 1 | Final README revision |

Total: 19 commits, 7 days, from nothing to 6 trained models + full export pipeline.


10. Key Files Reference

Training

| File | Lines | Purpose |
|---|---|---|
| src/backend/tf_transfer/tfprocess.py | ~1200 | Main TF training loop: model construction, weight loading, loss, optimizer |
| scripts/train_transfer.py | ~100 | Training orchestrator: config loading, data pipeline, loop execution |
| src/backend/tf_transfer/chunkparser.py | ~300 | Multiprocessing V4 binary record parser with shuffle buffer |
| src/backend/tf_transfer/training_shared.py | ~100 | Chunk discovery, white/black data source alternation |

Encoding

| File | Lines | Purpose |
|---|---|---|
| src/backend/fen_to_vec.py | ~200 | FEN → 17/112-plane tensor with black-side flip |
| src/backend/tf_transfer/policy_index.py | ~1900 | 1858 UCI move strings (canonical LC0 ordering) |
| src/backend/tf_transfer/lc0_az_policy_map.py | ~200 | 80×8×8 → 1858 mapping matrix |

Export

| File | Lines | Purpose |
|---|---|---|
| scripts/export_model.py | ~300 | TF checkpoint → LC0 .pb.gz (with weight transposition) |
| scripts/export_onnx.py | ~500 | TF checkpoint → ONNX (rebuilds model, uses tf2onnx) |

Data

| File | Lines | Purpose |
|---|---|---|
| scripts/prepare_data.py | ~80 | PGN split by color, 90/10 train/val |
| src/backend/pgn_to_csv.py | ~400 | PGN → per-move CSV with eval/clock/material features |

Configs

| File | Base Model | Blocks Trainable | Total Steps |
|---|---|---|---|
| configs/training_config.yaml | Maia 1900 | All 6 | 100k |
| configs/training_maia_2200.yaml | Maia 2200 | Last 4 | 50k |
| configs/training_leela_11258.yaml | Leela 11258 | Last 4 | 50k |