Projects — Distilled

Single-file distillation of the six project deep-dives. Core facts, architecture, hardest problems, numbers, tradeoffs. No interview framing.


TI-84 CE Emulator (ti84ce)

Cycle-accurate Zilog eZ80 emulator built from scratch in Rust (~15,000 lines), native iOS/Android/Web frontends, instruction-by-instruction parity with CEmu. Shipped in 25 days (Jan 27 – Feb 20, 2026); ~80% AI co-authored (262/332 commits).

Full-system OS emulation of the TI-84 Plus CE: real eZ80 CPU, 13 hardware peripherals (LCD with DMA, timers, RTC, SPI, interrupt controller), 4MB flash ROM, 256KB RAM, all exposed through a narrow C ABI. Boots TI-OS 5.8.2 in 168,140,000 cycles (verified against CEmu). Dual-backend: Rust core swappable with CEmu C reference on every platform.

Distinctive decisions

  • Dual-backend via stable C ABI — both backends export the same 15 extern "C" functions (emu.h, 52 lines). Android: dlopen() + RTLD_LOCAL. iOS: static linking with dual prefixing (rust_emu_* + cemu_*). Web: no C ABI seam — TypeScript factory instantiates separate WASM modules. Payoff: real-time A/B divergence comparison on device.
  • no_std core + buffer I/O — no std::fs, no threads. ROM as &[u8], framebuffer as *const u32 (ARGB8888, 320×240), save state as Vec<u8>. Platforms own pacing/persistence/logging. Enables trivial WASM + deterministic execution (critical for trace-diffing).
  • Cycle-accurate scheduler via 7.68 GHz LCM base clock — LCM of all hardware clocks (48 MHz CPU, 24 MHz panel, 32.768 kHz RTC). Pure integer arithmetic, zero float drift. HALT fast-forward batches idle cycles to next event.
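A minimal sketch of the integer-only scheduling this enables, using the stated 7.68 GHz base and treating device events as "cycles until next event" in each clock's own domain (names and API are illustrative, not the Rust core's):

```python
from math import lcm

# Clocks named above; the stated 7.68 GHz base clock is an exact
# common multiple of all of them, so every conversion stays integral.
BASE_HZ = 7_680_000_000
CLOCKS = {"cpu": 48_000_000, "panel": 24_000_000, "rtc": 32_768}

assert BASE_HZ % lcm(*CLOCKS.values()) == 0  # no float drift possible

# Base-clock ticks per one cycle of each device clock.
TICKS_PER_CYCLE = {name: BASE_HZ // hz for name, hz in CLOCKS.items()}

def next_event(now_ticks: int, pending: dict) -> tuple:
    """Earliest scheduled event, in base-clock ticks.

    `pending` maps device -> cycles until its next event, in that
    device's own clock domain. HALT fast-forward jumps straight here
    instead of stepping idle CPU cycles one by one.
    """
    return min(
        (now_ticks + cycles * TICKS_PER_CYCLE[name], name)
        for name, cycles in pending.items()
    )
```

With the CPU 10 cycles from its next event and the RTC 1 tick away, the CPU event (1,600 base ticks) fires long before the RTC's (234,375).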

Hardest problems

eZ80 architectural surprises, each blocked boot:

  • IM2 ≡ IM1 — eZ80 ignores I register, always jumps 0x0038. Z80 docs wrong.
  • Dual stack pointers (SPS/SPL) — 16- vs 24-bit selected by L mode; mixed-mode CALL/RET pushed wrong widths.
  • Suffix opcodes atomic — 0x40/0x49/0x52/0x5B execute with following instruction in one step.
  • Undocumented OS Timer — 4th timer on 32.768 kHz crystal, no public docs; ROM hangs without it.
  • LD A,MB (ED 6E) — load memory base register, not in Z80 specs; used in first 10K cycles.
  • R register updates by rotation, (A << 1) | (A >> 7), not a simple increment.
  • Flash unlock detection — 16–17 byte magic sequence in fetch stream (DI; JR; DI; IM2; IM1; OUT0/IN0; BIT 2,A).
  • LCD DMA cycle stealing — track dma_last_mem_timestamp, retroactively steal cycles on CPU memory access. Adds ~13M of 168M boot cycles (7.7%, matching CEmu exactly).
  • Prefetch pipeline — single-byte buffer charged on fetch. Without it, cycle counts drop to ~50% of CEmu.
  • Timer interrupt delay pipeline — 2-cycle match → status → interrupt. Wrong = "graphing hang."

Numbers

  • 15,000 lines Rust core, ~38,000 total across platforms
  • 332 commits, 67 merged PRs, 80+ branches
  • 168,140,000 cycles to boot (cycle-for-cycle vs CEmu)
  • WASM binary: 148 KB uncompressed, 96 KB gzipped
  • Largest file: cpu/execute.rs 2,646 lines
  • 37+ main sessions, 472 subagent invocations, ~73 MB conversation data
  • 7-phase parity campaign surfaced ~150 CEmu discrepancies via 8-agent analysis

Key files

emu.rs (3,168) execution loop + frame rendering. cpu/execute.rs (2,646) all instruction dispatch. bus.rs (1,929) memory routing + flash unlock + debug ports. peripherals/lcd.rs (1,302) 5-state DMA engine. scheduler.rs (702). Debug CLI core/examples/debug.rs (~2,900) with boot/trace/fulltrace/screen/sendfile/bakerom modes.

Tradeoffs

  • Cycle-accurate from day one vs retrofit — firmware is interrupt-sensitive; off-by-cycles hangs boot. Retrofit would require mid-project core rewrite.
  • Tri-platform for a 0-user project — justified only because parity-driven development and cross-platform story are the point. Single-platform would be better as CEmu fork.
  • Manual serialization (8 STATE_VERSION bumps) vs Serde — precise byte control, smaller snapshots; maintenance burden (visible in "RAM Cleared" bugs). Web solved differently: full WASM linear memory snapshot (~29 MB → 4 ms copy).
  • Monolithic execute.rs nested match vs dispatch table — Rust match compiles to jump table anyway; structured x-y-z-p-q decomposition lets DD/FD/ED/CB prefix variants share ALU helpers.
  • Image-based keypad (49 regions from photograph, percentage coords, shared JSON) vs programmatic buttons — consistency + realism, harder to modify.
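The x-y-z-p-q decomposition named above is the standard Z80 opcode decoding scheme; a sketch of the field extraction (Python for brevity, the core does this inside a Rust match):

```python
def decode(opcode: int) -> tuple:
    """Split a Z80-family opcode byte into the conventional fields:
    x (bits 7-6) picks the broad group, y (5-3) and z (2-0) pick the
    entry within it, and p/q further split y for 16-bit register ops.
    DD/FD/ED/CB-prefixed variants reuse the same extraction, which is
    what lets them share ALU helpers in a structured dispatch.
    """
    x = (opcode >> 6) & 0b11
    y = (opcode >> 3) & 0b111
    z = opcode & 0b111
    return x, y, z, y >> 1, y & 1
```

For example NOP (0x00) decodes to all zeros, HALT (0x76) to x=1, y=6, z=6, and RET (0xC9) to x=3, z=1, q=1.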

CE-Games Chess Engine (ce-games)

Chess engine for the TI-84 Plus CE (eZ80 @ 48 MHz, 256 KB RAM) achieving ~2083 Elo at master difficulty (27s/move); ~2700 Elo on desktop ARM64 at 0.1s/move. Built in 16 days (Feb 10–25, 2026); 111 commits, 15 PRs. ~14,000× desktop-to-target slowdown is itself the value — the eZ80's hostility (no branch predictor, no SIMD, 24-bit int, ~62 KB usable RAM) forces architectural choices modern CPUs hide.

Distinctive decisions

  • 0x88 board over bitboards — eZ80 lacks 64-bit arithmetic; every bitboard op is an expensive library call. 0x88 uses byte ops exclusively (if (square & 0x88) detects off-board in one cycle).
  • Engine/UI separation via engine.h — 8 C files (~4,000 LOC) + 1 hand-written eZ80 asm file (pick_best.asm, 60 lines, 2.8× faster than compiler). GUI in main.c (1,200 lines) never touches board representation.
  • Dual-target compilation — same source compiles for eZ80 (-Oz) and desktop GCC (-O2). Texel tuning and Stockfish tournaments run on desktop; cycle-accurate emulator validates actual target. Caught platform-divergent regressions (sentinel rays: +1.3% eZ80, −8.4% desktop).
  • Memory budget (~62 KB of 256 KB): TT 32 KB (4096 × 8 bytes, always-replace), move pool 10 KB (2048 entries, SoA shared via global stack ptr — avoids ~2 KB stack overhead per ply), pawn cache 5 KB (4-way set-assoc, 32 entries), Zobrist 3.2 KB, UI 2 KB.
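The one-cycle off-board test is the heart of the 0x88 layout; an illustrative Python sketch (the engine's version is C):

```python
# 0x88 layout: a 16-wide board with the real files in the low nibble.
# Any off-board square sets bit 3 of the file nibble or bit 7 of the
# rank nibble, so `square & 0x88` catches both with one byte-sized AND.

def sq(file: int, rank: int) -> int:
    return rank * 16 + file

def on_board(square: int) -> bool:
    return (square & 0x88) == 0

KNIGHT_DELTAS = (-33, -31, -18, -14, 14, 18, 31, 33)

def knight_targets(square: int) -> list:
    """No bounds tables, no 64-bit masks: add the delta, test the mask."""
    return [square + d for d in KNIGHT_DELTAS if on_board(square + d)]
```

A corner knight on a1 keeps only c2 and b3; a centralized knight on e4 keeps all eight targets.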

Hardest problems

  • 24-bit integer: int = 3 bytes. Widen hot-path locals = win; widen struct fields = loss. int16_t in arrays is fine; widening history heuristic regressed +44.8%. Array stride 4 (shift) beats stride 3 (multiply), so Zobrist tables stay uint32_t.
  • Zobrist at 24 bits: bare 24-bit XOR inline asm ~24 cycles vs __ixor library ~45; but 24 bits alone collides 1-in-16M. Split into 24-bit hash + 16-bit lock = ~40 effective bits.
  • Flash cache hostility: -Oz beats -O2 by 41% (7,053 vs 10,005 cy/node). Flash has 10-cycle baseline + 197 on miss; -O2 code bloat destroys L1.
  • Pawn cache alignment: the 128-byte pawn_atk[] MUST be static, not stack-allocated. On the stack, its IX displacements exceed the signed 8-bit range, triggering multi-byte addressing; −14% across all eval.
  • PVS difficulty variance: engine hung queens because null-window PVS clips non-PV moves identically. Fix: widen PVS floor by variance. var=5 (sweet spot), var=10 (−147 Elo), var=15 (catastrophic).
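A sketch of the hash-plus-lock split (illustrative names and shapes; the engine's tables live in C):

```python
import random

random.seed(0)  # deterministic illustrative keys

# Separate key tables: a 24-bit hash (one cheap XOR width on the eZ80)
# indexes the 4096-entry TT; a 16-bit lock stored in each entry verifies
# the probe. 24 bits alone collide ~1-in-16M; together, ~40 effective bits.
HASH_KEYS = [[random.getrandbits(24) for _ in range(64)] for _ in range(12)]
LOCK_KEYS = [[random.getrandbits(16) for _ in range(64)] for _ in range(12)]

def keys(position) -> tuple:
    """position: iterable of (piece 0-11, square 0-63) pairs."""
    h = lock = 0
    for piece, square in position:
        h ^= HASH_KEYS[piece][square]
        lock ^= LOCK_KEYS[piece][square]
    return h, lock

def tt_probe(table: dict, h: int, lock: int):
    """Always-replace TT: index by the low hash bits, confirm by lock."""
    entry = table.get(h & 0xFFF)
    return entry if entry is not None and entry[0] == lock else None
```

A probe only hits when both the indexed slot and the stored lock agree, so a false hit needs simultaneous 24-bit and 16-bit collisions.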

Numbers

  • Texel tuning on 1M GM-level Lichess broadcast positions (ply 12–100, filtered quiet): +52 Elo, largest single gain. Adam descent, K=0.00652 sigmoid, L2. Subsequent rounds <1 Elo; abandoned.
  • Performance: positional middlegames 470K cy/node (38% eval), tactical 337K, endgame 133–272K. Avg ~200K cy/node = ~370 NPS on eZ80 vs ~5.1M NPS desktop.
  • Search: PVS, aspiration ±25 cp, null-move (R=2, depth ≥3), LMR (after 4+ moves, depth ≥3, reduce 1), futility (d≤2, 200/500 margins), check extensions (limited 2/path), quiescence (max depth 8, delta +1100).
  • Tournament Elo vs Stockfish calibrated: Easy 1320, Medium 1442, Hard 1712, Expert 1958, Master 2083 (100-game sample).

Key files

search.c ~800 (negamax/PVS/LMR/null-move/quiescence), board.c ~600 (0x88/piece lists/make-unmake), eval.c ~500 (HCE/pawn cache/tapered), movegen.c ~400, book.c ~350 (Polyglot/multi-AppVar), engine.c ~350, zobrist.c ~150, tt.c ~100, pick_best.asm 60. GUI main.c 1,200. Tuning texel_tune.py 32 KB.

Tradeoffs

  • -Oz is mandatory on flash-starved eZ80 — inverts modern wisdom. Cache misses cost 197 cycles; code bloat is fatal.
  • Platform-divergent optimization is real — same source, two Makefiles, emulator validation catches drift.
  • Pseudo-legal movegen + fast legality filter beats either extreme. Check/pin pre-computation elides ~50% of legality calls.
  • Eval and search are joint tradeoffs — cheap/noisy eval wants depth (alpha-beta); expensive/accurate wants selectivity (MCTS). NNUE shelved (32-neuron layer on no-SIMD 48 MHz = uneconomical).
  • Opening book essentially free — Polyglot entries accessed directly from flash AppVars via ti_GetDataPtr(); zero RAM cost. Book vs no-book Elo ~equal; value is early-game variety, not strength.

Anna's Archive MCP (annas-archive-mcp)

Self-hosted MCP server indexing Anna's Archive's 72M deduplicated documents into local PostgreSQL. Exposes search/download/read/stats tools to Claude over stdio and HTTP. Built in 4.5 days (Mar 30 – Apr 4, 2026); 46/48 commits Claude-co-authored.

Legal foundation: index metadata locally (robots.txt respected, not copyrighted), client provides their own AA membership key for downloads, server never stores or serves copyrighted bytes.

Architecture

BitTorrent → Rust ingestion → PostgreSQL FTS → TypeScript/Bun MCP. 150 GB of zstd JSONL from 50+ AA collections flows through 8 parallel Rust workers (46K rec/sec, ~1 hour total) into Postgres with 10 indexes: GIN on weighted search vectors (title=A, author=B, publisher=C), trigram on title/author, B-tree on DOI/ISBN/lang/year. Server runs stdio for local Claude Code + HTTP+SSE for claude.ai. Per-request McpServer instantiation (sessionIdGenerator: undefined) scopes client keys as capabilities — never stored server-side.

Hardest problems

  • Reconciliation, not dedup. Same MD5 appears across zlib3/upload/ia2/nexusstc/etc. Completeness scoring (non-null field count) determines which source wins per field — not last-writer, not priority lists.
  • Format detection by magic bytes. Source extension unreliable across 50+ collections. Reader sniffs first 128 bytes → pdftotext / Calibre ebook-convert / djvutxt / EPUB ZIP central-directory (distinguishes EPUB from DOCX).
  • AND → OR → trigram fallback chain. Each tier has different cost envelope (trigram expensive at 72M rows). Application-level short-circuit; each tier sanitizes input differently.
  • Tool descriptions as prompts. 4 iterations tuned against Claude failures. "Query Strategies" few-shots embedded (specific book → title+author, author's works → author alone, broad topic → query, non-English → original language).
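The fallback chain's application-level short-circuit, sketched with a stand-in for the real per-tier Postgres queries:

```python
def search(query: str, run_tier) -> list:
    """Try tiers from cheapest to costliest, stopping at the first hit.

    `run_tier(mode, terms)` stands in for the real queries: 'and'/'or'
    would hit the GIN FTS index, 'trigram' the pg_trgm index (expensive
    at 72M rows, so it runs last). Per-tier input sanitization is
    elided in this sketch.
    """
    terms = query.split()
    for mode in ("and", "or", "trigram"):
        rows = run_tier(mode, terms)
        if rows:
            return rows
    return []
```

If the strict AND tier returns rows, the OR and trigram tiers never run at all.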

Numbers

  • 150 GB raw → 80 GB Postgres index in ~1 hour
  • 46K records/sec (2.7× Python version)
  • ~150M raw records → 72M unique by MD5
  • 48 commits, 4.5 days; Day 1 was a 9-hour continuous sprint delivering full stack

Key files

reader.ts (325, text extraction + LRU + page splitting), api.ts (272, REST + auto OpenAPI 3.1.0), server.ts (257, MCP tool defs), db.ts (223, FTS + trigram + exact DOI/ISBN), ingest/main.rs (662, parallel workers + zstd streaming + completeness UPSERT).

Tradeoffs

  • Local index vs scraping — 150 GB + 80 GB storage + 1h upfront buys ms queries, respects robots.txt, kills AA-uptime dependency. Scraping on every search is ethically equivalent to iosifache/annas-mcp (rejected).
  • MD5 as global PK — one row per unique file across all sources. Reconciliation at UPSERT time (completeness-scored) instead of source prioritization.
  • Stateless per-request instantiation — fresh McpServer closing over client key. Structurally prevents key leakage; higher object churn.
  • Memory-mode extraction via /dev/shm — raw bytes stream through pdftotext/djvutxt via stdin into tmpfs, never touching disk. Only extracted text persists in bounded LRU.
  • Rootless deploy + sudoers allowlist — Docker group = root (mount escape). Remove annas-deploy from docker group; block ssh olares at Claude Code hook; permit only scoped docker compose commands via /etc/sudoers.d. No direct docker socket, no docker run, source tree read-only.
  • Temp tables per worker → bulk COPY → per-worker UPSERT — direct COPY fails on conflict; per-row INSERT drops to 17K rec/s; in-memory dedup can't scale to 150M.
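The completeness-scored merge can be sketched like this (hypothetical field names; the real rule runs inside the Rust ingester's UPSERT):

```python
def completeness(record: dict) -> int:
    """Score = count of non-null fields; richer records win overall."""
    return sum(v is not None for v in record.values())

def reconcile(existing: dict, incoming: dict) -> dict:
    """Merge two records sharing an MD5: prefer fields from whichever
    record is more complete overall, falling back to the other record
    when the winner's field is null. Not last-writer-wins, not a fixed
    source-priority list.
    """
    winner, loser = (
        (incoming, existing)
        if completeness(incoming) > completeness(existing)
        else (existing, incoming)
    )
    return {
        k: winner.get(k) if winner.get(k) is not None else loser.get(k)
        for k in existing.keys() | incoming.keys()
    }
```

So a zlib3 row with an author but no year and an ia2 row with a year but no author merge into one row carrying both.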

Infrastructure

Docker Compose (Postgres 17, Bun server, Rust ingest, Cloudflare tunnel) on home Olares K3s box (96 GB / 24c / 6.9 TB). Exposed at https://aa-mcp.hunterchen.ca via named Cloudflare Tunnel (Tailscale failed — pod only forwards SSH). Rate limiting: 60 req/min/IP via in-memory Map. REST /api/* requires key; MCP search runs keyless (download needs key anyway).


Readr (readr)

Self-hosted cross-platform e-book reader targeting Supernote A5X e-ink tablet. Cloud sync, offline support, typed + handwritten annotations, TTS, web dashboard. pnpm/turbo monorepo. Built over 7 active days (Feb 23 – Apr 12, 2026): 2-day scaffold sprint, 43-day gap, then 5 days of hardware integration + polish. 222 commits, ~92% AI co-authored (289 MB session data), 119+ source files.

Architecture

React Native (Expo SDK 54) mobile with embedded WebViews (foliate-js for EPUB, pdf.js for PDF). Hono API (Node 22) backed by Postgres 16, Redis 7, MinIO S3-compatible. Monorepo: apps/mobile (58 TS files), apps/server (47 TS, 53 endpoints / 10 routes), packages/sync-engine (~150 LOC CRDT-lite), shared validators, Python FastAPI TTS worker (Chatterbox/Kokoro). Deployed on Olares via two-stack Docker Compose (infra + app split after Cloudflare cache-poisoning bug).

Hardest problem: kernel-level handwriting for Supernote

Supernote's first-party Atelier: ~20ms latency, no ghosting. React Native + Skia: ~400ms with severe EPD ghosting. Closed the gap by bypassing Android's render pipeline.

Required reverse-engineering via APK decompilation (JADX), binder service discovery, kernel source reading, custom Kotlin modules. Discovered:

  • service_myservice (vendor binder service)
  • exact Parcel protocol from decompiled HandWriteClient
  • EPD waveform constants from kernel headers (EPD_A2 vs EPD_FULL_GC16)
  • /dev/ebc is world-writable — third-party apps can access

Architecture: kernel draws strokes directly at ~20ms; app captures framebuffer on save; Skia only for previously-saved re-rendering. Two native modules (506 + 624 lines) expose interfaces via reflection (firmware-agnostic vs compile-time linking).

Second-hardest: page numbering

Took 8+ consecutive commits. foliate-js page counts depend on font and viewport, so they can't be known upfront. Two-phase fix: show a stub count from byte-based location.total, then measure in the background via a hidden <foliate-view> that iterates sections and polls renderer.pages until the count stabilizes.

Distinctive decisions

  • WebView file access — Android inline-HTML WebView gets about:blank origin (blocks fetch/XHR to file://). Solution: write HTML to local file, load via uri, use XHR (legacy file:// support).
  • Bearer-token auth, no passwords — client generates random token, stored as user identity.
  • Content-addressable storage by SHA-256, multi-user dedup via refCounting.
  • Two-stack Compose — infra (Postgres/Redis/MinIO) stays live; app stack rebuilds on push.
  • Custom CRDT-lite (~150 LOC) — reading progress = LWW, annotations = permanent tombstones (no resurrection).
  • Offline-first optimistic auth — read token from SecureStore immediately, probe server in background.
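A sketch of the CRDT-lite merge rules, assuming timestamped records and annotations keyed by id (illustrative shapes, not the sync-engine's types):

```python
def merge_progress(local: dict, remote: dict) -> dict:
    """Reading progress is last-writer-wins on timestamp."""
    return local if local["ts"] >= remote["ts"] else remote

def merge_annotations(local: dict, remote: dict) -> dict:
    """Annotations map id -> record or None (tombstone). A tombstone is
    permanent: once either replica deleted an annotation it can never
    resurrect, even if the other side still carries the old body.
    """
    merged = {}
    for key in local.keys() | remote.keys():
        a = local.get(key, "absent")
        b = remote.get(key, "absent")
        if a is None or b is None:        # tombstone wins forever
            merged[key] = None
        elif a == "absent":
            merged[key] = b
        elif b == "absent":
            merged[key] = a
        else:                             # both live: last-writer-wins
            merged[key] = a if a["ts"] >= b["ts"] else b
    return merged
```

A replica that edits an annotation after the other replica deleted it still converges to the deletion, which is exactly the "no resurrection" rule above.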

Numbers

  • 14-table Postgres schema
  • 17 bundled Google Fonts
  • 108K-word offline dictionary (27 Metro-split JSON files, ~9 MB)
  • 53 API endpoints
  • Kernel handwriting: ~20ms vs Skia's ~400ms

Hunter Chessbot (hunter-chessbot)

Transfer-learning fine-tuning pipeline adapting pre-trained chess networks (Maia 1900, Maia 2200, Leela 11258) to play in a specific person's style by supervised learning on their game archive. Built in 7 days (Feb 3–10, 2026); 19 commits, ~9,700 lines Python, 6 trained models (2 production: Maia 1900 v1/v2; 4 experimental). Maia 2200 Hunter powers play-lc0.

Distinctive decisions

  • Maia over Leela Zero. Leela Zero is RL-trained for optimal play; fine-tuning it on human games fights the objective. Maia is supervised on millions of Lichess games at specific skill levels — natural fit for "plays like a person." Results confirmed: Maia 2200 achieved 63.28% top-1 policy accuracy; Leela 11258 only 53.13% (an RL-trained base resists being pushed toward human style).
  • Stop-gradient, not layer.trainable. Lambda layers inserting tf.stop_gradient(). Frozen layers still participate in forward pass + batch-norm statistics but don't receive gradient updates. Preserves base representations more faithfully than layer freezing.
  • Value head permanently frozen; policy head fine-tuned. Goal is which moves the player chooses, not re-assessing positions. Value loss still computed for shared-trunk gradient balance, but head doesn't update.
  • 1/32 down-sampling. Only 1 in 32 training positions used (~4,500 effective per epoch from ~1,800 games × ~80 moves). Prevents overfit on small personal corpus.
  • Color-separated data. Games split white/black, ChunkParser alternates. Board always shown from side-to-move perspective (180° rotated for Black), so balanced color exposure matters.
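A toy illustration of the stop-gradient behavior on a two-layer linear model with hand-computed gradients (nothing here is the project's TF code; the real version inserts Lambda layers wrapping tf.stop_gradient):

```python
def train_step(w_trunk: float, w_head: float, x: float, target: float,
               lr: float = 0.1) -> tuple:
    """One SGD step on y = w_head * (w_trunk * x) with a stop-gradient
    on the trunk: the trunk still contributes to the forward pass, but
    its gradient is treated as zero, so only the head's weight moves.
    Loss is L = 0.5 * (y - target)^2.
    """
    hidden = w_trunk * x          # frozen trunk participates in forward
    y = w_head * hidden
    err = y - target              # dL/dy
    grad_head = err * hidden      # normal gradient for the head
    grad_trunk = 0.0              # stop_gradient: nothing flows back
    return w_trunk - lr * grad_trunk, w_head - lr * grad_head
```

Contrast with layer.trainable=False, which in TF also changes batch-norm behavior; stop-gradient leaves the forward pass (and batch-norm statistics) fully intact.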

Hardest problems

  • Board encoding + policy indexing. 112-plane input (13 piece types × 8 history + 8 meta). Required correct perspective flipping for Black (board rotation, color swap, castling reorder). Output policy is a compressed 1,858-element vector, mapped via lc0_az_policy_map.py (56 queen + 8 knight + 9 underpromotion). Initial ONNX export bugs consumed significant debug time.
  • TF2 weight handling. Inherited TF2 infra from maia-individual. Multiple days: weight transposition on export (Conv2D [H,W,in,out] → LC0 [out,in,H,W]), disabling LC0 saves during training (crashed on partial backprop), loading lc0 weights directly into TF. Batch-norm: LC0 stores variance, TF outputs stddev (rescale required).
  • ONNX export without rebuilding in PyTorch. Pipeline exports both .pb.gz (LC0 engine) and .onnx (browser). export_onnx.py rebuilds architecture from scratch in Keras, restores checkpoints, converts via tf2onnx — workaround for TF's awkward native ONNX export.
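A sketch of the side-to-move transform described above (180° rotation plus color swap, per the convention stated earlier; castling-rights reordering and the 8-position history are omitted, and the names are illustrative):

```python
def flip_square(square: int) -> int:
    """180-degree board rotation: rank and file both mirror (0-63)."""
    return 63 - square

def to_side_to_move(pieces: list, black_to_move: bool) -> list:
    """pieces: list of (color, piece_type, square). Returns the list as
    seen from the side to move: for Black, rotate the board and swap
    colors so 'my pieces' always land on the same input planes.
    """
    if not black_to_move:
        return list(pieces)
    return [("w" if c == "b" else "b", pt, flip_square(s))
            for c, pt, s in pieces]
```

So a white king on e1 (square 4) appears, from Black's perspective, as an own-side king on the rotated square 59.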

Numbers

  • 6 trained models: Maia 1900 v1 (~50k steps), v2 (~100k); Leela 11258 at 25k/35k/50k; Maia 2200 at 20k
  • Top-1 accuracy: Maia 2200 63.28% at 18.5k; Leela 11258-35k 53.13%
  • Training data: ~1,800 games (blitz/rapid/classical), 90/10 train/val
  • Effective positions/epoch: ~4,500 (1/32 down-sampling)
  • Batch sizes: Maia 1900: 64; Maia 2200 / Leela: 128
  • Model sizes: Maia 1.2–1.7 MB; Leela 8.1 MB

Code breakdown

  • TF training backend (tfprocess.py, chunkparser.py, training_shared.py): ~1,600 LOC
  • Encoding (fen_to_vec.py, policy_index.py, lc0_az_policy_map.py): ~2,300 LOC
  • Export (export_model.py, export_onnx.py): ~800 LOC
  • Data prep + orchestration: ~300 LOC
  • Total: ~9,700; inherited ~5,600 from maia-individual fork + ~4,100 custom

Tradeoffs

  • Supervised imitation, not preference learning. Cross-entropy on played move. DPO (preference between played and legal alternatives) would model style in the 35% where top-1 plateaus — exactly where imitation fails. SL was path of least resistance with ~1,800 games.
  • Fork maia-individual vs clean PyTorch rewrite. Bought V4 chunk parser, SE-ResNet, lc0 weight format, policy-map matrix on day one. Cost: 3+ days debugging inherited TF2 bugs.
  • Top-1 accuracy as sole metric. Cheap + matches Maia paper, but penalizes model for choosing moves the player would also like. No head-to-head Elo, no KL divergence from empirical distribution.
  • No knowledge distillation from base-Maia policy. Kappe's BadGyal/GoodGyal precedent (q-ratio blend) would preserve strength while shifting style. Conscious deferral — KD needs cached base-model logits across dataset.
  • Web UI built then deleted. React + ONNX WASM UI built (PR #1), debugged (PR #2), deleted (Feb 6–7). play-lc0 superseded it and inherited the encoding/policy/ONNX lessons.

Play Lc0 (play-lc0)

Fully client-side web app for playing chess against Lc0 neural networks. All inference in-browser via ONNX Runtime Web (WebGPU, WASM fallback). Built in 19 days (Feb 5–23, 2026, 9 active); ~16,000 lines TS/TSX across 51 files, 36 commits / 11 PRs, 53 networks spanning ~800–2,900 Elo, tournament system (Swiss + round-robin).

Core thesis: "play against a personality, not just strength." Curates 53 networks across 6 model families (11258 distilled, Maia, Gyal, official Lc0, transformers, specialty). The catalog is the product. Fully client-side = static site + R2 bucket; a URL gives a 50+ network chess lab with no server ops.

Distinctive decisions

  • Web Worker isolation. NN inference + MCTS in dedicated worker; main thread handles UI, opening book, game state via chess.js. Typed message protocol. One worker per Lc0Engine, not a shared queue — enables parallel tournament inference.
  • Board encoding from scratch. Replicated lc0's exact [1,112,8,8] tensor + 1858-element policy index in TS rather than compiling lc0 to WASM. Trades debuggability for reimplementation risk — 6 encoding bugs caught on Feb 5 (policy ordering, promotion encoding, move flipping, history order, FEN init, halfmove divisor).
  • MCTS with PUCT (cPUCT=2.5). 0–800 nodes, 0–30s. 0-node (raw policy) is first-class because BT4's policy head alone is ~2,500–2,700 Elo. Temperature 0.15 default (not 0) — samples visits^(1/T) for "feels alive" experience.
  • Gzip + IndexedDB caching. Models as .onnx.bin gzipped (30–45% reduction), decompressed via DecompressionStream, cached in IndexedDB. Model hosting: bundled → Git LFS → Cloudflare R2 within one day (Pages 25 MB deploy limit).
  • LRU pool with Bélády's optimal eviction. Tournament pool of Lc0Engine instances evicted by next-use distance (pairings known in advance) instead of recency. Plain LRU would thrash.
  • Tournament runner in one 2,474-line hook (useTournamentRunner.ts). Swiss/round-robin pairings (Berger tables), concurrent scheduling via Promise.race, series reconciliation, FIDE performance rating, exponential backoff (1s–30s, 6 max). State in useRef for synchronous read-after-write.
  • No router. Five screens (home/game/tournament/share-loading/share-confirm) via imperative state machine in App.tsx. Share URLs are query payloads (?network=foo&fen=...), not symbolic routes.
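The PUCT selection rule at the heart of the worker's MCTS, sketched with the stated cPUCT (dict-shaped nodes are illustrative, not the TS types):

```python
from math import sqrt

def puct_select(children: list, c_puct: float = 2.5) -> dict:
    """Pick the child maximizing Q + U, where
    U = c_puct * prior * sqrt(parent_visits) / (1 + child_visits).
    children: dicts with 'prior', 'visits', 'value_sum'. Selection rule
    only; terminal handling and backup are elided.
    """
    parent_visits = sum(ch["visits"] for ch in children) or 1

    def score(ch):
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["prior"] * sqrt(parent_visits) / (1 + ch["visits"])
        return q + u

    return max(children, key=score)
```

An unvisited child with a decent prior outranks a well-visited one (exploration); a negligible prior does not. At the root, the final move is then sampled from visits^(1/T) with T = 0.15 rather than argmax.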

Hardest problems

  • Six simultaneous encoding bugs (Feb 5). Correct WDL but nonsensical moves. Traced to reference mismatch; fixed using pre-generated 1,858-entry canonical table from hunter-chessbot repo.
  • ONNX bus error on Maia fine-tune. KERN_PROTECTION_FAILURE in lc0 v0.32.1's FloatOnnxWeightsAdapter::GetRawData() when model has training_params. Needed v0.21.0+ to work around.
  • useEffect anti-patterns (Feb 9). OpeningPicker/NetworkPicker oscillating selection (effects re-resolving before localStorage writes). Removed all 3 useEffects from OpeningPicker; replaced render-time resolution with conditional parent rendering.
  • Vite .gz interception. sirv treated .gz as pre-compressed. Renamed to .onnx.bin.

Numbers

  • 16K lines: ~1,900 for policy table, 2,474 for tournament runner, ~1,500 for UI, ~500 for MCTS/inference/encoding/decoding, ~800 for catalog
  • 53 networks: 15 distilled (11258), 11 Maia, 8 Gyal, 5 official Lc0, 5 transformers (T1/t3/T82/BT3/BT4), 4 specialty. Sizes: 1.1 MB (Tiny Gyal) → 707 MB (BT4)
  • 15K+ opening positions (full ECO database in trie)
  • Performance: ~80–100 nodes/sec on small nets, ~8–10 on large (unbatched, single-node)
  • First working app: 2 hours from initial commit. 53 networks converted: 1 day. Tournament mode: 1 day. MCTS: 1 day.

Tradeoffs

  • Lc0 over Stockfish. Lc0 gives 53 personalities via swappable weights; Stockfish is stronger but plays one way.
  • TypeScript MCTS vs lc0-to-WASM. TS = debuggability + chess.js access; correctness risk (every encoding bug is yours). Avoids lc0 build toolchain.
  • Replay chess.js along each MCTS path (O(N·d) replays). chess.js has no unmake; writing a correct one (castling, ep, 50-move, 3-fold) would take a week. Accepted the perf ceiling (~100 nps) to avoid the correctness risk.
  • Temperature 0.15 default, not 0. 0 = deterministic/boring. 0.15 tuned for vibrancy without throwing games. PR #11 tested broader sampling; reverted.
  • No batched inference. Current MCTS is single-node unbatched. Phase 2: virtual-loss diversity + batched [B,112,8,8] for expected 5–8× throughput.

Cross-project patterns

  • Dual-target / dual-backend validation shows up in ti84ce (CEmu runtime-swappable) and ce-games (desktop vs eZ80 via same-source compilation). Both catch divergence impossible to see on one target.
  • Hand-coded canonical tables over programmatic generation — play-lc0's 1,858 policy index literal and hunter-chessbot's 1,858 UCI move ordering both ship the canonical artifact instead of recomputing; zero-chance-of-regression vs ~60 KB of source.
  • Cycle-accurate trace diffing as the correctness oracle (ti84ce) and Texel tuning + emulator-in-the-loop tournaments (ce-games) both treat CEmu/Stockfish as the executable spec rather than writing unit tests.
  • Tool descriptions as prompts (annas-archive-mcp) and tool descriptions = system prompts for the caller (both the AI MCP deep dive material and the actual engineering here) — iterate like prompts, not API docs.
  • Human-directed security hardening (annas-archive-mcp rootless deploy / sudoers allowlist, readr ebook-deploy rootless user) — architectural guardrails beat "trust the agent."