Projects — Distilled
Single-file distillation of the six project deep-dives. Core facts, architecture, hardest problems, numbers, tradeoffs. No interview framing.
TI-84 CE Emulator (ti84ce)
Cycle-accurate Zilog eZ80 emulator built from scratch in Rust (~15,000 lines), native iOS/Android/Web frontends, instruction-by-instruction parity with CEmu. Shipped in 25 days (Jan 27 – Feb 20, 2026); ~80% AI co-authored (262/332 commits).
Full-system OS emulation of the TI-84 Plus CE: real eZ80 CPU, 13 hardware peripherals (LCD with DMA, timers, RTC, SPI, interrupt controller), 4MB flash ROM, 256KB RAM, all exposed through a narrow C ABI. Boots TI-OS 5.8.2 in 168,140,000 cycles (verified against CEmu). Dual-backend: Rust core swappable with CEmu C reference on every platform.
Distinctive decisions
- Dual-backend via stable C ABI — both backends export the same 15 extern "C" functions (emu.h, 52 lines). Android: dlopen() + RTLD_LOCAL. iOS: static linking with dual prefixing (rust_emu_* + cemu_*). Web: no C ABI seam — a TypeScript factory instantiates separate WASM modules. Payoff: real-time A/B divergence comparison on device.
- no_std core + buffer I/O — no std::fs, no threads. ROM as &[u8], framebuffer as *const u32 (ARGB8888, 320×240), save state as Vec<u8>. Platforms own pacing/persistence/logging. Enables trivial WASM + deterministic execution (critical for trace-diffing).
- Cycle-accurate scheduler via 7.68 GHz LCM base clock — LCM of all hardware clocks (48 MHz CPU, 24 MHz panel, 32.768 kHz RTC). Pure integer arithmetic, zero float drift. HALT fast-forward batches idle cycles to the next event.
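The integer-only base-clock idea can be sketched in a few lines. The 7.68 GHz LCM figure and the three clock rates come from the text; everything else (names, structure) is illustrative, not the Rust core's actual code:

```python
# Every hardware clock divides the base clock exactly, so event times are
# plain integers in "base ticks" — no float drift can accumulate.
BASE_HZ = 7_680_000_000                 # 7.68 GHz LCM base clock (from the text)
CPU_HZ, PANEL_HZ, RTC_HZ = 48_000_000, 24_000_000, 32_768

CPU_TICKS = BASE_HZ // CPU_HZ           # base ticks per CPU cycle
PANEL_TICKS = BASE_HZ // PANEL_HZ       # base ticks per panel clock
RTC_TICKS = BASE_HZ // RTC_HZ           # base ticks per RTC tick

# Sanity check: all divisions are exact, which is the whole point of the LCM.
assert all(BASE_HZ % hz == 0 for hz in (CPU_HZ, PANEL_HZ, RTC_HZ))
```

With exact ratios like these, "advance the CPU 3 cycles" and "advance the RTC 1 tick" are both integer additions on the same timeline, which is what makes deterministic trace-diffing against CEmu possible.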
Hardest problems
eZ80 architectural surprises, each blocked boot:
- IM2 ≡ IM1 — eZ80 ignores I register, always jumps 0x0038. Z80 docs wrong.
- Dual stack pointers (SPS/SPL) — 16- vs 24-bit selected by L mode; mixed-mode CALL/RET pushed wrong widths.
- Suffix opcodes atomic — 0x40/0x49/0x52/0x5B execute with following instruction in one step.
- Undocumented OS Timer — 4th timer on 32.768 kHz crystal, no public docs; ROM hangs without it.
- LD A,MB (ED 6E) — loads the memory base register; not in Z80 specs, used in the first 10K cycles.
- R register rotation — (A<<1) | (A>>7), not a simple increment.
- Flash unlock detection — 16–17 byte magic sequence in the fetch stream (DI; JR; DI; IM2; IM1; OUT0/IN0; BIT 2,A).
- LCD DMA cycle stealing — track dma_last_mem_timestamp, retroactively steal cycles on CPU memory access. Adds ~13M of 168M boot cycles (7.7%, matching CEmu exactly).
- Prefetch pipeline — single-byte buffer charged on fetch. Without it, cycle counts drop to ~50% of CEmu's.
- Timer interrupt delay pipeline — 2-cycle match → status → interrupt. Wrong = "graphing hang."
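The R-register quirk above is small enough to state as code. This is a sketch of the rotation as described in the text; whether it applies on read or on refresh is per the project's findings, and the helper name is mine:

```python
def r_rotate(r: int) -> int:
    """eZ80 R quirk as described: the value rotates left by one bit,
    (r << 1) | (r >> 7), rather than incrementing like classic Z80 docs imply."""
    return ((r << 1) | (r >> 7)) & 0xFF
```

A simple increment would turn 0x80 into 0x81; the rotation wraps bit 7 around to bit 0 instead, giving 0x01.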
Numbers
- 15,000 lines Rust core, ~38,000 total across platforms
- 332 commits, 67 merged PRs, 80+ branches
- 168,140,000 cycles to boot (cycle-for-cycle vs CEmu)
- WASM binary: 148 KB uncompressed, 96 KB gzipped
- Largest file: cpu/execute.rs at 2,646 lines
- 37+ main sessions, 472 subagent invocations, ~73 MB conversation data
- 7-phase parity campaign surfaced ~150 CEmu discrepancies via 8-agent analysis
Key files
emu.rs (3,168) execution loop + frame rendering. cpu/execute.rs (2,646) all instruction dispatch. bus.rs (1,929) memory routing + flash unlock + debug ports. peripherals/lcd.rs (1,302) 5-state DMA engine. scheduler.rs (702). Debug CLI core/examples/debug.rs (~2,900) with boot/trace/fulltrace/screen/sendfile/bakerom modes.
Tradeoffs
- Cycle-accurate from day one vs retrofit — firmware is interrupt-sensitive; off-by-cycles hangs boot. Retrofit would require mid-project core rewrite.
- Tri-platform for a 0-user project — justified only because parity-driven development and cross-platform story are the point. Single-platform would be better as CEmu fork.
- Manual serialization (8 STATE_VERSION bumps) vs Serde — precise byte control, smaller snapshots; maintenance burden (visible in "RAM Cleared" bugs). Web solved differently: full WASM linear memory snapshot (~29 MB → 4 ms copy).
- Monolithic execute.rs nested match vs dispatch table — Rust match compiles to a jump table anyway; structured x-y-z-p-q decomposition lets DD/FD/ED/CB prefix variants share ALU helpers.
- Image-based keypad (49 regions from a photograph, percentage coords, shared JSON) vs programmatic buttons — consistency + realism, harder to modify.
CE-Games Chess Engine (ce-games)
Chess engine for the TI-84 Plus CE (eZ80 @ 48 MHz, 256 KB RAM) achieving ~2083 Elo at master difficulty (27s/move); ~2700 Elo on desktop ARM64 at 0.1s/move. Built in 16 days (Feb 10–25, 2026); 111 commits, 15 PRs. ~14,000× desktop-to-target slowdown is itself the value — the eZ80's hostility (no branch predictor, no SIMD, 24-bit int, ~62 KB usable RAM) forces architectural choices modern CPUs hide.
Distinctive decisions
- 0x88 board over bitboards — the eZ80 lacks 64-bit arithmetic; every bitboard op is an expensive library call. 0x88 uses byte ops exclusively (if (square & 0x88) detects off-board in one cycle).
- Engine/UI separation via engine.h — 8 C files (~4,000 LOC) + 1 hand-written eZ80 asm file (pick_best.asm, 60 lines, 2.8× faster than the compiler). The GUI in main.c (1,200 lines) never touches board representation.
- Dual-target compilation — the same source compiles for eZ80 (-Oz) and desktop GCC (-O2). Texel tuning and Stockfish tournaments run on desktop; the cycle-accurate emulator validates the actual target. Caught platform-divergent regressions (sentinel rays: +1.3% eZ80, −8.4% desktop).
- Memory budget (~62 KB of 256 KB): TT 32 KB (4096 × 8 bytes, always-replace), move pool 10 KB (2048 entries, SoA shared via a global stack ptr — avoids ~2 KB stack overhead per ply), pawn cache 5 KB (4-way set-assoc, 32 entries), Zobrist 3.2 KB, UI 2 KB.
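The 0x88 trick named above is worth seeing concretely. A minimal sketch (helper names are mine, not the engine's):

```python
def sq(rank: int, file: int) -> int:
    """0x88 square index: rank * 16 + file. Only half of each 16-wide
    rank is the real board; the other half is a built-in guard band."""
    return rank * 16 + file

def off_board(square: int) -> bool:
    """Any off-board coordinate sets a bit of mask 0x88 — one AND, one test."""
    return (square & 0x88) != 0
```

This is why it suits the eZ80: move generation can add a signed offset to a square and validate it with a single byte-sized AND, with no 64-bit shifts or popcounts anywhere.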
Hardest problems
- 24-bit integers: int = 3 bytes. Widening hot-path locals is a win; widening struct fields is a loss. int16_t in arrays is fine; widening the history heuristic regressed +44.8%. Array stride 4 (shift) beats stride 3 (multiply), so Zobrist tables stayed uint32_t.
- Zobrist at 24 bits: bare 24-bit XOR inline asm is ~24 cycles vs the __ixor library call's ~45; but 24 bits alone collides 1-in-16M. Split into a 24-bit hash + 16-bit lock = ~40 effective bits.
- Flash cache hostility: -Oz beats -O2 by 41% (7,053 vs 10,005 cy/node). Flash has a 10-cycle baseline + 197 on miss; -O2 code bloat destroys L1.
- Pawn cache alignment: the 128-byte pawn_atk[] MUST be static, not stack. Stack placement pushes the IX displacement beyond signed 8-bit range, triggering multi-byte addressing; −14% across all eval.
- PVS difficulty variance: the engine hung queens because null-window PVS clips non-PV moves identically. Fix: widen the PVS floor by variance. var=5 (sweet spot), var=10 (−147 Elo), var=15 (catastrophic).
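The split-hash transposition-table scheme above can be sketched briefly. The entry count (4096) and the 24+16 split are from the text; the bit layout and helper names are an illustrative assumption:

```python
TT_ENTRIES = 4096                        # 4096 x 8-byte entries = 32 KB

def tt_fields(zobrist: int):
    """Split a wider Zobrist key into a 24-bit hash (slot index) and a
    16-bit lock that disambiguates collisions — ~40 effective bits."""
    hash24 = zobrist & 0xFFFFFF
    lock16 = (zobrist >> 24) & 0xFFFF
    return hash24 & (TT_ENTRIES - 1), lock16

def tt_probe(table, zobrist):
    """Always-replace table: a hit requires both the slot and the lock to match."""
    slot, lock = tt_fields(zobrist)
    entry = table[slot]
    return entry if entry is not None and entry["lock"] == lock else None
```

Two positions that land in the same slot but carry different locks are correctly treated as a miss, which is what buys back most of the 1-in-16M collision rate of a bare 24-bit hash.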
Numbers
- Texel tuning on 1M GM-level Lichess broadcast positions (ply 12–100, filtered for quiet): +52 Elo, the largest single gain. Adam descent, K=0.00652 sigmoid, L2 regularization. Subsequent rounds yielded <1 Elo; abandoned.
- Performance: positional middlegames 470K cy/node (38% eval), tactical 337K, endgame 133–272K. Avg ~200K cy/node = ~370 NPS on eZ80 vs ~5.1M NPS desktop.
- Search: PVS, aspiration ±25 cp, null-move (R=2, depth ≥3), LMR (after 4+ moves, depth ≥3, reduce 1), futility (d≤2, 200/500 margins), check extensions (limited 2/path), quiescence (max depth 8, delta +1100).
- Tournament Elo vs Stockfish calibrated: Easy 1320, Medium 1442, Hard 1712, Expert 1958, Master 2083 (100-game sample).
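The Texel objective mentioned above (K=0.00652 sigmoid) fits evaluation weights by minimizing the gap between predicted win probability and actual game results. The exact sigmoid form the project uses is an assumption here (this is a common choice); K is the document's constant:

```python
import math

K = 0.00652  # sigmoid scale from the text

def win_prob(eval_cp: float) -> float:
    """Map a centipawn eval to a predicted win probability."""
    return 1.0 / (1.0 + math.exp(-K * eval_cp))

def texel_loss(samples):
    """Mean squared error between game result (0 / 0.5 / 1) and the
    predicted win probability; the quantity gradient descent minimizes."""
    return sum((result - win_prob(eval_cp)) ** 2
               for eval_cp, result in samples) / len(samples)
```

A dead-even eval (0 cp) maps to 0.5, so positions that ended in draws at eval 0 contribute zero loss; weights that systematically mis-price material or structure raise it.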
Key files
search.c ~800 (negamax/PVS/LMR/null-move/quiescence), board.c ~600 (0x88/piece lists/make-unmake), eval.c ~500 (HCE/pawn cache/tapered), movegen.c ~400, book.c ~350 (Polyglot/multi-AppVar), engine.c ~350, zobrist.c ~150, tt.c ~100, pick_best.asm 60. GUI main.c 1,200. Tuning texel_tune.py 32 KB.
Tradeoffs
- -Oz is mandatory on the flash-starved eZ80 — inverts modern wisdom. Cache misses cost 197 cycles; code bloat is fatal.
- Platform-divergent optimization is real — same source, two Makefiles, emulator validation catches drift.
- Pseudo-legal movegen + fast legality filter beats either extreme. Check/pin pre-computation elides ~50% of legality calls.
- Eval and search are joint tradeoffs — cheap/noisy eval wants depth (alpha-beta); expensive/accurate wants selectivity (MCTS). NNUE shelved (32-neuron layer on no-SIMD 48 MHz = uneconomical).
- Opening book essentially free — Polyglot entries accessed directly from flash AppVars via ti_GetDataPtr(); zero RAM cost. Book vs no-book Elo is ~equal; the value is early-game variety, not strength.
Anna's Archive MCP (annas-archive-mcp)
Self-hosted MCP server indexing Anna's Archive's 72M deduplicated documents into local PostgreSQL. Exposes search/download/read/stats tools to Claude over stdio and HTTP. Built in 4.5 days (Mar 30 – Apr 4, 2026); 46/48 commits Claude-co-authored.
Legal foundation: index metadata locally (robots.txt respected, not copyrighted), client provides their own AA membership key for downloads, server never stores or serves copyrighted bytes.
Architecture
BitTorrent → Rust ingestion → PostgreSQL FTS → TypeScript/Bun MCP. 150 GB of zstd JSONL from 50+ AA collections flows through 8 parallel Rust workers (46K rec/sec, ~1 hour total) into Postgres with 10 indexes: GIN on weighted search vectors (title=A, author=B, publisher=C), trigram on title/author, B-tree on DOI/ISBN/lang/year. Server runs stdio for local Claude Code + HTTP+SSE for claude.ai. Per-request McpServer instantiation (sessionIdGenerator: undefined) scopes client keys as capabilities — never stored server-side.
Hardest problems
- Reconciliation, not dedup. Same MD5 appears across zlib3/upload/ia2/nexusstc/etc. Completeness scoring (non-null field count) determines which source wins per field — not last-writer, not priority lists.
- Format detection by magic bytes. Source extension fields are unreliable across 50+ collections. The reader sniffs the first 128 bytes → pdftotext / Calibre ebook-convert / djvutxt / EPUB ZIP central-directory check (distinguishes EPUB from DOCX).
- AND → OR → trigram fallback chain. Each tier has a different cost envelope (trigram is expensive at 72M rows). Application-level short-circuit; each tier sanitizes input differently.
- Tool descriptions as prompts. 4 iterations tuned against Claude failures. "Query Strategies" few-shots embedded (specific book → title+author, author's works → author alone, broad topic → query, non-English → original language).
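The magic-byte dispatch above can be sketched as follows. The signatures are the standard ones; the EPUB-vs-DOCX distinction is simplified to scanning the head bytes for the EPUB mimetype string (which EPUB mandates near the start of the ZIP), where real code would walk the ZIP central directory:

```python
def sniff(head: bytes) -> str:
    """Classify a file from its leading bytes, ignoring the claimed extension."""
    if head.startswith(b"%PDF"):
        return "pdf"
    if head.startswith(b"AT&TFORM"):                  # DjVu container magic
        return "djvu"
    if head.startswith(b"PK\x03\x04"):                # ZIP local-file header
        # EPUB requires an uncompressed 'mimetype' entry first, so the
        # string is visible in the first bytes; DOCX et al. lack it.
        if b"application/epub+zip" in head:
            return "epub"
        return "zip-container"                        # DOCX, ODT, ...
    return "unknown"
```

Dispatching on this result (rather than on extension) is what lets one reader pipeline handle 50+ collections with inconsistent metadata.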
Numbers
- 150 GB raw → 80 GB Postgres index in ~1 hour
- 46K records/sec (2.7× Python version)
- ~150M raw records → 72M unique by MD5
- 48 commits, 4.5 days; Day 1 was a 9-hour continuous sprint delivering full stack
Key files
reader.ts (325, text extraction + LRU + page splitting), api.ts (272, REST + auto OpenAPI 3.1.0), server.ts (257, MCP tool defs), db.ts (223, FTS + trigram + exact DOI/ISBN), ingest/main.rs (662, parallel workers + zstd streaming + completeness UPSERT).
Tradeoffs
- Local index vs scraping — 150 GB + 80 GB storage + 1h upfront buys ms queries, respects robots.txt, kills AA-uptime dependency. Scraping on every search is ethically equivalent to iosifache/annas-mcp (rejected).
- MD5 as global PK — one row per unique file across all sources. Reconciliation at UPSERT time (completeness-scored) instead of source prioritization.
- Stateless per-request instantiation — fresh McpServer closing over client key. Structurally prevents key leakage; higher object churn.
- Memory-mode extraction via /dev/shm — raw bytes stream through pdftotext/djvutxt via stdin into tmpfs, never touching disk. Only extracted text persists, in a bounded LRU.
- Rootless deploy + sudoers allowlist — Docker group = root (mount escape). Remove annas-deploy from the docker group; block ssh olares via a Claude Code hook; permit only scoped docker compose commands via /etc/sudoers.d. No direct docker socket, no docker run, source tree read-only.
- Temp tables per worker → bulk COPY → per-worker UPSERT — direct COPY fails on conflict; per-row INSERT drops to 17K rec/s; in-memory dedup can't scale to 150M.
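The completeness-scored reconciliation described above (one MD5, many sources, best data wins) can be sketched like this. Field names and the fill-from-secondary behavior are illustrative; the text only specifies that non-null field count decides the winner:

```python
def completeness(rec: dict) -> int:
    """Score a source record by how many fields it actually fills."""
    return sum(1 for v in rec.values() if v is not None)

def reconcile(existing: dict, incoming: dict) -> dict:
    """At UPSERT time, let the more complete record lead, but fill any
    of its null fields from the other source — no last-writer-wins,
    no static source-priority list."""
    if completeness(incoming) > completeness(existing):
        primary, secondary = incoming, existing
    else:
        primary, secondary = existing, incoming
    keys = {**existing, **incoming}
    return {k: primary.get(k) if primary.get(k) is not None else secondary.get(k)
            for k in keys}
```

Because the decision is recomputed per field at merge time, a sparse-but-trusted source can still contribute the one field (say, DOI) that richer sources left null.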
Infrastructure
Docker Compose (Postgres 17, Bun server, Rust ingest, Cloudflare tunnel) on home Olares K3s box (96 GB / 24c / 6.9 TB). Exposed at https://aa-mcp.hunterchen.ca via named Cloudflare Tunnel (Tailscale failed — pod only forwards SSH). Rate limiting: 60 req/min/IP via in-memory Map. REST /api/* requires key; MCP search runs keyless (download needs key anyway).
Readr (readr)
Self-hosted cross-platform e-book reader targeting Supernote A5X e-ink tablet. Cloud sync, offline support, typed + handwritten annotations, TTS, web dashboard. pnpm/turbo monorepo. Built over 7 active days (Feb 23 – Apr 12, 2026): 2-day scaffold sprint, 43-day gap, then 5 days of hardware integration + polish. 222 commits, ~92% AI co-authored (289 MB session data), 119+ source files.
Architecture
React Native (Expo SDK 54) mobile with embedded WebViews (foliate-js for EPUB, pdf.js for PDF). Hono API (Node 22) backed by Postgres 16, Redis 7, MinIO S3-compatible. Monorepo: apps/mobile (58 TS files), apps/server (47 TS, 53 endpoints / 10 routes), packages/sync-engine (~150 LOC CRDT-lite), shared validators, Python FastAPI TTS worker (Chatterbox/Kokoro). Deployed on Olares via two-stack Docker Compose (infra + app split after Cloudflare cache-poisoning bug).
Hardest problem: kernel-level handwriting for Supernote
Supernote's first-party Atelier: ~20ms latency, no ghosting. React Native + Skia: ~400ms with severe EPD ghosting. Closed the gap by bypassing Android's render pipeline.
Required reverse-engineering via APK decompilation (JADX), binder service discovery, kernel source reading, custom Kotlin modules. Discovered:
- service_myservice (vendor binder service)
- exact Parcel protocol from decompiled HandWriteClient
- EPD waveform constants from kernel headers (EPD_A2 vs EPD_FULL_GC16)
- /dev/ebc is world-writable — third-party apps can access it
Architecture: kernel draws strokes directly at ~20ms; app captures framebuffer on save; Skia only for previously-saved re-rendering. Two native modules (506 + 624 lines) expose interfaces via reflection (firmware-agnostic vs compile-time linking).
Second-hardest: page numbering
8+ consecutive commits. foliate-js page counts depend on font/viewport. Two-phase fix: stub with byte-based location.total until a background measurement pass in a hidden <foliate-view> iterates the sections, polling renderer.pages until it stabilizes.
Distinctive decisions
- WebView file access — an Android inline-HTML WebView gets an about:blank origin (blocking fetch/XHR to file://). Solution: write the HTML to a local file, load it via its uri, and use XHR (legacy file:// support).
- Bearer-token auth, no passwords — the client generates a random token, stored as the user identity.
- Content-addressable storage by SHA-256, multi-user dedup via refCounting.
- Two-stack Compose — infra (Postgres/Redis/MinIO) stays live; app stack rebuilds on push.
- Custom CRDT-lite (~150 LOC) — reading progress = LWW, annotations = permanent tombstones (no resurrection).
- Offline-first optimistic auth — read token from SecureStore immediately, probe server in background.
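The CRDT-lite semantics above (LWW progress, tombstoned annotations) fit in a short sketch. Field names and the dict-keyed shape are illustrative; the merge rules are the ones the text states:

```python
def merge_progress(local: dict, remote: dict) -> dict:
    """Reading progress: last-writer-wins on the update timestamp."""
    return max(local, remote, key=lambda p: p["updated_at"])

def merge_annotations(local: dict, remote: dict) -> dict:
    """Annotations keyed by id: a delete is a permanent tombstone —
    once either side has deleted, the annotation can never resurrect."""
    merged = {}
    for aid in local.keys() | remote.keys():
        a, b = local.get(aid), remote.get(aid)
        if (a and a.get("deleted")) or (b and b.get("deleted")):
            merged[aid] = {"deleted": True}
        elif a and b:
            merged[aid] = merge_progress(a, b)   # both edited: newest wins
        else:
            merged[aid] = a or b                 # present on one side only
    return merged
```

Permanent tombstones trade a little storage for the property that an offline device coming back online can never undo a deletion it hadn't seen.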
Numbers
- 14-table Postgres schema
- 17 bundled Google Fonts
- 108K-word offline dictionary (27 Metro-split JSON files, ~9 MB)
- 53 API endpoints
- Kernel handwriting: ~20ms vs Skia's ~400ms
Hunter Chessbot (hunter-chessbot)
Transfer-learning fine-tuning pipeline adapting pre-trained chess networks (Maia 1900, Maia 2200, Leela 11258) to play in a specific person's style by supervised learning on their game archive. Built in 7 days (Feb 3–10, 2026); 19 commits, ~9,700 lines Python, 6 trained models (2 production: Maia 1900 v1/v2; 4 experimental). Maia 2200 Hunter powers play-lc0.
Distinctive decisions
- Maia over Leela Zero. Leela Zero is RL-trained for optimal play; fine-tuning it on human games fights the objective. Maia is supervised on millions of Lichess games at specific skill levels — a natural fit for "plays like a person." Results confirmed it: Maia 2200 achieved 63.28% top-1 policy accuracy; Leela 11258 only 53.13% (the optimal-play base resists being bent toward a human style).
- Stop-gradient, not layer.trainable. Lambda layers insert tf.stop_gradient(). Frozen layers still participate in the forward pass + batch-norm statistics but don't receive gradient updates. Preserves base representations more faithfully than layer freezing.
- Value head permanently frozen; policy head fine-tuned. The goal is which moves the player chooses, not re-assessing positions. Value loss is still computed for shared-trunk gradient balance, but the head doesn't update.
- 1/32 down-sampling. Only 1 in 32 training positions used (~4,500 effective per epoch from ~1,800 games × ~80 moves). Prevents overfitting on a small personal corpus.
- Color-separated data. Games are split white/black and the ChunkParser alternates. The board is always shown from the side-to-move perspective (rotated 180° for Black), so balanced color exposure matters.
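The 1/32 down-sampling arithmetic above checks out exactly. A trivial sketch (the stride mechanism is an illustrative assumption; the corpus numbers are from the text):

```python
def downsample(positions, stride=32, offset=0):
    """Keep 1 of every `stride` positions to avoid overfitting
    a small personal corpus."""
    return positions[offset::stride]

corpus = list(range(1800 * 80))   # ~1,800 games x ~80 moves = 144,000 positions
epoch = downsample(corpus)        # ~4,500 effective positions per epoch
```

Varying the offset per epoch (if the real pipeline does so) would let successive epochs see different slices of the same games without changing the effective epoch size.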
Hardest problems
- Board encoding + policy indexing. 112-plane input (13 piece types × 8 history + 8 meta). Required correct perspective flipping for Black (board rotation, color swap, castling reorder). The output policy is a compressed 1,858-element vector, mapped via lc0_az_policy_map.py (56 queen + 8 knight + 9 underpromotion planes). Initial ONNX export bugs consumed significant debug time.
- TF2 weight handling. Inherited TF2 infra from maia-individual. Multiple days went to: weight transposition on export (Conv2D [H,W,in,out] → LC0 [out,in,H,W]), disabling LC0 saves during training (crashed on partial backprop), and loading lc0 weights directly into TF. Batch-norm: LC0 stores variance, TF outputs stddev (rescale required).
- ONNX export without rebuilding in PyTorch. The pipeline exports both .pb.gz (LC0 engine) and .onnx (browser). export_onnx.py rebuilds the architecture from scratch in Keras, restores checkpoints, and converts via tf2onnx — a workaround for TF's awkward native ONNX export.
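The two weight conversions named above (kernel layout and batch-norm statistics) are mechanical once stated. A sketch with NumPy; function names are mine:

```python
import numpy as np

def tf_to_lc0_kernel(kernel: np.ndarray) -> np.ndarray:
    """TF Conv2D kernels are stored [H, W, in, out]; LC0 expects
    [out, in, H, W] — a pure axis permutation, no value changes."""
    return kernel.transpose(3, 2, 0, 1)

def tf_std_to_lc0_var(stddev: np.ndarray) -> np.ndarray:
    """LC0 stores batch-norm variance; TF exposes stddev. Square on export
    (and take sqrt on import) or inference diverges silently."""
    return stddev ** 2
```

Getting either of these wrong produces a network that loads cleanly and plays nonsense, which is why they cost days rather than minutes.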
Numbers
- 6 trained models: Maia 1900 v1 (~50k steps), v2 (~100k); Leela 11258 at 25k/35k/50k; Maia 2200 at 20k
- Top-1 accuracy: Maia 2200 63.28% at 18.5k; Leela 11258-35k 53.13%
- Training data: ~1,800 games (blitz/rapid/classical), 90/10 train/val
- Effective positions/epoch: ~4,500 (1/32 down-sampling)
- Batch sizes: Maia 1900: 64; Maia 2200 / Leela: 128
- Model sizes: Maia 1.2–1.7 MB; Leela 8.1 MB
Code breakdown
- TF training backend (tfprocess.py, chunkparser.py, training_shared.py): ~1,600 LOC
- Encoding (fen_to_vec.py, policy_index.py, lc0_az_policy_map.py): ~2,300 LOC
- Export (export_model.py, export_onnx.py): ~800 LOC
- Data prep + orchestration: ~300 LOC
- Total: ~9,700; inherited ~5,600 from maia-individual fork + ~4,100 custom
Tradeoffs
- Supervised imitation, not preference learning. Cross-entropy on played move. DPO (preference between played and legal alternatives) would model style in the 35% where top-1 plateaus — exactly where imitation fails. SL was path of least resistance with ~1,800 games.
- Fork maia-individual vs clean PyTorch rewrite. Bought V4 chunk parser, SE-ResNet, lc0 weight format, policy-map matrix on day one. Cost: 3+ days debugging inherited TF2 bugs.
- Top-1 accuracy as sole metric. Cheap + matches Maia paper, but penalizes model for choosing moves the player would also like. No head-to-head Elo, no KL divergence from empirical distribution.
- No knowledge distillation from base-Maia policy. Kappe's BadGyal/GoodGyal precedent (q-ratio blend) would preserve strength while shifting style. Conscious deferral — KD needs cached base-model logits across dataset.
- Web UI built then deleted. React + ONNX WASM UI built (PR #1), debugged (PR #2), deleted (Feb 6–7). play-lc0 superseded it and inherited the encoding/policy/ONNX lessons.
Play Lc0 (play-lc0)
Fully client-side web app for playing chess against Lc0 neural networks. All inference in-browser via ONNX Runtime Web (WebGPU, WASM fallback). Built in 19 days (Feb 5–23, 2026, 9 active); ~16,000 lines TS/TSX across 51 files, 36 commits / 11 PRs, 53 networks spanning ~800–2,900 Elo, tournament system (Swiss + round-robin).
Core thesis: "play against a personality, not just strength." Curates 53 networks across 6 model families (11258 distilled, Maia, Gyal, official Lc0, transformers, specialty). The catalog is the product. Fully client-side = static site + R2 bucket; a URL gives a 50+ network chess lab with no server ops.
Distinctive decisions
- Web Worker isolation. NN inference + MCTS run in a dedicated worker; the main thread handles UI, opening book, and game state via chess.js. Typed message protocol. One worker per Lc0Engine, not a shared queue — enables parallel tournament inference.
- Board encoding from scratch. Replicated lc0's exact [1,112,8,8] tensor + 1858-element policy index in TS rather than compiling lc0 to WASM. Trades debuggability for reimplementation risk — 6 encoding bugs caught on Feb 5 (policy ordering, promotion encoding, move flipping, history order, FEN init, halfmove divisor).
- MCTS with PUCT (cPUCT=2.5). 0–800 nodes, 0–30s. 0-node (raw policy) is first-class because BT4's policy head alone is ~2,500–2,700 Elo. Temperature 0.15 default (not 0) — samples visits^(1/T) for a "feels alive" experience.
- Gzip + IndexedDB caching. Models ship as gzipped .onnx.bin (30–45% reduction), decompressed via DecompressionStream, cached in IndexedDB. Model hosting moved bundled → Git LFS → Cloudflare R2 within one day (Pages' 25 MB deploy limit).
- LRU pool with Bélády's optimal eviction. The tournament pool of Lc0Engine instances evicts by next-use distance (pairings are known in advance) instead of recency. Plain LRU would thrash.
- Tournament runner in one 2,474-line hook (useTournamentRunner.ts). Swiss/round-robin pairings (Berger tables), concurrent scheduling via Promise.race, series reconciliation, FIDE performance rating, exponential backoff (1s–30s, 6 max). State lives in useRef for synchronous read-after-write.
- No router. Five screens (home/game/tournament/share-loading/share-confirm) via an imperative state machine in App.tsx. Share URLs are query payloads (?network=foo&fen=...), not symbolic routes.
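The PUCT selection rule above (cPUCT = 2.5 from the text) is compact enough to sketch; node fields and names here are illustrative, not play-lc0's actual structures:

```python
import math

C_PUCT = 2.5  # exploration constant, per the text

def puct_select(children, parent_visits):
    """Pick the child maximizing Q + U, where
    U = c_puct * prior * sqrt(N_parent) / (1 + N_child).
    Unvisited children are drawn in by their network prior; visited
    children are judged by their running average value Q."""
    def score(ch):
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = C_PUCT * ch["prior"] * math.sqrt(parent_visits) / (1 + ch["visits"])
        return q + u
    return max(children, key=score)
```

At zero visits the U term dominates, which is exactly why a 0-node "search" (raw policy) is a sensible first-class mode: the first selection is the policy head's favorite move.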
Hardest problems
- Six simultaneous encoding bugs (Feb 5). Correct WDL but nonsensical moves. Traced to reference mismatch; fixed using pre-generated 1,858-entry canonical table from hunter-chessbot repo.
- ONNX bus error on Maia fine-tune. KERN_PROTECTION_FAILURE in lc0 v0.32.1's FloatOnnxWeightsAdapter::GetRawData() when the model has training_params. Needed v0.21.0+ to work around.
- useEffect anti-patterns (Feb 9). OpeningPicker/NetworkPicker oscillated selection (effects re-resolving before localStorage writes). Removed all 3 useEffects from OpeningPicker; replaced render-time resolution with conditional parent rendering.
- Vite .gz interception. sirv treated .gz as pre-compressed. Renamed to .onnx.bin.
Numbers
- 16K lines: ~1,900 for policy table, 2,474 for tournament runner, ~1,500 for UI, ~500 for MCTS/inference/encoding/decoding, ~800 for catalog
- 53 networks: 15 distilled (11258), 11 Maia, 8 Gyal, 5 official Lc0, 5 transformers (T1/t3/T82/BT3/BT4), 4 specialty. Sizes: 1.1 MB (Tiny Gyal) → 707 MB (BT4)
- 15K+ opening positions (full ECO database in trie)
- Performance: ~80–100 nodes/sec on small nets, ~8–10 on large (unbatched, single-node)
- First working app: 2 hours from initial commit. 53 networks converted: 1 day. Tournament mode: 1 day. MCTS: 1 day.
Tradeoffs
- Lc0 over Stockfish. Lc0 gives 53 personalities via swappable weights; Stockfish is stronger but plays one way.
- TypeScript MCTS vs lc0-to-WASM. TS = debuggability + chess.js access; correctness risk (every encoding bug is yours). Avoids lc0 build toolchain.
- Replay chess.js along each MCTS path (O(N·d) replays). chess.js has no unmake; writing a correct one (castling, ep, 50-move, 3-fold) would take a week. Chose a perf ceiling (~100 nps) over correctness risk.
- Temperature 0.15 default, not 0. 0 = deterministic/boring. 0.15 tuned for vibrancy without throwing games. PR #11 tested broader sampling; reverted.
- No batched inference. Current MCTS is single-node, unbatched. Phase 2: virtual-loss diversity + batched [B,112,8,8] for an expected 5–8× throughput gain.
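The visits^(1/T) sampling behind the temperature tradeoff above is a one-liner worth spelling out (function shape and names are mine):

```python
import random

def sample_move(moves, visits, temperature=0.15, rng=None):
    """Sample a move with probability proportional to visits^(1/T).
    As T -> 0 this approaches argmax (deterministic play); T = 0.15
    sharpens the visit distribution hard but leaves a sliver of variety."""
    weights = [v ** (1.0 / temperature) for v in visits]
    return (rng or random).choices(moves, weights=weights, k=1)[0]
```

At T = 0.15 the exponent is ~6.7, so a move with 400 visits outweighs one with 2 visits by a factor of about 10^15 — "feels alive" in practice means near-best-move play with occasional variation among close seconds, not random blunders.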
Cross-project patterns
- Dual-target / dual-backend validation shows up in ti84ce (CEmu runtime-swappable) and ce-games (desktop vs eZ80 via same-source compilation). Both catch divergence impossible to see on one target.
- Hand-coded canonical tables over programmatic generation — play-lc0's 1,858 policy index literal and hunter-chessbot's 1,858 UCI move ordering both ship the canonical artifact instead of recomputing; zero-chance-of-regression vs ~60 KB of source.
- Cycle-accurate trace diffing as the correctness oracle (ti84ce) and Texel tuning + emulator-in-the-loop tournaments (ce-games) both treat CEmu/Stockfish as the executable spec rather than writing unit tests.
- Tool descriptions as prompts — annas-archive-mcp treats MCP tool descriptions as system prompts for the calling model (echoed in the AI MCP deep-dive material): iterate on them like prompts, not API docs.
- Human-directed security hardening (annas-archive-mcp rootless deploy / sudoers allowlist, readr ebook-deploy rootless user) — architectural guardrails beat "trust the agent."