TI-84 Plus CE Emulator — Deep Technical Profile

Table of Contents

  1. Project Overview
  2. Architecture
  3. The Rust Emulator Core
  4. Cross-Platform Frontends
  5. Technical Tradeoffs & Decisions
  6. The CEmu Parity Campaign
  7. AI Agent Orchestration for Testing & Debugging
  8. Claude Code Usage Statistics
  9. Major Bugs & Debugging Stories
  10. Running Real Software: DOOM & Chess
  11. Development Timeline & Velocity
  12. PR & Commit History Analysis
  13. Key Files Reference

1. Project Overview

A cycle-accurate TI-84 Plus CE graphing calculator emulator built from scratch in Rust, with native frontends for Android (Kotlin/Jetpack Compose), iOS (Swift/SwiftUI), and Web (React/TypeScript/WASM). The emulator faithfully reproduces the Zilog eZ80 processor, all 13 hardware peripherals, and the TI-OS operating system — achieving instruction-level behavioral parity with CEmu, the established open-source reference emulator.

By the numbers:

  • ~15,000 lines of Rust core, ~38,000 total across all platforms
  • 332 commits across 80+ branches, 67 merged PRs
  • Built in 25 days (January 27 – February 20, 2026)
  • 168,140,000 cycles to boot TI-OS (verified against CEmu)
  • 277/455 unit tests passing (the 178 failures stem from a pre-existing prefetch initialization issue, not regressions)
  • WASM binary: 148KB uncompressed, 96KB gzipped
  • ~80% AI co-authored (262/332 commits have Claude co-author attribution)

2. Architecture

2.1 Dual-Backend Design

The project's most distinctive architectural decision is its dual-backend system. Both the custom Rust emulator and the CEmu C reference emulator implement the same C ABI contract (core/include/emu.h), allowing any frontend to use either backend without code changes:

┌────────────────────────────────────────────────┐
│            Platform UI Layer                    │
│   (Android Compose, iOS SwiftUI, Web React)     │
└──────────────────┬─────────────────────────────┘
                   │ C ABI (emu.h)
┌──────────────────▼─────────────────────────────┐
│        FFI Bindings (JNI / Swift / WASM)         │
└────┬───────────────────────────────┬────────────┘
     │                               │
┌────▼──────────────┐       ┌────────▼─────────────┐
│   Rust Core        │       │   CEmu Adapter        │
│  (libemu_core.a)   │       │  (libcemu_adapter.a)  │
│  (emu_core.wasm)   │       │  (cemu.wasm)          │
└────────────────────┘       └──────────────────────┘

The C ABI contract (emu.h, 52 lines) defines 15 functions:

  • Lifecycle: emu_create(), emu_destroy()
  • ROM: emu_load_rom(), emu_send_file()
  • Execution: emu_reset(), emu_power_on(), emu_run_cycles()
  • Display: emu_framebuffer() (ARGB8888, 320×240), emu_is_lcd_on()
  • Input: emu_set_key(row, col, down) (8×7 matrix)
  • State: emu_save_state_size(), emu_save_state(), emu_load_state()
  • Misc: emu_get_backlight(), emu_set_log_callback()

Backend switching per platform:

  • Android: Dynamic loading via dlopen()/dlsym() in jni_loader.cpp. A BackendInterface struct holds 17 function pointers. Backends are separate .so libraries (libemu_rust.so, libemu_cemu.so) loaded at runtime.
  • iOS: Static linking with symbol prefixing. When built with ios_prefixed Cargo feature, Rust exports use rust_emu_* names. backend_bridge.c dispatches through a BackendInterface pointer.
  • Web: TypeScript class polymorphism with a createBackend(type) factory function. RustBackend wraps wasm-bindgen, CEmuBackend wraps Emscripten.

Why dual backends? The primary motivation was parity-driven development. Having CEmu as a runtime-swappable reference meant:

  1. Any behavior difference could be observed in real-time on the same device
  2. The Rust implementation could be validated against CEmu at every stage
  3. Users had a working fallback while the Rust core was incomplete
  4. A/B comparison screenshots could be generated instantly

The tradeoff: doubled integration surface area, two build systems (Cargo + CMake/Emscripten), and symbol collision management. On iOS, this required the ios_prefixed feature flag to avoid linker errors when both backends are statically linked.

2.2 Core Module Architecture

The Rust emulator core (core/src/, ~15,000 lines) is organized into 5 main modules and 13 peripheral modules:

core/src/
├── lib.rs          (432 lines)  — C ABI exports, SyncEmu thread-safe wrapper
├── emu.rs          (3168 lines) — Emulator orchestrator, execution loop, frame rendering
├── bus.rs          (1929 lines) — Memory bus, address decoding, flash unlock, debug ports
├── memory.rs       (587 lines)  — Flash (4MB), RAM (256KB+VRAM), NOR flash commands
├── scheduler.rs    (702 lines)  — Event scheduler, 7.68 GHz base clock, 9 event types
├── disasm.rs       (1544 lines) — Full eZ80 disassembler
├── ti_file.rs      (379 lines)  — .8xp/.8xv TI file format parser
├── wasm.rs         (325 lines)  — WASM FFI bindings
├── cpu/
│   ├── mod.rs      (653 lines)  — CPU state, step(), interrupt handling
│   ├── execute.rs  (2646 lines) — All instruction execution (largest file)
│   ├── helpers.rs  (858 lines)  — ALU ops, register access, fetch/prefetch
│   ├── flags.rs    (25 lines)   — Flag bit constants (C, N, PV, H, Z, S, F3, F5)
│   └── tests/      (5420 lines) — instructions, modes, parity tests
└── peripherals/
    ├── mod.rs      (599 lines)  — Port routing, tick orchestration, state persistence
    ├── control.rs  (838 lines)  — CPU speed, battery FSM, flash unlock, memory protection
    ├── lcd.rs      (1302 lines) — LCD controller, 5-state DMA engine, palette
    ├── timer.rs    (671 lines)  — 3× GPT with 2-cycle interrupt delay pipeline
    ├── rtc.rs      (674 lines)  — Real-time clock, 3-state machine, 6 interrupt types
    ├── keypad.rs   (899 lines)  — 8×7 matrix, scan modes, edge detection
    ├── spi.rs      (627 lines)  — SPI controller, 16-deep FIFO, panel stub
    ├── interrupt.rs(448 lines)  — 2-bank interrupt controller with inversion/latching
    ├── flash.rs    (576 lines)  — Flash controller registers, wait state management
    ├── sha256.rs   (307 lines)  — SHA-256 block compression (64-round)
    ├── panel.rs    (223 lines)  — ST7789V LCD panel stub (SPI target)
    ├── backlight.rs(72 lines)   — PWM backlight brightness
    └── watchdog.rs (248 lines)  — Watchdog timer stub

2.3 Design Principles

  1. No platform dependencies in the core. The Rust core has no std::fs, no std::io, no std::net. It doesn't know about files, logging, threading, or what platform it's running on. Everything flows through byte buffers via the C ABI. The only external crates are wasm-bindgen/js-sys/web-sys (optional, gated behind the wasm feature flag), and chrono (dev-only, for tests).

  2. Stable C ABI. All exports use extern "C" with #[no_mangle]. The SyncEmu wrapper wraps Emu in a Mutex<Emu> for thread safety. Raw pointers are used at the FFI boundary, with Box::into_raw / Box::from_raw for lifecycle management.

  3. Single-threaded deterministic core. The emulator is purely deterministic — given the same ROM and inputs, it produces the same outputs every time. Threading is the platform's responsibility.

  4. Buffer-based I/O. ROM loaded as &[u8], framebuffer exposed as *const u32 (ARGB8888), save states serialized to Vec<u8>. No file handles cross the FFI boundary.
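
The FFI lifecycle pattern described in principles 2–4 can be sketched as follows. This is a minimal, hypothetical model — the struct fields and the `emu_run_cycles` body are illustrative, not the actual `lib.rs` implementation — but it shows the `Box::into_raw` / `Box::from_raw` ownership handoff and the `Mutex` wrapper at the C ABI boundary:

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the emulator state behind the C ABI.
pub struct Emu {
    cycles: u64,
}

// Mutex wrapper so the opaque pointer can be used from platform threads.
pub struct SyncEmu(Mutex<Emu>);

#[no_mangle]
pub extern "C" fn emu_create() -> *mut SyncEmu {
    // Box::into_raw hands ownership to the caller as an opaque pointer.
    Box::into_raw(Box::new(SyncEmu(Mutex::new(Emu { cycles: 0 }))))
}

#[no_mangle]
pub extern "C" fn emu_run_cycles(emu: *mut SyncEmu, budget: u64) -> u64 {
    let emu = unsafe { &*emu };
    let mut guard = emu.0.lock().unwrap();
    guard.cycles += budget; // placeholder for the real execution loop
    guard.cycles
}

#[no_mangle]
pub extern "C" fn emu_destroy(emu: *mut SyncEmu) {
    if !emu.is_null() {
        // Box::from_raw reclaims ownership so Drop runs exactly once.
        unsafe { drop(Box::from_raw(emu)) };
    }
}
```

Because no file handles or platform types appear in these signatures, the same exports compile unchanged for JNI, Swift static linking, and wasm-bindgen.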


3. The Rust Emulator Core

3.1 CPU: The eZ80 Processor

The TI-84 Plus CE uses a Zilog eZ80 processor — an extended Z80 with 24-bit addressing (ADL mode). This is NOT a standard Z80; it has critical differences that are poorly documented and caused numerous bugs during development.

CPU State (Cpu struct, cpu/mod.rs):

  • Main registers: a: u8, f: u8, bc/de/hl: u32 (24-bit)
  • Shadow registers: a_prime, f_prime, bc_prime, de_prime, hl_prime
  • Index registers: ix, iy (24-bit)
  • Dual stack pointers: sps (Z80 16-bit), spl (ADL 24-bit) — selected by L mode flag
  • Special: pc: u32, i: u16, r: u8, mbase: u8
  • State flags: iff1, iff2, im: InterruptMode, adl, halted
  • Per-instruction mode: l, il, suffix, madl, prefix, prefetch
  • ei_delay: u8 — 2-step delayed interrupt enable (EI enables interrupts after the NEXT instruction)
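
The ei_delay mechanism can be sketched as a small state machine (a minimal model; field and method names are illustrative, not the actual cpu/mod.rs API). EI arms a 2-step counter that is drained once per instruction step before the interrupt check, so interrupts become acceptable only after the instruction following EI:

```rust
pub struct Cpu {
    pub iff1: bool,
    pub ei_delay: u8,
}

impl Cpu {
    pub fn execute_ei(&mut self) {
        // IFF1 is not observable to interrupts until ei_delay drains.
        self.ei_delay = 2;
    }

    // Called once per instruction step, before the interrupt check.
    pub fn tick_ei_delay(&mut self) {
        if self.ei_delay > 0 {
            self.ei_delay -= 1;
            if self.ei_delay == 0 {
                self.iff1 = true;
            }
        }
    }

    pub fn interrupts_accepted(&self) -> bool {
        self.iff1 && self.ei_delay == 0
    }
}
```

With this model, an interrupt pending during the instruction right after EI is still held off — matching the delayed-enable semantics noted above.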

Instruction Execution (execute.rs, 2646 lines — the largest file):

The CPU uses x-y-z-p-q opcode decomposition. Each instruction goes through:

  1. Prefetch: Return the previously-prefetched byte, read the next byte into the prefetch buffer. This mirrors CEmu's hardware prefetch and is critical for cycle accuracy — without it, cycle counts were ~50% too low.

  2. Suffix detection loop: Opcodes 0x40, 0x49, 0x52, 0x5B are suffix opcodes (.SIS, .LIS, .SIL, .LIL) that modify the next instruction's L/IL modes. They execute atomically with the following instruction — a single step() call, not two. Getting this wrong caused trace count mismatches with CEmu.

  3. Dispatch: Based on the x field (bits 7:6): x=0 → execute_x0, x=1 → LD r,r' or HALT, x=2 → ALU A,r, x=3 → execute_x3. Prefixed instructions (CB/ED/DD/FD) have their own dispatch tables.

  4. Interrupt check: If iff1 && irq_pending, push return address and jump to 0x0038.
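
The x-y-z-p-q decomposition in step 3 is pure bit slicing; a minimal sketch (the struct is illustrative, not the emulator's actual decoder type):

```rust
// x = bits 7:6, y = bits 5:3, z = bits 2:0, p = y >> 1, q = y & 1.
pub struct Decoded {
    pub x: u8,
    pub y: u8,
    pub z: u8,
    pub p: u8,
    pub q: u8,
}

pub fn decode(opcode: u8) -> Decoded {
    let x = opcode >> 6;
    let y = (opcode >> 3) & 0x07;
    let z = opcode & 0x07;
    Decoded { x, y, z, p: y >> 1, q: y & 1 }
}
```

For example, HALT (0x76) decodes to x=1, y=6, z=6 — the one x=1 encoding that is not an LD r,r'.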

eZ80 Architectural Surprises (each of these caused boot failures):

| Discovery | Impact | How Found |
| --- | --- | --- |
| IM2 = IM1 on eZ80 (ignores I register, jumps to 0x0038) | Implementing standard Z80 IM2 crashed the boot | Trace comparison at ~9K steps |
| Separate SPS/SPL stack pointers | Mixed-mode CALL/RET pushed wrong-width addresses | CEmu source reading + test failures |
| Suffix opcodes execute atomically | Step count mismatched CEmu traces | Trace comparison showed 2 steps vs 1 |
| R register rotation: LD R,A: (A<<1) \| (A>>7) | R register diverged from CEmu | Parity test failures |
| LD A,MB (ED 6E) — load memory base register | #1 boot blocker, not in Z80 docs | ROM disassembly |
| F3/F5 flags preserved from previous F in ALU ops | Flag divergence in SPI polling loops | 29 dedicated parity tests |
| ON key wakes from HALT even with DI | Boot sequence stalled at HALT | CEmu keypad_on_check() comparison |
| OS Timer is a 4th timer (32K crystal, not documented) | Boot hangs at ~20M cycles without it | ROM code analysis showed it waits for bit 4 |
| Block I/O instructions execute atomically (INI/IND + eZ80-specific variants) | Trace count mismatches | CEmu source comparison |

3.2 Bus: Memory Routing and Address Decoding

The Bus struct (bus.rs, 1929 lines) handles all memory access for the emulator. The 24-bit address space is decoded as:

| Range | Region | Size | Wait States |
| --- | --- | --- | --- |
| 0x000000–0x3FFFFF | Flash (ROM) | 4MB | 10 cycles read |
| 0x400000–0xCFFFFF | Unmapped | — | LFSR pseudo-random on read |
| 0xD00000–0xD3FFFF | RAM | 256KB | 4 read / 2 write |
| 0xD40000–0xD657FF | VRAM | ~150KB | 4 read / 2 write |
| 0xD65800–0xDFFFFF | Unmapped | — | LFSR pseudo-random |
| 0xE00000–0xFFFFFF | MMIO Ports | — | 2-4 cycles per port |
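
The decode itself is a range match on the 24-bit address; a minimal sketch (the `Region` enum is illustrative — the real bus routes directly to its memory arrays):

```rust
#[derive(Debug, PartialEq)]
pub enum Region {
    Flash,
    Ram,
    Vram,
    Mmio,
    Unmapped,
}

// Sketch of 24-bit address decoding per the table above.
pub fn decode_region(addr: u32) -> Region {
    match addr & 0xFF_FFFF {
        0x000000..=0x3FFFFF => Region::Flash,
        0xD00000..=0xD3FFFF => Region::Ram,
        0xD40000..=0xD657FF => Region::Vram,
        0xE00000..=0xFFFFFF => Region::Mmio,
        _ => Region::Unmapped, // reads return LFSR pseudo-random data
    }
}
```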

Flash unlock detection: The bus monitors the fetch stream for a magic byte sequence (the FLASH_UNLOCK_SEQUENCE — DI; JR; DI; IM2; IM1; OUT0/IN0; BIT 2,A) that the ROM uses to unlock flash writes. When detected during privileged code fetch (PC in ROM range), flash write mode is enabled. This is a 16-byte or 17-byte pattern match on a 32-byte fetch ring buffer.

Memory protection: Three mechanisms enforced in write_byte():

  1. Stack limit (ports 0x3A-0x3C): SP below limit triggers NMI
  2. Protected range (ports 0x20-0x25): Writes to protected range from unprivileged code trigger NMI and are blocked
  3. Flash privilege (port 0x28): Only privileged code can write to flash

Debug port interception: The bus intercepts writes to special addresses used by the CE toolchain's dbg_printf:

  • 0xFB0000–0xFBFFFF: stdout (sequential address writes, NOT repeated writes to 0xFB0000)
  • 0xFC0000–0xFCFFFF: stderr
  • 0xFD0000: control (write 1 = clear console)
  • Null byte at 0xFB0000 exactly = program exit sentinel
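
The stdout path of these rules can be sketched as follows (a simplified model covering only the stdout range and the clear-console control write; the struct and buffer handling are hypothetical, not the actual bus code):

```rust
pub struct DebugPorts {
    pub stdout_buf: Vec<u8>,
    pub exited: bool,
}

impl DebugPorts {
    pub fn write(&mut self, addr: u32, byte: u8) {
        match addr {
            // Null at exactly 0xFB0000 is the program-exit sentinel.
            0xFB0000 if byte == 0 => self.exited = true,
            0xFB0000..=0xFBFFFF => {
                if byte != 0 {
                    // sprintf output arrives as SEQUENTIAL address writes
                    self.stdout_buf.push(byte);
                }
                // a null at any other offset would flush the buffer
            }
            0xFD0000 if byte == 1 => self.stdout_buf.clear(), // clear console
            _ => {}
        }
    }
}
```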

Flash cache model: A 2-way set-associative cache with 128 sets and 32-byte cache lines, returning 2 cycles (same line), 3 cycles (cache hit), or 197 cycles (cache miss). This matches CEmu's flash_cache implementation exactly.
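
A sketch of that timing model (a simplified stand-in, not CEmu's flash_cache code — tag storage and LRU policy here are illustrative, but the 2/3/197 cycle costs follow the description above):

```rust
// 2-way set-associative: 128 sets x 32-byte lines.
pub struct FlashCache {
    ways: [[u32; 128]; 2], // line-address tag per set and way
    lru: [u8; 128],        // which way to evict next
    last_line: u32,
}

impl FlashCache {
    pub fn new() -> Self {
        FlashCache { ways: [[u32::MAX; 128]; 2], lru: [0; 128], last_line: u32::MAX }
    }

    // Returns the cycle cost of one flash read.
    pub fn access(&mut self, addr: u32) -> u32 {
        let line = addr / 32;
        let set = (line % 128) as usize;
        if line == self.last_line {
            return 2; // same line as the previous access
        }
        self.last_line = line;
        for way in 0..2 {
            if self.ways[way][set] == line {
                self.lru[set] = 1 - way as u8; // protect the hit way
                return 3; // cache hit
            }
        }
        let victim = self.lru[set] as usize;
        self.ways[victim][set] = line;
        self.lru[set] = 1 - victim as u8;
        197 // cache miss: full flash line fill
    }
}
```

The 2-vs-3-vs-197 split is why tight ROM loops are cheap while cold code paths dominate the boot cycle count.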

3.3 Scheduler: The 7.68 GHz Base Clock

The scheduler (scheduler.rs, 702 lines) is the timing backbone of the emulator. Rather than maintaining separate tick counters for each hardware clock, it uses a single base clock at 7,680,000,000 Hz — the LCM of all hardware clocks:

| Clock | Rate | Base ticks per tick |
| --- | --- | --- |
| CPU (speed 3) | 48 MHz | 160 |
| CPU (speed 2) | 24 MHz | 320 |
| CPU (speed 1) | 12 MHz | 640 |
| CPU (speed 0) | 6 MHz | 1280 |
| Panel | 10 MHz | 768 |
| Clock48M | 48 MHz | 160 |
| Clock24M | 24 MHz | 320 |
| Clock32K | 32.768 KHz | 234,375 |

Why the LCM approach? All hardware timer events can be scheduled in base ticks with pure integer arithmetic — no floating-point, no rounding errors, no drift. A timer ticking at 32.768 KHz fires every 234,375 base ticks, which divides evenly into the base clock rate. This design was ported directly from CEmu's schedule.c.
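
The conversion is a single exact integer division; a minimal sketch (constant and function names assumed, not the actual scheduler.rs API):

```rust
// Base clock = LCM of all hardware clocks, so every clock's period
// is an exact integer number of base ticks.
pub const BASE_HZ: u64 = 7_680_000_000;

pub fn base_ticks_per_tick(clock_hz: u64) -> u64 {
    assert_eq!(BASE_HZ % clock_hz, 0, "clock must divide the base clock");
    BASE_HZ / clock_hz
}
```

The assert encodes the invariant that makes the design work: any clock that does not divide the base rate evenly would reintroduce the rounding drift the LCM was chosen to eliminate.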

9 event types: RTC, SPI, TimerDelay, Timer0, Timer1, Timer2, OsTimer, LCD, LcdDma.

Overflow prevention: The process_second() method subtracts one second (7,680,000,000 base ticks) from all timestamps whenever the base counter crosses a second boundary. This prevents u64 overflow while maintaining relative timing. The INACTIVE_FLAG (bit 63) marks disabled events — since timestamps never reach bit 63 due to the one-second normalization, this bit is safely repurposed.
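
A sketch of the normalization step (simplified: it assumes every active timestamp is at least one second in the future of zero when the boundary is crossed, and the names are illustrative):

```rust
pub const BASE_TICKS_PER_SECOND: u64 = 7_680_000_000;
pub const INACTIVE_FLAG: u64 = 1 << 63;

// Subtract one second from the counter and every active event timestamp;
// events with bit 63 set are disabled and left untouched.
pub fn process_second(now: &mut u64, timestamps: &mut [u64]) {
    *now -= BASE_TICKS_PER_SECOND;
    for t in timestamps.iter_mut() {
        if *t & INACTIVE_FLAG == 0 {
            *t -= BASE_TICKS_PER_SECOND;
        }
    }
}
```

Relative ordering between events is preserved exactly, since every active timestamp shifts by the same amount.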

HALT fast-forward: When the CPU enters HALT, instead of spinning cycle-by-cycle, the emulator calls scheduler.cycles_until_next_event() and jumps forward to the next event. This is critical for performance — boot involves many HALT periods.

3.4 Peripherals: 13 Hardware Modules

LCD Controller (lcd.rs, 1302 lines)

The LCD runs a 5-state DMA engine:

FRONT_PORCH → SYNC → LNBU → BACK_PORCH → ACTIVE_VIDEO → (repeat)

  • FRONT_PORCH: Idle period before sync
  • SYNC: Horizontal/vertical sync pulses. Timing registers are parsed here.
  • LNBU: Line Buffer Update. Prefills a 256-byte FIFO before active video begins.
  • BACK_PORCH: Idle after sync. DMA is scheduled during this phase.
  • ACTIVE_VIDEO: Actual pixel DMA from VRAM at 0xD40000. The UPCURR register increments as pixels are transferred.

Two separate clock domains: LCD state machine events on CLOCK_24M (24 MHz), DMA transfers on CLOCK_48M (48 MHz). The process_dma() method handles pixel-by-pixel DMA advancement, while fast_forward_dma_events() provides O(1) bulk skip for performance.

DMA cycle stealing (~7.7% overhead): The LCD DMA and CPU contend for the memory bus. Rather than explicitly scheduling CPU wait cycles, the emulator tracks dma_last_mem_timestamp and calculates elapsed DMA cycles on each CPU memory access via process_dma_stealing(). This adds ~13M cycles to the 168M boot (~7.7% overhead, matching CEmu exactly).

Palette: 256 entries stored as 1555 ARGB, converted to both BGR565 and RGB565 on write. The 8bpp mode (used by DOOM) indexes into this palette for each pixel.

Cursor image RAM (0xE30800–0xE30BFF, 1024 bytes): A discovery made during DOOM support — LibLoad (a CE C library loader) uses this LCD hardware register space as scratch storage. Without implementing it, CE C programs crash.

Timer System (timer.rs, 671 lines)

Three General Purpose Timers with a shared 32-bit control register (3 bits per timer: enable, clock source, overflow enable) and a 2-cycle interrupt delay pipeline:

  • Cycle 0: Timer match/overflow detected → delay_status bit set
  • Cycle 1: Status becomes visible → delay_intrpt bit set
  • Cycle 2: Interrupt actually fires

This pipeline is implemented via the TimerDelay scheduler event and process_delay() state machine. Getting this wrong caused the graphing hang bug — timer interrupts fired too early, and the ISR looped infinitely because the status bits weren't visible yet.
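
The pipeline's ordering guarantee can be sketched as a tiny state machine (illustrative types — the real implementation drives this from the TimerDelay scheduler event and packed delay_status/delay_intrpt bitfields):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum DelayStage {
    Idle,
    MatchDetected, // cycle 0: delay_status bit set
    StatusVisible, // cycle 1: delay_intrpt bit set
}

pub struct TimerPipeline {
    pub stage: DelayStage,
    pub status_visible: bool,
    pub irq: bool,
}

impl TimerPipeline {
    pub fn on_match(&mut self) {
        self.stage = DelayStage::MatchDetected;
    }

    // Advanced once per delay cycle by the scheduler.
    pub fn process_delay(&mut self) {
        match self.stage {
            DelayStage::MatchDetected => {
                self.status_visible = true; // ISR can now see the cause
                self.stage = DelayStage::StatusVisible;
            }
            DelayStage::StatusVisible => {
                self.irq = true; // cycle 2: interrupt actually fires
                self.stage = DelayStage::Idle;
            }
            DelayStage::Idle => {}
        }
    }
}
```

The invariant that fixed the graphing hang is visible here: `irq` can never become true before `status_visible`, so the ISR always has a status bit to acknowledge.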

RTC (rtc.rs, 674 lines)

Real-time clock with a 3-state machine:

  • TICK: Normal time counting (sec → min → hour → day with rollovers)
  • LATCH: Time latched for reading (prevents tearing)
  • LOAD_LATCH: Loading new time from load registers (bit-level transfer with write masks)

The load process is particularly complex: writing control bit 6 triggers a load that takes 51 ticks at 32 KHz to complete. A status register at offset 0x40 returns a bitmask showing which fields (sec/min/hour/day) have been transferred. The ROM polls this register during boot, and getting the timing wrong caused the emulator to boot into "Classic" mode instead of "MathPrint" mode.

Constants: LATCH_TICK_OFFSET = 16429 ticks (~0.5 seconds at 32 KHz). The RTC is scheduled at this offset from the start of each second.

OS Timer (in peripherals/mod.rs)

A 4th timer source not found in standard Z80 documentation, running off a 32.768 KHz crystal. Tick intervals vary by CPU speed: 73 ticks at 6 MHz (~449 Hz), 153 at 12 MHz (~214 Hz), 217 at 24 MHz (~151 Hz), 313 at 48 MHz (~105 Hz). The ROM enables OS Timer interrupt bit 4 and waits — without it, boot stalls indefinitely.

The OS Timer has a subtle ordering requirement matching CEmu: the interrupt state is set to the OLD value before toggling. Getting this wrong caused the OS Timer to fire at the wrong phase.

SPI Controller (spi.rs, 627 lines)

16-deep FIFO (not 4 as initially assumed), with a critical RX-only mode: when CR0 bit 11 (FLASH) is set and the TX FIFO is empty, transfers continue automatically filling the RX FIFO. Without this, the ROM's second SPI polling loop exits early at step ~699,910.

The SPI controller drives a ST7789V LCD panel stub (panel.rs, 223 lines) that absorbs 9-bit SPI frames and parses panel initialization commands (MADCTL, COLMOD, CASET, RASET, RAMWR).

Interrupt Controller (interrupt.rs, 448 lines)

Two banks of 32-bit interrupt registers (status, enabled, latched, inverted). The set_source() method handles edge/level/inverted semantics: if (set XOR (inverted & mask)) then set status bit, else clear it (preserving latched). The pulse() method creates proper edges for inverted+latched signals like WAKE.
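
A sketch of that set_source() logic for a single bank (a simplified model; the latched-preservation detail here is a plausible reading of the description above, not a line-for-line port):

```rust
pub struct IntBank {
    pub status: u32,
    pub latched: u32,
    pub inverted: u32,
}

impl IntBank {
    // `mask` selects one source bit; `set` is the raw signal level.
    pub fn set_source(&mut self, mask: u32, set: bool) {
        let raw = if set { mask } else { 0 };
        // Inverted sources assert when the raw level is LOW.
        let effective = raw ^ (self.inverted & mask);
        if effective != 0 {
            self.status |= mask;
        } else {
            // Clear the bit, but latched bits stay set until acknowledged.
            self.status &= !(mask & !self.latched);
        }
    }
}
```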

Interrupt sources: ON_KEY(0), TIMER1(1), TIMER2(2), TIMER3(3), OSTIMER(4), KEYPAD(10), LCD(11), PWR(15), WAKE(19).

3.5 Execution Loop

The main execution loop (emu.rs, run_cycles()) is the heart of the emulator:

for each cycle budget:
    1. Record opcode in execution history (64-entry ring buffer for crash diagnostics)
    2. Handle any_key_wake (clears HALT if any key pressed)
    3. Execute one CPU instruction via cpu.step(bus)
    4. Check armed instruction trace (for debugging)
    5. Advance scheduler by elapsed CPU cycles
    6. Handle CPU speed changes (port 0x01 writes require cycle conversion)
    7. Process all pending scheduler events (RTC, SPI, timers, LCD, DMA)
    8. Process DMA cycle stealing
    9. Schedule SPI transfers if needed
    10. Check NMI flag (memory protection violations)
    11. Tick peripherals (timers with delay pipeline, keypad, OS Timer)
    12. HALT fast-forward (batch up to 10,000 cycles, cap at scheduler second boundary)
    13. Periodic diagnostic logging (every 60 frames)

Frame rendering is separate from execution: render_frame() dispatches to either render_frame_8bpp() (palette lookup using BGR565) or render_frame_16bpp() (direct RGB565 from VRAM) based on the LCD's BPP mode.


4. Cross-Platform Frontends

4.1 Shared Design

All three platforms use identical percentage-based layout constants:

  • LCD position: left 11.53%, top 6.92%, width 76.74%, height 24.92% of calculator body
  • Keypad area: top 34.57%, height 65.43% of calculator body
  • D-pad region: left 63.97%, top 13.72%, width 22.01%, height 14.74% of keypad area
  • Body aspect ratio: 963/2239 (from the calculator body image)
  • 49 button regions with identical percentage coordinates across all platforms, derived from a real TI-84 CE photograph via scripts/extract_buttons.py

All platforms use the same emulation timing: 800,000 cycles per frame at 60 FPS (= 48 MHz real-time), with non-linear speed steps from 0.25× to 20×.

State persistence uses SHA-256 of the ROM data, truncated to 16 hex chars, as a key. States are namespaced by backend type ("rust:<hash>" or "cemu:<hash>").

4.2 Android (android/, Kotlin + Jetpack Compose)

  • MainActivity.kt (~2100 lines, monolithic): Compose UI with EmulatorScreen composable. Emulation loop runs on Dispatchers.Default coroutine. Framebuffer copied to Bitmap via setPixels().
  • EmulatorBridge.kt: Loads libemu_jni.so via System.loadLibrary, which dynamically loads backend .so files. 18 JNI method declarations.
  • jni_loader.cpp (~630 lines): BackendInterface struct with function pointer table. loadBackend() uses dlopen() to load libemu_<name>.so. Thread-safe log callback deque (max 200 entries) forwarded to Android logcat.
  • cemu_adapter.c (~595 lines): Wraps CEmu's global-state API into instance-based interface. Singleton pattern. State save/load uses temp files because CEmu only supports FILE* API.
  • StateManager.kt: Thread-safe singleton, SHA-256 ROM hashing, auto-delete corrupted states.
  • Image keypad: ImageKeyButton composables with 2dp travel on press, brightness darkening via ColorMatrix(0.82).
  • D-pad: Canvas-drawn circular D-pad with arc segments, arrow indicators, hit-testing via angle/radius calculation.

4.3 iOS (ios/, Swift + SwiftUI)

  • ContentView.swift: EmulatorState class (ObservableObject) manages emulation lifecycle. Emulation loop runs on Task.detached(priority: .userInitiated).
  • EmulatorBridge.swift: NSLock-protected access to opaque C pointer. makeImage() creates CGImage from framebuffer using CGContext with premultipliedFirst | byteOrder32Little.
  • backend_bridge.c (~309 lines): Static linking variant. When HAS_RUST_BACKEND defined, declares rust_emu_* extern functions. current_backend pointer switches between rust_backend and cemu_backend const structs.
  • 3 Xcode build configurations: Backend-Rust.xcconfig (links -lemu_rust), Backend-CEmu.xcconfig (links -lemu_cemu), Backend-Both.xcconfig (both).
  • ImageKeypadView.swift: Same 49 button regions, DragGesture(minimumDistance: 0) for press detection.
  • AppState: Singleton monitoring scenePhase for auto-save on background.

4.4 Web (web/, React + TypeScript + Vite)

  • Calculator.tsx (~800 lines): Main component with requestAnimationFrame loop, time accumulator for frame pacing, safety cap of 4 frames per rAF tick (30 in turbo mode).
  • RustBackend.ts: Wraps wasm-bindgen WasmEmu class. State persistence uses WASM memory snapshots — dumps the entire linear memory (~29MB via memcpy, ~4ms) rather than field-by-field serialization. On restore, grows memory if needed and copies back. Custom binary format with "WM01" header.
  • CEmuBackend.ts: Wraps Emscripten module. Sets up global stubs (emul_is_inited, emul_is_paused). ARGB→RGBA pixel conversion on every frame.
  • Chess auto-launch: Polls canvas pixel (310, 10) for green battery icon to detect homescreen ready, then fires Prgm→Prgm→2→Enter key sequence.
  • PWA: vite-plugin-pwa with versioned ROM/WASM caching, update banner, offline support. ROM manifest system for tracking content hashes.
  • Drag-and-drop: Supports dragging both ROM files and .8xp/.8xv programs onto the calculator.
  • Keyboard mapping: 50+ key bindings including F1-F5 for function row, Shift for 2nd (solo-press detection), V for sqrt (2nd + x² combo), Ctrl+R for resend programs.

5. Technical Tradeoffs & Decisions

5.1 Cycle-Accurate Scheduler via LCM Base Clock

Decision: Use a single 7.68 GHz base clock instead of separate tick counters per hardware clock.

Tradeoff: Large tick values (u64 required) but zero floating-point error. All hardware timers divide evenly into the base clock. The process_second() overflow prevention method keeps values bounded.

Alternative considered: Per-clock tick counters with conversion functions. Rejected because fractional cycles accumulate rounding errors over millions of instructions, causing drift that's impossible to debug.

5.2 Prefetch Pipeline Emulation

Decision: Implement a single-byte prefetch buffer that charges memory access cycles during the current instruction.

Tradeoff: Added complexity to every instruction fetch path (every call to fetch_byte() must handle the prefetch buffer). But without it, cycle counts were ~50% of CEmu's (10 cycles instead of 20 for flash reads). This was non-negotiable for parity.
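
The buffer mechanics can be sketched as follows (a minimal model with an illustrative flat-memory fetch and a fixed 10-cycle flash read cost; the real fetch path also handles mode suffixes and MBASE):

```rust
pub struct Prefetch {
    pub buffer: u8, // byte grabbed during the PREVIOUS instruction
    pub pc: u32,
}

impl Prefetch {
    pub fn fetch_byte(&mut self, mem: &[u8], cycles: &mut u64) -> u8 {
        let byte = self.buffer;
        // Read the next byte now: its memory wait states are billed to
        // the CURRENT instruction, which is what doubles the cycle
        // counts relative to a naive fetch-on-demand model.
        self.buffer = mem[self.pc as usize];
        self.pc += 1;
        *cycles += 10; // flash read wait states (illustrative)
        byte
    }
}
```

Every instruction thus pays for the fetch of its successor's first byte, mirroring the hardware pipeline that CEmu models.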

5.3 Manual Serialization vs. Serde

Decision: Custom to_bytes()/from_bytes() for state snapshots, with a STATE_VERSION byte.

Tradeoff: Precise control over byte layout and smaller snapshots, but maintenance burden — STATE_VERSION was bumped 8 times as peripherals were added, and missed fields caused "RAM Cleared" bugs.

Alternative explored: Serde+bincode in a separate worktree (calc-serde branch). Required custom handling for types like [u8; 4194304] (flash) that don't implement serde traits. Not merged.

Web's approach: Bypassed the problem entirely by snapshotting the entire WASM linear memory (~29MB memcpy in ~4ms). This eliminates all serialization bugs at the cost of larger save files.

5.4 Image-Based Keypad vs. Programmatic Buttons

Decision: Pivot from programmatic gradient buttons to a photograph-based overlay with percentage-based hit regions.

Tradeoff: More realistic appearance and perfect cross-platform consistency (same coordinates file used everywhere), but harder to modify and larger asset size. The extract_buttons.py script crops individual button images from a high-res TI-84 CE photo and generates button_regions.json.

5.5 Visual Polling for Auto-Launch

Decision: Detect homescreen readiness by checking a single canvas pixel (310, 10) for the green battery indicator, rather than using fixed timing delays.

History: The chess auto-launch went through 6 iterations: fixed 3.5s delay → 2s delay → 1.2s delay → center pixel check → status bar check → single battery pixel check.

Tradeoff: Adapts to actual boot speed (different ROMs boot at different speeds), but fragile if the OS skin changes.

5.6 DMA Cycle Stealing via Timestamp Tracking

Decision: Track dma_last_mem_timestamp and calculate stolen DMA cycles lazily on each CPU memory access, rather than explicitly scheduling CPU wait states.

Tradeoff: Simpler code (no explicit wait state scheduling), but harder to debug because the timing effect is implicit. The ~7.7% cycle overhead (13M of 168M boot cycles) emerges from the interaction between DMA and CPU memory access patterns.

5.7 Debug Port Interception via Bus Cold Path

Decision: Intercept writes to 0xFB0000-0xFDFFFF (CE toolchain debug ports) in the bus's unmapped MMIO cold path.

Critical discovery: sprintf writes to SEQUENTIAL addresses (0xFB0000, 0xFB0001, ...), NOT repeated writes to the same address. A null byte at 0xFB0000 exactly means program exit; a null at any other offset means flush the buffer. This took multiple debugging sessions to understand.


6. The CEmu Parity Campaign

6.1 Overview

The parity campaign was the single largest engineering effort in the project — a systematic 7-phase overhaul to make the Rust emulator match CEmu's behavior at the instruction level. The campaign was driven by an extensive toolchain of trace generation, comparison, and analysis tools.

6.2 The Parity Toolchain

Trace generation (Rust side):

  • cargo run --example debug -- trace [steps] — Generates space-separated trace files with step, cycles, PC, SP, AF, BC, DE, HL, IX, IY, ADL, IFF1, IFF2, IM, HALT, opcode
  • cargo run --example debug -- fulltrace [steps] — Generates comprehensive JSON traces including all I/O operations per instruction (RAM reads/writes, MMIO port access)

Trace generation (CEmu side):

  • tools/cemu-test/trace_gen.c — Links against CEmu's libcemucore.a, generates traces in the exact same format as the Rust side
  • tools/cemu-test/parity_check.c — Checks CEmu state at 14 cycle milestones (1M through 60M)

Comparison tools:

  • scripts/compare_traces.py — PC-synced comparison with prefix lookahead (CEmu counts DD/FD prefixes as separate instructions)
  • scripts/find_first_divergence.py — JSON fulltrace comparison with I/O operation matching
  • core/examples/dense_compare.rs — PC-aligned comparison using HashMap-based lookup with 5-step lookahead

Targeted investigation tools (11 specialized Rust examples):

  • find_divergence.rs — Tracks PC at known CEmu cycle checkpoints
  • scheduler_debug.rs — Monitors RTC event timing in scheduler
  • rtc_timing_compare.rs — Compares RTC load timing between implementations
  • check_0072fa.rs — Single-steps 70M cycles checking specific poll loop address
  • mathprint_check.rs — Monitors MathPrint flag at cycle checkpoints
  • Plus 6 more specialized analysis tools

6.3 The 7 Phases

Phase 1: CPU Instruction Correctness (Effort: L)

  • Fixed RETI IFF1 restore
  • Fixed register pair mapping (ED x=0 z=7 p=3: IY→IX)
  • Added missing eZ80 instructions: LD I,HL (ED C7), LD HL,I (ED D7), LEA IY,IX+d (ED 55)
  • Implemented block I/O (all Z80 + eZ80-specific variants)
  • Fixed EX DE,HL L-mode masking
  • Fixed block BC decrement (preserve BCU in Z80 mode)
  • Verification: Boot 132.79M cycles, PC=085B80. 250/436 tests passing.

Phase 2: Bus & Address Decoding (Effort: M)

  • Flash routing for 0x400000–0xBFFFFF
  • MMIO unmapped holes
  • Port range 0xF routing
  • SPI in memory-mapped path
  • Backlight routing
  • Verification: Boot 132.79M, 251/436 tests.

Phase 3: Peripheral Register Layout Rewrites (Effort: XL — the largest phase)

  • Timer rewrite: Replaced 3 separate Timer structs with unified GeneralTimers. Shared 32-bit control register (3 bits per timer), status register, mask register.
  • Keypad register packing: Single 32-bit control register with mode (2 bits) + rowWait (14 bits) + scanWait (16 bits). 16 data registers. GPIO enable. Reset mask 0xFFFF.
  • Watchdog offset fix: Counter and load value offsets were SWAPPED. Revision corrected to 0x00010602.
  • Verification: Boot 156.10M, 272/457 tests.

Phase 4: Scheduler & Timing (Effort: L)

  • SCHED_SECOND overflow prevention
  • CPU speed change event conversion
  • Panel clock rate 60Hz → 10,000,000 Hz
  • OS Timer interrupt phase fix (set state to OLD value before toggling)
  • Timer 32 KHz clock source
  • Timer 2-cycle interrupt delay pipeline: Match detected → status visible → interrupt fires. Required TimerDelay scheduler event, process_delay() state machine, delay_status: u32 and delay_intrpt: u16 fields.
  • Verification: Boot 156.10M, 272/457 tests.

Phase 5: RTC, SHA256, Control Ports (Effort: M)

  • RTC 3-state machine with time counting, latching, and load data transfer
  • SHA256 64-round compression function
  • Control port masks (port 0x01: & 0x13, port 0x29: & 1)
  • Flash size_config reset (0x07 → 0x00)
  • INT_PWR interrupt on reset
  • Verification: Boot 108.78M, 259/437 tests.

Phase 6: LCD & SPI Enhancements (Effort: L)

  • LCD DMA 5-state event machine with dual clock domains
  • 256-entry palette storage
  • SPI panel stub (ST7789V)
  • LCD ICR, MIS, UPCURR, LPCURR registers
  • Verification: Boot 156.10M, 277/455 tests.

Phase 7: CPU Advanced & Bus Protection (Effort: XL, Risk: High)

  • Separate SPS/SPL stack pointers (CPU SNAPSHOT_SIZE 64 → 67)
  • Mixed-mode CALL/RET/RST with MADL|ADL flag byte
  • Memory protection (stack limit NMI, protected range, flash privilege)
  • DMA scheduling with cycle stealing (~7.7%)
  • HALT fast-forward, interrupt prefetch_discard, R rotation
  • Verification: Boot 168.14M cycles, 277/455 tests. ALL PHASES COMPLETE.

6.4 The Comparison Report

PR #56's supporting document (docs/pr_comparison_report.md) was generated by 8 parallel analysis agents, each comparing a different subsystem of the Rust emulator against CEmu. It identified ~150 specific discrepancies organized into three tiers:

  • 20 critical issues (CPU missing instructions, register layout mismatches, entirely missing peripherals)
  • 19 high issues (timing differences, routing bugs, missing clock sources)
  • 25+ medium issues (edge cases, missing features, reset value differences)

This document served as the roadmap for all 7 phases.


7. AI Agent Orchestration for Testing & Debugging

7.1 Overview of AI's Role

The project was ~80% AI co-authored (262/332 commits). Claude Code was used in 37+ main conversation sessions with 472 subagent invocations across those sessions, generating ~73 MB of conversation data (~9 MB in main sessions, ~64 MB in subagent sessions). The heaviest sessions had 89 subagent files (session 13baee3d, Feb 8) and 76 subagent files (session dc8b876a, Jan 30-31).

Hunter directed the architecture, defined the parity methodology, tested on real devices, identified bugs, and corrected the AI when it went wrong. Claude served as implementation workhorse — writing code, running traces, and deploying subagents per Hunter's direction. The division of labor was:

| Hunter (direction & decisions) | AI (execution & research) |
|---|---|
| Designed dual-backend architecture | Deployed up to 89 parallel subagents for research |
| Defined parity methodology & toolchain | Read and compared CEmu C source code against Rust |
| Tested on real Android/iOS devices | Wrote implementation code per Hunter's direction |
| Identified bugs via device testing | Generated traces and analyzed divergences |
| Corrected wrong AI approaches | Created PRs with descriptions |
| Decided what to build/cut/revert | Rebaked ROMs (17 consecutive times in one session) |
| Set quality bar ("I want perfect") | Iterated until Hunter's bar was met |

7.2 Subagent Usage Patterns

Subagents were the primary mechanism for scaling the AI's research and implementation capacity. The project used 472 subagent sessions across the 37 main sessions, with distinct usage patterns:

Research subagents (the most common pattern): These were deployed in parallel to read and analyze source code. The most dramatic example was the 8-agent peripheral audit (PR #56), where 8 subagents simultaneously compared each peripheral module in the Rust emulator against its CEmu C counterpart. Each agent would:

  1. Read the CEmu C source file (e.g., cemu-ref/core/timers.c)
  2. Read the corresponding Rust source file (e.g., core/src/peripherals/timer.rs)
  3. Produce a detailed report of every register offset, default value, timing behavior, and control flow difference

This pattern was repeated at smaller scale throughout the project. When investigating a bug, 2-3 research subagents might be deployed simultaneously — one reading the CEmu source for a specific subsystem, one reading the Rust implementation, and one analyzing trace divergences.

Implementation subagents: For larger feature work (e.g., the 7-phase parity overhaul), subagents were used to implement specific fixes within a phase while the main agent coordinated. A typical pattern:

  1. Main agent defines the fix needed (e.g., "rewrite timer register layout to match CEmu's packed 32-bit format")
  2. Subagent reads both implementations, writes the new Rust code
  3. Main agent integrates, runs tests, verifies parity

Worktree subagents: Claude Code's worktree feature was used for parallel feature development. Subagents worked in isolated git worktrees (e.g., calc-worktrees/rtc-ticking, calc-worktrees/doom, calc-worktrees/image-keypad) to develop features without blocking the main branch. The image keypad worktree had its own memory file with 6 subagent sessions documenting SwiftUI pitfalls and button region coordinate systems.

Session intensity distribution:

  • 2 sessions with 75+ subagents (the boot-to-homescreen push on Jan 30-31, and the image keypad + parity work on Feb 8)
  • ~5 sessions with 20-40 subagents (major feature work)
  • ~10 sessions with 5-15 subagents (focused debugging or feature implementation)
  • ~20 sessions with 0-5 subagents (quick fixes, documentation, interview prep)

Cross-session knowledge transfer: Because subagent context is lost when a session ends, the project relied on persistent files for continuity:

  • MEMORY.md (68 lines of distilled learnings in .claude/projects/)
  • CLAUDE.md (7.6KB of workflow docs, memory map, trace formats in repo root)
  • docs/findings.md (15KB of hardware discoveries)
  • docs/milestones.md (7.1KB phase tracker)

Each new session's first action was typically reading these files to rebuild context. When a session hit context limits (which happened frequently during the parity campaign), the findings from that session were written to findings.md or milestones.md before ending, ensuring the next session could continue.

7.3 The Parity Testing/Debugging Loop

The CEmu parity campaign was Hunter's core methodology for achieving correctness: systematically compare every subsystem against the reference implementation, prioritize fixes by dependency order, and verify each fix pushes the divergence point further. The AI executed this methodology at scale. Here's how it worked:

Phase 1: Systematic Comparison Research

Hunter directed a comprehensive audit of every peripheral subsystem against CEmu's C source. Claude deployed 8 parallel subagents simultaneously, each assigned a different subsystem:

  1. Agent comparing CPU execution (cpu.c vs execute.rs)
  2. Agent comparing bus/memory (bus.c/mem.c vs bus.rs/memory.rs)
  3. Agent comparing LCD/backlight (lcd.c vs lcd.rs)
  4. Agent comparing timers (timers.c vs timer.rs)
  5. Agent comparing interrupt controller (interrupt.c vs interrupt.rs)
  6. Agent comparing keypad (keypad.c vs keypad.rs)
  7. Agent comparing RTC/SHA256 (realclock.c/sha256.c vs rtc.rs/sha256.rs)
  8. Agent comparing scheduler/control (schedule.c/control.c vs scheduler.rs/control.rs)

Each agent read both the Rust and C source files in full and produced a detailed report of every discrepancy — register layout differences, timing behavior differences, missing features, wrong default values. The reports were merged into docs/pr_comparison_report.md (~150 issues).

Phase 2: Prioritized Fix Implementation

Hunter organized the ~150 issues into 7 phases based on dependency ordering (CPU first, then bus, then peripherals, then timing). Each phase had:

  • Clear deliverables (specific registers to fix, specific behaviors to match)
  • A verification checkpoint (boot cycle count + test count)
  • Effort and risk ratings

The AI implemented fixes per the phase plan, then verified by generating a trace and comparing against CEmu:

cargo run --example debug -- trace 100000    # Generate Rust trace
cd tools/cemu-test && ./trace_gen ../../ROM -n 100000  # Generate CEmu trace
python scripts/compare_traces.py cemu.log rust.log     # Compare

Phase 3: Divergence Bisection

When traces diverged, Hunter directed the AI to use a binary-search approach to find the exact instruction:

  1. Generate a long trace (100K+ steps)
  2. Find the first PC where Rust ≠ CEmu
  3. Read the surrounding instructions for context
  4. Look up the divergent instruction in CEmu's source
  5. Compare the implementation in Rust
  6. Fix the discrepancy
  7. Re-run and verify the fix pushed the divergence further out

This loop was repeated hundreds of times. Key milestones: 40K steps → 700K steps → 3.2M steps → full boot (3.6M steps pre-DMA, 168M with DMA).

Phase 4: Targeted Investigation Tools

When standard trace comparison was insufficient, Hunter directed the AI to build specialized investigation tools targeting specific subsystems. 11 custom Rust examples were created during the parity campaign:

  • rtc_timing_compare.rs — Compared RTC load timing between Rust and CEmu at 12 checkpoints from 0 to 50M cycles. Finding: "Our RTC load status returns 0x00 (complete) too early. This timing difference causes the poll loop at 0x0072FA to exit earlier."
  • scheduler_debug.rs — Monitored RTC event scheduling. Finding: At 48 MHz, 1 CPU cycle = 160 base ticks; RTC fires every 16,429 ticks at 32 KHz = ~24M CPU cycles delay.
  • check_0072fa.rs — Single-stepped 70M cycles checking one specific poll loop address that CEmu visits but Rust didn't. Finding: Different control flow due to RTC timing.
  • mathprint_check.rs — Monitored the MathPrint flag (0xD000C4 bit 5) at 8 cycle checkpoints. Finding: The flag was never set to 0x20 (MathPrint mode) because the RTC timing caused a different code path.

Phase 5: Multi-Session Continuity

The parity campaign spanned multiple conversation sessions that hit context limits. To maintain continuity:

  • The MEMORY.md file in .claude/projects/ stored distilled learnings (68 lines)
  • The CLAUDE.md file in the repo documented workflow, key addresses, trace formats
  • docs/findings.md (15KB) captured every hardware discovery
  • docs/milestones.md (7.1KB) tracked phase completion status

When a new session started, Claude would read these files to rebuild context, then continue from where the last session left off.

7.4 Specific Debugging Stories Driven by AI Agents

The SPI Divergence Hunt (699,900 → 3.2M steps):

  • Divergence at step 699,900: After SPI STATUS read, CEmu A=0x20 (2 transfers pending), Rust A=0x00 (all complete)
  • AI agents analyzed CEmu's scheduler-driven SPI: sched_set(SCHED_SPI, ticks), transfer duration = bitCount * ((cr1 & 0xFFFF) + 1) ticks at 24MHz
  • The initial "complete ALL transfers on first read" approach worked at step 418K (3 transfers) but failed at 699K (6 transfers, only 4 should complete)
  • Solution: Implement event-driven scheduler for SPI. This pushed parity to 3.2M steps.
  • Next divergence at 3,216,456: RTC load status timing. This took 4 more dedicated investigation tools to diagnose.

The 8-Agent Peripheral Audit (Feb 5-6):

  • 8 agents deployed simultaneously, each comparing a subsystem
  • Total: ~150 issues identified across all peripherals
  • Produced a 35KB comparison report that became the 7-phase implementation roadmap
  • The agent findings revealed that some peripheral implementations were fundamentally wrong (e.g., timer register layout was completely different, keypad control register packing was wrong, watchdog offsets were swapped)

The MathPrint vs Classic Mode Investigation (5 custom tools):

  • Problem: Emulator boots into Classic mode, CEmu boots into MathPrint mode
  • mathprint_check.rs: Found the flag is never set to 0x20
  • check_0072fa.rs: Found the poll loop at 0x0072FA behaves differently
  • rtc_timing_compare.rs: Found RTC load completes in ~75K cycles (Rust) vs ~24M cycles (CEmu)
  • scheduler_debug.rs: Found the RTC event offset of 16,429 ticks was correct but the load processing timing was wrong
  • Resolution: Fix RTC load state machine to process at correct 32 KHz rate

7.5 Human-AI Interaction Patterns

Hunter's direction and quality enforcement:

  • Set the parity standard: "i dont want close, i want perfect"
  • Demanded correct approaches: "make sure this is the correct way to do things, i dont want to use hacky workarounds"
  • Redirected flailing AI: "stop making random guesses and use comprehensive logging"
  • Maintained project knowledge when AI lost context: "are you forgetting the things you learned in your findings?"
  • Prioritized correctness over test coverage: "genuinely i dont give a shit if tests fail, they weren't catching shit before"

Hunter correcting the AI:

  • AI focused on backlight brightness for the power-off bug when the real issue was the LCD enable bit — Hunter identified the correct root cause and redirected
  • AI frequently forgot previous session findings across context boundaries — Hunter pointed it back to findings.md and milestones.md
  • AI's interview prep contained unverifiable claims (e.g., calling a first-time-correct implementation a "redesign") — Hunter caught the fabrications and demanded accuracy
  • AI over-engineered the PWA implementation — Hunter reverted the work and directed a simpler approach
  • AI made the emulator boot into Classic mode instead of MathPrint — Hunter identified the symptom on device and directed the multi-tool investigation

What the AI executed well (when properly directed):

  • Deploying parallel subagents for research before writing code
  • Building specialized investigation tools for specific subsystems
  • The trace → compare → fix → verify loop (Hunter's methodology, AI's execution)
  • ROM baking automation (17 consecutive rebakes at 4-5 AM)
  • Cross-file refactoring (updating all 3 platforms simultaneously)

8. Claude Code Usage Statistics

8.1 Session & Token Data

| Metric | Value |
|---|---|
| Main conversation sessions | 37 session directories (24 indexed in sessions-index.json) |
| Main JSONL conversation files | 4 files, 9.2 MB total |
| Subagent invocations | 472 subagent JSONL files, 64 MB total |
| Total conversation data | ~73 MB |
| PRs created from sessions | 13 |
| Unique branches worked on | 13 |
| Related worktree projects | 2 (calc-web, calc-worktrees-image-keypad) |

8.2 Token Usage (extracted from main JSONL files)

| Category | Tokens |
|---|---|
| Input tokens | 1,017,775 |
| Output tokens | 222,085 |
| Cache read tokens | 462,457,150 |
| Cache creation tokens | 73,134,709 |
| Total | ~537M tokens |

Note: These counts are from the 4 main JSONL files only. The 472 subagent sessions (64 MB of data) would add significantly more tokens — likely bringing the true total well above 1B tokens for the project.

8.3 Estimated Cost (Opus pay-per-token pricing)

| Category | Rate | Est. Cost |
|---|---|---|
| Input | $15/M tokens | $15 |
| Output | $75/M tokens | $17 |
| Cache reads | $1.50/M tokens | $694 |
| Cache creation | $18.75/M tokens | $1,371 |
| Main sessions total | | ~$2,097 |

Note: This estimate uses Claude Opus 4 pricing. The actual cost depends on which model was used per session (Opus 4.5/4.6 vs Sonnet 4.5). Max plan subscribers pay a flat rate, so actual billing may differ. The subagent sessions' token costs are not included in this estimate.

8.4 Session Breakdown by JSONL File

| File | Size | Responses | Content |
|---|---|---|---|
| e262789c.jsonl | 4.88 MB | 96 | Interview prep, chat export generation |
| 534ad7db.jsonl | 3.82 MB | 538 | Chess mode, auto-launch, ROM baking, Shift+R |
| f814347a.jsonl | 285 KB | 22 | Test framework investigation |
| e910f4c6.jsonl | 189 KB | 35 | Web frontend test setup (Vitest, 44 keypad tests) |

8.5 AI Model Distribution Across Commits

| Model | Commits | Notes |
|---|---|---|
| Claude Opus 4.5 | 215 | Primary model through mid-February |
| Claude Opus 4.6 | 74 | Adopted mid-February |
| Claude Sonnet 4.5 | 15 | Used sporadically |
| Codex | 1 | Single worktree snapshot |
| No AI attribution | 70 | Merge commits, manual fixes, config tweaks |

9. Major Bugs & Debugging Stories

9.1 The Magnitude Error (10⁹ bug)

Symptom: 6+7 displayed 1300000000, and 99*99 likewise produced an answer wrong by orders of magnitude.

Investigation: Spanned many sessions. Traced BCD floating-point operations, checked OP1-OP6 register addresses, examined TI-OS format buffer.

Root cause: Using self.adl instead of self.l for data register wrapping. L mode controls data addressing width (16-bit vs 24-bit), while ADL controls instruction/PC width. LDIR/LDDR were using 24-bit addressing when they should have used 16-bit for data operations.

False leads: Initially suspected display formatting (decimal point not written), then keypad timing, then OS Timer frequency.

9.2 The Graphing Hang

Symptom: Screen freezes after graphing, loader stops spinning.

Root cause: Timer raw bits accumulating and causing infinite ISR loops. Timer interrupt clearing in tick_peripherals was not properly clearing the raw status bits before re-evaluating.

Fix: Proper implementation of the 2-cycle timer interrupt delay pipeline.

9.3 The "Done" Bug

Symptom: First calculation after boot shows "Done" instead of numeric result.

Root cause: TI-OS expression parser not initialized. The OS expects an ENTER key to have been processed before the first calculation.

Fix: Auto-inject ENTER key on first user interaction (in Rust core, cross-platform).

9.4 ON Key Wake From Sleep

Symptom: ON button doesn't work after power-off (APO or 2nd+ON).

Investigation: Multiple sessions tracing CEmu's keypad_on_check(), control.off flag, power state.

Root causes (multiple): Battery status port returning hardcoded 0 instead of 0xFE (the OS rejected wakes because it thought the battery was dead). on_key_wake was one-shot instead of persistent. The APD (Automatic Power Down) disable wasn't clearing the right flag at 0xD00088.

9.5 The 50× Performance Regression

Caused by: Phase 4 scheduler changes (timer interrupt delay pipeline).

Root cause: Scheduler base_cycles_offset u64 underflow — subtracting a larger value from a smaller one wrapped around to near-maximum u64.

Fix: Added next_event_ticks cache to avoid scanning all events, fixed the underflow.

9.6 State Restore "RAM Cleared"

Symptom: Restoring a saved state showed "RAM cleared" message.

Root cause: Multiple peripheral fields not included in state snapshots: SPI controller state, watchdog state, cursor registers, needs_lcd_event, needs_lcd_clear, memory protection registers (stack limit, protected range). The OS detected inconsistency on restore.

Fix: Multiple STATE_VERSION bumps (reached version 8) to add missing fields. Spawned multiple investigation subagents to systematically audit what was and wasn't saved.

9.7 Chess Opening Books Not Loading

Symptom: Chess shows "BK:NONE" (file open fails) vs CEmu showing "BK:32768".

Root cause: fileioc (CE C library) stores curr_slot at LCD register address 0xE30C11 and resize_amount at 0xE30C0C — in LCD cursor image RAM (0x800-0xBFF) which wasn't implemented.

How found: Added breakpoint mechanism to Emu struct, used disasm command, traced through _ChkFindSym bcall. Multiple subagent sessions to understand fileioc's internals.


10. Running Real Software: DOOM & Chess

10.1 DOOM Support (PR #68)

Getting DOOM running required several subsystem additions:

  • 8bpp LCD rendering: The calculator normally uses 16bpp RGB565, but DOOM uses 8bpp indexed color with a 256-entry palette. Added render_frame_8bpp() with palette lookup.
  • .8xp/.8xv file parser (ti_file.rs): Parse TI file format headers, checksums, variable entries. Supports programs, protected programs, AppVars.
  • Flash archive injection: inject_archive_entry() writes flag byte 0xFC, 2-byte size, type, version, self-referential address, name, data into flash. find_archive_free_addr() scans sectors 0x0C0000-0x3B0000.
  • SendKey mechanism: Pokes OS RAM directly at 0xD0058C (kbdKey), 0xD0058E (keyExtend), bit 5 of 0xD0009F (keyReady). Uses bus.poke_byte() to bypass memory protection.
  • Launch sequence: ENTER → CLEAR → Asm( → prgm → D,O,O,M → ENTER
  • LCD cursor image RAM: Extended LCD address space to include 0x800-0xBFF (1024 bytes) used by LibLoad as scratch storage.
  • Keypad range extension: Extended KEYPAD_END from 0x150048 to 0x151000 (full 4KB page) because DOOM needed the full range.

10.2 Chess Integration

The chess engine (from the ce-games submodule) is a fully-featured chess program running on the eZ80:

  • Alpha-beta negamax with PVS, aspiration windows, null-move pruning (R=2), LMR, futility pruning
  • Texel-tuned PeSTO piece-square tables
  • 4096-entry transposition table (always-replace)
  • Polyglot opening book split across AppVars (TI-OS 64KB limit), up to 131K entries
  • ~154K cycles/node, ~2000 Elo at Expert difficulty (15s/move)
  • Automated tournament system (emu_tournament.py) running eZ80 engine vs Stockfish

Web chess mode (/chess route): Fetches chess.bin (1.9MB gzipped ROM), auto-launches at 5× speed using visual polling for boot detection.


11. Development Timeline & Velocity

| Date | Day | PRs | Key Achievement |
|---|---|---|---|
| Jan 27 | 1 | #1-#4 | CPU + memory + peripherals. Full eZ80 in one afternoon. |
| Jan 28 | 2 | #6-#9 | 40K step parity. IM2 fix. OS Timer. |
| Jan 29 | 3 | #10-#13 | OS boots to home screen (3.6M steps). Scheduler implemented. |
| Jan 30-31 | 4-5 | #14-#20 | Keypad working. Magnitude error fixed (L vs ADL modes). |
| Feb 1 | 6 | #21-#33 | CEmu backend + iOS app + 13 PRs in one day. |
| Feb 2 | 7 | #34-#44 | Runtime backend switching. State persistence. Web app. |
| Feb 5-6 | 10-11 | #51-#58 | 7-phase CEmu parity overhaul (PR #56). WASM optimization. |
| Feb 7-9 | 12-14 | #59-#68 | DOOM runs. Image keypad. ON key wake. |
| Feb 10-14 | 15-19 | #69-#78 | Live file send. Debug port interception. Sudoku. |
| Feb 16-20 | 21-25 | #81-#86 | Chess mode. PWA offline. Shift+R dev shortcut. |

Key velocity facts:

  • Boot-to-homescreen achieved in 3 days from initial commit
  • Full eZ80 CPU (3,675 lines, 124 tests) implemented in one afternoon
  • Tri-platform support (Android + iOS + Web) in 6 days
  • From "first instruction runs" to "DOOM runs" in 13 days

12. PR & Commit History Analysis

12.1 Merge Patterns

  • Squash merges: ~43 (64%) — used for most feature branches from PR #16 onward
  • Merge commits: ~24 (36%) — used for earlier PRs and larger merges
  • No rebases on main — clean linear history via squash

12.2 AI Co-Authorship

  • 262/332 commits (79%) have AI co-author attribution
  • Claude Opus 4.5: 215 commits (primary model through mid-Feb)
  • Claude Opus 4.6: 74 commits (adopted mid-February)
  • Claude Sonnet 4.5: 15 commits (sporadic)
  • Codex: 1 commit (worktree snapshot)
  • 70 commits (21%) have no AI attribution (merge commits, manual fixes, config tweaks)

12.3 Reverts

One revert: commit 645aeb1 reverted PR #2's CPU implementation immediately after squash-merge, then re-merged from a different branch topology. This was a merge strategy correction, not a code quality issue.

12.4 Closed PRs (8 total)

  • PRs #46-#50 (5 individual CEmu parity features): All absorbed into monolithic PR #56
  • PR #35 (unified state persistence): Superseded by per-platform PRs #39, #41, #43
  • PR #63 (state restore perf): Incorporated into later PRs
  • PR #85 (chess + Shift+R): Split into PRs #84 and #86

12.5 PR Dependency Chains

  • Core emulation: #1→#2→#4→#6→#7→#8→#9→#10→#11→#12→#13 (boot achieved)
  • CEmu parity: #22→#23→#30→#45→#51→#56→#59
  • Gaming: #61→#68→#70→#71→#74→#77→#78→#82→#84
  • State persistence: #39→#41→#42→#43→#53→#57→#81

13. Key Files Reference

Core Emulator

| File | Lines | Purpose |
|---|---|---|
| core/src/emu.rs | 3168 | Main orchestrator, execution loop, frame rendering |
| core/src/cpu/execute.rs | 2646 | All instruction execution (largest file) |
| core/src/bus.rs | 1929 | Memory routing, flash unlock, debug ports |
| core/src/disasm.rs | 1544 | Full eZ80 disassembler |
| core/src/peripherals/lcd.rs | 1302 | LCD controller + 5-state DMA engine |
| core/src/peripherals/keypad.rs | 899 | 8×7 key matrix, scan modes, edge detection |
| core/src/cpu/helpers.rs | 858 | ALU, register access, prefetch |
| core/src/peripherals/control.rs | 838 | CPU speed, battery FSM, memory protection |
| core/src/scheduler.rs | 702 | Event scheduler, 7.68 GHz base clock |
| core/src/peripherals/rtc.rs | 674 | RTC 3-state machine |
| core/src/peripherals/timer.rs | 671 | 3× GPT with delay pipeline |
| core/src/peripherals/spi.rs | 627 | SPI + 16-deep FIFO |
| core/src/peripherals/mod.rs | 599 | Port routing, tick orchestration |
| core/src/memory.rs | 587 | Flash + RAM + NOR commands |
| core/src/peripherals/flash.rs | 576 | Flash controller registers |
| core/src/peripherals/interrupt.rs | 448 | 2-bank interrupt controller |
| core/src/lib.rs | 432 | C ABI exports, SyncEmu |
| core/src/ti_file.rs | 379 | .8xp/.8xv parser |
| core/src/wasm.rs | 325 | WASM bindings |
| core/src/peripherals/sha256.rs | 307 | SHA-256 compression |
| core/include/emu.h | 52 | C ABI contract |

Debug Tools

| File | Purpose |
|---|---|
| core/examples/debug.rs (~2900 lines) | Swiss Army knife CLI: boot, trace, fulltrace, screen, vram, calc, sendfile, bakerom, run, rundoom |
| tools/cemu-test/trace_gen.c | CEmu trace generator in matching format |
| tools/cemu-test/parity_check.c | CEmu state checker at cycle milestones |
| scripts/compare_traces.py | PC-synced trace comparison |
| scripts/find_first_divergence.py | JSON fulltrace comparison with I/O matching |

Documentation

| File | Purpose |
|---|---|
| docs/findings.md (15KB) | All hardware discoveries, bug findings, lessons |
| docs/milestones.md (7.1KB) | 7-phase parity roadmap (all complete) |
| docs/pr_comparison_report.md | 8-agent CEmu comparison (~150 issues) |
| CLAUDE.md (7.6KB) | Claude Code workflow, memory map, trace format |
| README.md (17KB) | Comprehensive project docs |
| outline_v2.md | Blog post draft with narrative framing |

Platform Frontends

| File | Lines | Purpose |
|---|---|---|
| android/.../MainActivity.kt | ~2100 | Monolithic Android UI |
| android/.../jni_loader.cpp | ~630 | JNI + dynamic backend loading |
| android/.../cemu_adapter.c | ~595 | CEmu wrapper for Android |
| web/src/Calculator.tsx | ~800 | Main web component |
| web/src/emulator/RustBackend.ts | | WASM memory snapshot save/load |
| ios/Calc/Bridge/EmulatorBridge.swift | | Swift FFI bridge |
| ios/Calc/Bridge/backend_bridge.c | ~309 | iOS static backend switching |