TI-84 Plus CE Emulator — Deep Technical Profile

Build timeline — 25 active days across 4 phases (Jan 27 – Feb 20, 2026)
Scaffold + eZ80 CPU (1 day) — memory, bus, instruction set, early tests
Peripherals → boot to OS (5 days) — flash controller, ON key wake, interrupts, 3.2M+ instruction trace parity, boot to home screen
Multi-platform + CEmu parity campaign (7 days) — iOS/Android/web backends, dual-backend switching, WASM, 7-phase cycle-accurate trace comparison push, DMA/SPI/LCD fidelity
Games + polish (13 days) — image keypad, .8xp/.8xv loading, DOOM support, baked-in Sudoku/chess ROMs, PWA service worker, save-state fixes, speed slider

Project Overview
Libraries & Frameworks
Architecture
The Rust Emulator Core
Cross-Platform Frontends
Technical Tradeoffs & Decisions
The CEmu Parity Campaign
AI Agent Orchestration for Testing & Debugging
Claude Code Usage Statistics
Major Bugs & Debugging Stories
Running Real Software: DOOM & Chess
Development Timeline & Velocity
PR & Commit History Analysis
Key Files Reference

1. Project Overview

A cycle-accurate TI-84 Plus CE graphing calculator emulator built from scratch in Rust, with native frontends for Android (Kotlin/Jetpack Compose), iOS (Swift/SwiftUI), and Web (React/TypeScript/WASM). The emulator faithfully reproduces the Zilog eZ80 processor, all 13 hardware peripherals, and the TI-OS operating system — achieving instruction-level behavioral parity with CEmu, the established open-source reference emulator.

By the numbers:

~15,000 lines of Rust core, ~38,000 total across all platforms
332 commits across 80+ branches, 67 merged PRs
Built in 25 days (January 27 – February 20, 2026)
168,140,000 cycles to boot TI-OS (verified against CEmu)
277/455 unit tests passing (the 178 failures stem from a pre-existing prefetch initialization issue, not regressions)
WASM binary: 148KB uncompressed, 96KB gzipped
~80% AI co-authored (262/332 commits have Claude co-author attribution)

Libraries & Frameworks

Rust core (`core/Cargo.toml`)

wasm-bindgen / js-sys / web-sys — generate the JS glue layer that wraps the Rust emulator's C ABI for the web build; gated behind the wasm feature.
console_error_panic_hook — routes Rust panics in the WASM build to the browser console so crashes aren't silent.
chrono — dev-only, used by test fixtures that need deterministic date/time values.
No other runtime crates — the core is intentionally no_std-style (no std::fs, no threads, no network).

Android frontend (`android/`)

Jetpack Compose (BOM 2023.10.01) + Material 3 — the entire UI (calculator, keypad, D-pad, settings) is Compose.
kotlinx-coroutines-android — emulation loop runs on Dispatchers.Default; UI state flows back on the main dispatcher.
androidx.activity-compose / lifecycle-runtime-ktx / core-ktx — standard Compose+lifecycle glue for the single-Activity app.
Android NDK + CMake 3.22.1 — compiles jni_loader.cpp, cemu_adapter.c, and the CEmu C sources into the per-backend .sos.
cargo-ndk — cross-compiles the Rust core to arm64-v8a / armeabi-v7a / x86_64 / x86 as static libs that get linked into libemu_rust.so.
JUnit 4 + AndroidX Test + Espresso — unit and instrumentation test harness.

iOS frontend (`ios/`)

SwiftUI — all UI (calculator view, keypad, settings sheets) is SwiftUI, driven by an ObservableObject EmulatorState.
Foundation / CoreGraphics — NSLock guards the FFI pointer; CGContext + CGImage turn the raw ARGB framebuffer into something Image can render.
CryptoKit — SHA-256 hashes ROMs for save-state association.
UniformTypeIdentifiers — file-type UTIs for the ROM/.8xp document picker.
os.log — structured logging with per-subsystem categories.
Xcode build configs + .xcconfig files — Backend-Rust / Backend-CEmu / Backend-Both control which static libs link in.

Web frontend (`web/`)

React 19 + react-dom — UI for the browser emulator.
Vite 7 + @vitejs/plugin-react — dev server, HMR, production bundling.
vite-plugin-pwa (Workbox) — service-worker generation for offline install/launch.
TypeScript 5.9 — types for the backend interface, the two backend wrappers, and UI code.
wasm-pack / wasm-bindgen — produces emu_core.wasm + JS bindings from the Rust core.
Emscripten (emcc) — compiles CEmu C sources to WebCEmu.wasm (skips the C adapter; CEmuBackend.ts adapts in TS).
Vitest + jsdom + @testing-library/{react,jest-dom,user-event} — test suite for TS backends and components.
ESLint 9 + typescript-eslint + React hooks/refresh plugins — lint config.

CEmu reference (`cemu-ref/`)

Pure C against stdlib only (stdlib.h, stdio.h, string.h, stdint.h, time.h, etc.) — no third-party C deps. emscripten.h is included only when compiling the WASM target.

Build orchestration

Cargo for Rust, CMake + NDK for Android native, Xcode for iOS, Emscripten + Vite for web, top-level Make targets (make android, make ios, make web) to kick each off.

2. Architecture

2.1 Dual-Backend Design

The project's most distinctive architectural decision is its dual-backend system. Both the custom Rust emulator and the CEmu C reference emulator conform to the same 15-function contract in core/include/emu.h — but each platform implements its own bridge to wire that contract up to its UI runtime. The bridges are not shared: Android uses dlopen, iOS uses static linking, and web skips C ABI entirely and goes through a TypeScript interface.

┌────────────────────────────────────────────────────────────────────┐
│                         Platform UI Layer                           │
│       Android Compose    │    iOS SwiftUI    │    Web React         │
└──────────┬───────────────┴────────┬──────────┴─────────┬───────────┘
           │                        │                    │
   ┌───────▼────────┐       ┌───────▼────────┐   ┌───────▼────────┐
   │ Android Bridge │       │   iOS Bridge   │   │   Web Bridge   │
   │                │       │                │   │                │
   │ JNI +          │       │ Swift +        │   │ TS factory +   │
   │ jni_loader.cpp │       │ backend_       │   │ EmulatorBack-  │
   │ (dlopen .so)   │       │ bridge.c       │   │ end interface  │
   │                │       │ (static link)  │   │ (dyn import)   │
   └───────┬────────┘       └───────┬────────┘   └────────┬───────┘
           │                        │                     │
           └────────────────┬───────┴─────────────────────┘
                            │ C ABI: 15 extern "C" emu_* fns (emu.h, 52 lines)
                            │ (on web: wasm-bindgen / Emscripten wrap it into TS)
        ┌───────────────────┴────────────────────┐
        │                                        │
┌───────▼────────────────────────┐    ┌──────────▼─────────────────────┐
│  Rust Emulator Core            │    │  CEmu Reference Emulator       │
│  (core/, ~15,000 lines)        │    │  (upstream C in cemu-ref/)     │
│                                │    │                                │
│  eZ80 CPU                      │    │  Full eZ80 system:             │
│   • execute.rs (2,646 lines)   │    │   CPU + MMU, memory, flash,    │
│   • step, flags, ALU, helpers  │    │   LCD, timers, keypad,         │
│  Memory bus (1,929 lines)      │    │   SPI, RTC, interrupts,        │
│  Flash / RAM / VRAM (587)      │    │   SHA-256, watchdog, …         │
│  Event scheduler (702)         │    │                                │
│  13 peripheral modules:        │    │  Uses global/singleton state,  │
│   LCD, Timer, Keypad, SPI,     │    │   FILE*-based save/load.       │
│   RTC, Flash, SHA, Interrupt,  │    │                                │
│   Backlight, Watchdog, Panel,  │    │  cemu_adapter.c (~595 lines,   │
│   Control, (Peripherals root)  │    │   shared Android+iOS) wraps it │
│                                │    │   into instance-based emu_* C  │
│  Exports: #[no_mangle]         │    │   ABI matching emu.h.          │
│   extern "C" fn emu_*          │    │                                │
│                                │    │  Web has no adapter: Emscripten│
│                                │    │   export list is called direct │
│                                │    │   from CEmuBackend.ts.         │
│                                │    │                                │
│  Artifacts per platform:       │    │  Artifacts per platform:       │
│   Android: libemu_rust.so      │    │   Android: libemu_cemu.so      │
│   iOS:     libemu_core.a       │    │   iOS:     libcemu_adapter.a   │
│   Web:     emu_core.wasm       │    │   Web:     WebCEmu.wasm        │
└────────────────────────────────┘    └────────────────────────────────┘

The shared C ABI contract (emu.h, 52 lines) defines 15 functions both backends must export:

Lifecycle: emu_create(), emu_destroy()
ROM: emu_load_rom(), emu_send_file()
Execution: emu_reset(), emu_power_on(), emu_run_cycles()
Display: emu_framebuffer() (ARGB8888, 320×240), emu_is_lcd_on()
Input: emu_set_key(row, col, down) (8×7 matrix)
State: emu_save_state_size(), emu_save_state(), emu_load_state()
Misc: emu_get_backlight(), emu_set_log_callback()

What each platform bridge actually does:

Android (android/app/src/main/cpp/jni_loader.cpp, ~630 lines): The JNI layer loads libemu_jni.so, which dlopen()s a backend .so (libemu_rust.so or libemu_cemu.so) and resolves backend_*-prefixed symbols via dlsym() into a local BackendInterface dispatch table (18 fields: handle, name, the 15 ABI functions, plus an optional set_temp_dir). Backends live in separate .so files loaded with RTLD_LOCAL, so each keeps its own symbol namespace and the two cannot collide.
iOS (ios/Calc/Bridge/backend_bridge.c, ~309 lines + EmulatorBridge.swift): App Store policy forbids dlopen of non-system dylibs, so both backends are statically linked. Collision is avoided by building the Rust core with the ios_prefixed Cargo feature, which renames its exports to rust_emu_*, and by compiling the CEmu adapter with IOS_PREFIXED=1, which macro-rewrites its exports to cemu_*. backend_bridge.c declares both prefixed symbol sets extern, populates two const BackendInterface structs (16 fields: name + 15 fn ptrs), and emu_backend_set(name) swings a current_backend pointer between them. Three xcconfigs (Backend-Rust, Backend-CEmu, Backend-Both) control which .a archives link in.
Web (web/src/emulator/): No C ABI at the binding seam. A TypeScript interface EmulatorBackend (types.ts) is implemented by RustBackend.ts (wrapping wasm-bindgen-generated bindings around emu_core.wasm) and CEmuBackend.ts (wrapping an Emscripten module around WebCEmu.wasm). createBackend(type) in index.ts is a plain TS factory; each .wasm module lives in its own instance, so there is no shared symbol space to collide in.

The CEmu adapter itself — cemu_adapter.c (~595 lines, under android/.../cpp/cemu/) — is shared between Android and iOS (iOS CMake reuses the same source, just with IOS_PREFIXED=1). Web has no equivalent: CEmuBackend.ts calls Emscripten exports directly. So the emulator cores are shared across platforms; the bridges that host them are not.

Why dual backends? The primary motivation was parity-driven development. Having CEmu as a runtime-swappable reference meant:

Any behavior difference could be observed in real-time on the same device
The Rust implementation could be validated against CEmu at every stage
Users had a working fallback while the Rust core was incomplete
A/B comparison screenshots could be generated instantly

The tradeoff: doubled integration surface area, two build systems (Cargo + CMake/Emscripten), and per-platform collision handling — dlopen + RTLD_LOCAL on Android, dual prefixing (ios_prefixed + IOS_PREFIXED=1) on iOS, separate .wasm instances on web.

2.2 Core Module Architecture

The Rust emulator core (core/src/, ~15,000 lines) is organized into 5 main modules and 13 peripheral modules:

core/src/
├── lib.rs          (432 lines)  — C ABI exports, SyncEmu thread-safe wrapper
├── emu.rs          (3168 lines) — Emulator orchestrator, execution loop, frame rendering
├── bus.rs          (1929 lines) — Memory bus, address decoding, flash unlock, debug ports
├── memory.rs       (587 lines)  — Flash (4MB), RAM (256KB+VRAM), NOR flash commands
├── scheduler.rs    (702 lines)  — Event scheduler, 7.68 GHz base clock, 9 event types
├── disasm.rs       (1544 lines) — Full eZ80 disassembler
├── ti_file.rs      (379 lines)  — .8xp/.8xv TI file format parser
├── wasm.rs         (325 lines)  — WASM FFI bindings
├── cpu/
│   ├── mod.rs      (653 lines)  — CPU state, step(), interrupt handling
│   ├── execute.rs  (2646 lines) — All instruction execution (largest file)
│   ├── helpers.rs  (858 lines)  — ALU ops, register access, fetch/prefetch
│   ├── flags.rs    (25 lines)   — Flag bit constants (C, N, PV, H, Z, S, F3, F5)
│   └── tests/      (5420 lines) — instructions, modes, parity tests
└── peripherals/
    ├── mod.rs      (599 lines)  — Port routing, tick orchestration, state persistence
    ├── control.rs  (838 lines)  — CPU speed, battery FSM, flash unlock, memory protection
    ├── lcd.rs      (1302 lines) — LCD controller, 5-state DMA engine, palette
    ├── timer.rs    (671 lines)  — 3× GPT with 2-cycle interrupt delay pipeline
    ├── rtc.rs      (674 lines)  — Real-time clock, 3-state machine, 6 interrupt types
    ├── keypad.rs   (899 lines)  — 8×7 matrix, scan modes, edge detection
    ├── spi.rs      (627 lines)  — SPI controller, 16-deep FIFO, panel stub
    ├── interrupt.rs(448 lines)  — 2-bank interrupt controller with inversion/latching
    ├── flash.rs    (576 lines)  — Flash controller registers, wait state management
    ├── sha256.rs   (307 lines)  — SHA-256 block compression (64-round)
    ├── panel.rs    (223 lines)  — ST7789V LCD panel stub (SPI target)
    ├── backlight.rs(72 lines)   — PWM backlight brightness
    └── watchdog.rs (248 lines)  — Watchdog timer stub

2.3 Design Principles

No platform dependencies in the core. The Rust core has no std::fs, no std::io, no std::net. It does not open files, spawn threads, or know which platform it is running on; log output leaves through the host-supplied callback registered with emu_set_log_callback(). Everything else flows through byte buffers via the C ABI. The only external crates are wasm-bindgen/js-sys/web-sys (optional, gated behind the wasm feature flag), console_error_panic_hook (used by the WASM build), and chrono (dev-only, for tests).
Stable C ABI. All exports use extern "C" with #[no_mangle]. The SyncEmu wrapper wraps Emu in a Mutex<Emu> for thread safety. Raw pointers are used at the FFI boundary, with Box::into_raw / Box::from_raw for lifecycle management.
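Single-threaded deterministic core. The emulator is purely deterministic: given the same ROM and inputs, it produces the same outputs every time. The core never spawns threads; the Mutex in SyncEmu only serializes calls arriving from the host, and deciding which thread makes those calls is the platform's responsibility.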
Buffer-based I/O. ROM loaded as &[u8], framebuffer exposed as *const u32 (ARGB8888), save states serialized to Vec<u8>. No file handles cross the FFI boundary.
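The handle lifecycle described in these principles can be sketched as follows. This is a minimal illustration, not the project's lib.rs: the Emu body is a stand-in, and the emu_run_cycles signature (an i32 budget returning the cycles run) is an assumption, since emu.h is not reproduced here.

```rust
use std::sync::Mutex;

/// Stand-in for the real emulator state; only a cycle counter here.
pub struct Emu {
    pub cycles: u64,
}

/// Mutex wrapper so calls from different host threads are serialized.
pub struct SyncEmu(pub Mutex<Emu>);

// The real exports also carry #[no_mangle]; it is left off here so the
// sketch builds unchanged on any Rust edition.
pub extern "C" fn emu_create() -> *mut SyncEmu {
    let emu = SyncEmu(Mutex::new(Emu { cycles: 0 }));
    // Box::into_raw hands ownership to the caller as an opaque pointer.
    Box::into_raw(Box::new(emu))
}

/// Hypothetical signature: runs `cycles` and returns how many were executed.
pub unsafe extern "C" fn emu_run_cycles(handle: *mut SyncEmu, cycles: i32) -> i32 {
    if handle.is_null() || cycles <= 0 {
        return 0;
    }
    // Caller guarantees `handle` came from emu_create and is still alive.
    let sync: &SyncEmu = unsafe { &*handle };
    match sync.0.lock() {
        Ok(mut emu) => {
            emu.cycles += cycles as u64;
            cycles
        }
        Err(_) => 0, // poisoned lock: refuse to run
    }
}

pub unsafe extern "C" fn emu_destroy(handle: *mut SyncEmu) {
    if !handle.is_null() {
        // Box::from_raw reclaims ownership so the emulator is dropped once.
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

A host would call emu_create() once, pass the returned pointer to every later call, and finish with emu_destroy(); the null checks keep a misbehaving caller from dereferencing a bad handle.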

3. The Rust Emulator Core

3.1 CPU: The eZ80 Processor

The TI-84 Plus CE uses a Zilog eZ80 processor — an extended Z80 with 24-bit addressing (ADL mode). This is NOT a standard Z80; it has critical differences that are poorly documented and caused numerous bugs during development.

CPU State (Cpu struct, cpu/mod.rs):

Main registers: a: u8, f: u8, bc/de/hl: u32 (24-bit)
Shadow registers: a_prime, f_prime, bc_prime, de_prime, hl_prime
Index registers: ix, iy (24-bit)
Dual stack pointers: sps (Z80 16-bit), spl (ADL 24-bit) — selected by L mode flag
Special: pc: u32, i: u16, r: u8, mbase: u8
State flags: iff1, iff2, im: InterruptMode, adl, halted
Per-instruction mode: l, il, suffix, madl, prefix, prefetch
ei_delay: u8 — 2-step delayed interrupt enable (EI enables interrupts after the NEXT instruction)

Instruction Execution (execute.rs, 2646 lines — the largest file):

The CPU uses x-y-z-p-q opcode decomposition. Each instruction goes through:

Prefetch: Return the previously-prefetched byte, read the next byte into the prefetch buffer. This mirrors CEmu's hardware prefetch and is critical for cycle accuracy — without it, cycle counts were ~50% too low.
Suffix detection loop: Opcodes 0x40, 0x49, 0x52, 0x5B are suffix opcodes (.SIS, .LIS, .SIL, .LIL) that modify the next instruction's L/IL modes. They execute atomically with the following instruction — a single step() call, not two. Getting this wrong caused trace count mismatches with CEmu.
Dispatch: Based on the x field (bits 7:6): x=0 → execute_x0, x=1 → LD r,r' or HALT, x=2 → ALU A,r, x=3 → execute_x3. Prefixed instructions (CB/ED/DD/FD) have their own dispatch tables.
Interrupt check: If iff1 && irq_pending, push return address and jump to 0x0038.
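The decomposition and suffix steps above can be sketched in a few lines. The function and type names here are hypothetical (they are not taken from execute.rs); the bit positions follow the standard x-y-z-p-q split, and the (L, IL) pairs follow the eZ80 meaning of each suffix mnemonic.

```rust
/// Fields of the x-y-z-p-q split of one opcode byte.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Decoded {
    pub x: u8, // bits 7:6
    pub y: u8, // bits 5:3
    pub z: u8, // bits 2:0
    pub p: u8, // y bits 2:1
    pub q: u8, // y bit 0
}

pub fn decode(opcode: u8) -> Decoded {
    let y = (opcode >> 3) & 0x07;
    Decoded { x: opcode >> 6, y, z: opcode & 0x07, p: y >> 1, q: y & 1 }
}

/// Returns (L, IL) for the four suffix opcodes, or None for anything else.
pub fn suffix_modes(opcode: u8) -> Option<(bool, bool)> {
    match opcode {
        0x40 => Some((false, false)), // .SIS
        0x49 => Some((true, false)),  // .LIS
        0x52 => Some((false, true)),  // .SIL
        0x5B => Some((true, true)),   // .LIL
        _ => None,
    }
}

/// Consumes leading suffix bytes so suffix + instruction form one step.
/// Returns the last suffix seen (if any) together with the real opcode.
pub fn next_opcode(stream: &[u8]) -> Option<(Option<(bool, bool)>, u8)> {
    let mut modes = None;
    for &byte in stream {
        match suffix_modes(byte) {
            Some(m) => modes = Some(m),
            None => return Some((modes, byte)),
        }
    }
    None // stream ended before a real opcode appeared
}
```

For example, decode(0x76) yields x=1, y=6, z=6, which is the HALT slot inside the LD r,r' block.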

eZ80 Architectural Surprises (each of these caused boot failures):

Discovery	Impact	How Found
IM2 = IM1 on eZ80 (ignores I register, jumps to 0x0038)	Implementing standard Z80 IM2 crashed the boot	Trace comparison at ~9K steps
Separate SPS/SPL stack pointers	Mixed-mode CALL/RET pushed wrong-width addresses	CEmu source reading + test failures
Suffix opcodes execute atomically	Step count mismatched CEmu traces	Trace comparison showed 2 steps vs 1
R register rotation: `LD R,A` uses `(A<<1) | (A>>7)`	R register diverged from CEmu (parity test failures)	Trace comparison
`LD A,MB` (ED 6E) — load memory base register	#1 boot blocker, not in Z80 docs	ROM disassembly
F3/F5 flags preserved from previous F in ALU ops	Flag divergence in SPI polling loops	29 dedicated parity tests
ON key wakes from HALT even with DI	Boot sequence stalled at HALT	CEmu `keypad_on_check()` comparison
OS Timer is a 4th timer (32K crystal, not documented)	Boot hangs at ~20M cycles without it	ROM code analysis showed it waits for bit 4
Block I/O instructions execute atomically (INI/IND + eZ80-specific variants)	Trace count mismatches	CEmu source comparison

3.2 Bus: Memory Routing and Address Decoding

The Bus struct (bus.rs, 1929 lines) handles all memory access for the emulator. The 24-bit address space is decoded as:

Range	Region	Size	Wait States
0x000000–0x3FFFFF	Flash (ROM)	4MB	10 cycles read
0x400000–0xCFFFFF	Unmapped	—	LFSR pseudo-random on read
0xD00000–0xD3FFFF	RAM	256KB	4 read / 2 write
0xD40000–0xD657FF	VRAM	150KB (320×240×2 bytes)	4 read / 2 write
0xD65800–0xDFFFFF	Unmapped	—	LFSR pseudo-random
0xE00000–0xFFFFFF	MMIO Ports	—	2-4 cycles per port
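The table above maps directly onto a range match. The sketch below is illustrative only: Region and decode_address are hypothetical names, and masking the address to 24 bits is an assumption about how out-of-range values would be treated.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Region {
    Flash,
    Ram,
    Vram,
    Mmio,
    Unmapped,
}

/// Size of the VRAM window, derived from the table's address range.
pub const VRAM_BYTES: u32 = 0xD6_57FF - 0xD4_0000 + 1;

pub fn decode_address(addr: u32) -> Region {
    // Assumption: only the low 24 bits of the address are significant.
    match addr & 0xFF_FFFF {
        0x00_0000..=0x3F_FFFF => Region::Flash,
        0xD0_0000..=0xD3_FFFF => Region::Ram,
        0xD4_0000..=0xD6_57FF => Region::Vram,
        0xE0_0000..=0xFF_FFFF => Region::Mmio,
        _ => Region::Unmapped, // both gaps fall through to here
    }
}
```

The VRAM window works out to 153,600 bytes, which is one 320×240 frame at 16 bits per pixel.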

Flash unlock detection: The bus monitors the fetch stream for a magic byte sequence (the FLASH_UNLOCK_SEQUENCE — DI; JR; DI; IM2; IM1; OUT0/IN0; BIT 2,A) that the ROM uses to unlock flash writes. When detected during privileged code fetch (PC in ROM range), flash write mode is enabled. This is a 16-byte or 17-byte pattern match on a 32-byte fetch ring buffer.

Memory protection: Three mechanisms enforced in write_byte():

Stack limit (ports 0x3A-0x3C): SP below limit triggers NMI
Protected range (ports 0x20-0x25): Writes to protected range from unprivileged code trigger NMI and are blocked
Flash privilege (port 0x28): Only privileged code can write to flash

Debug port interception: The bus intercepts writes to special addresses used by the CE toolchain's dbg_printf:

0xFB0000–0xFBFFFF: stdout (sequential address writes, NOT repeated writes to 0xFB0000)
0xFC0000–0xFCFFFF: stderr
0xFD0000: control (write 1 = clear console)
Null byte at 0xFB0000 exactly = program exit sentinel

Flash cache model: A 2-way set-associative cache with 128 sets and 32-byte cache lines, returning 2 cycles (same line), 3 cycles (cache hit), or 197 cycles (cache miss). This matches CEmu's flash_cache implementation exactly.
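A cache with this geometry and these three cycle costs could look roughly like the sketch below. The geometry and costs come from the paragraph above; the replacement policy (most recently used line kept in way 0) and the use of the full line number as the tag are assumptions made for illustration and may differ from CEmu's flash_cache.

```rust
const SETS: usize = 128;
const LINE_SHIFT: u32 = 5; // 32-byte cache lines

pub struct FlashCache {
    // Two ways per set; way 0 holds the most recently used line (assumed).
    tags: [[Option<u32>; 2]; SETS],
    last_line: Option<u32>,
}

impl FlashCache {
    pub fn new() -> Self {
        FlashCache { tags: [[None; 2]; SETS], last_line: None }
    }

    /// Returns the cycle cost of reading `addr` and updates the cache.
    pub fn access(&mut self, addr: u32) -> u32 {
        let line = addr >> LINE_SHIFT;
        if self.last_line == Some(line) {
            return 2; // same line as the previous access
        }
        self.last_line = Some(line);
        let set = &mut self.tags[line as usize % SETS];
        if set[0] == Some(line) {
            return 3; // hit in the most recently used way
        }
        let hit = set[1] == Some(line);
        // Promote the line to way 0; the old way 0 is demoted (or evicts way 1).
        set[1] = set[0];
        set[0] = Some(line);
        if hit { 3 } else { 197 }
    }
}
```

Three lines that map to the same set (for example lines 0, 128, and 256) are enough to evict the oldest one, so returning to line 0 afterwards costs a full miss again.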

3.3 Scheduler: The 7.68 GHz Base Clock

The scheduler (scheduler.rs, 702 lines) is the timing backbone of the emulator. Rather than maintaining separate tick counters for each hardware clock, it uses a single base clock at 7,680,000,000 Hz — the LCM of all hardware clocks:

Clock	Rate	Base ticks per tick
CPU (speed 3)	48 MHz	160
CPU (speed 2)	24 MHz	320
CPU (speed 1)	12 MHz	640
CPU (speed 0)	6 MHz	1280
Panel	10 MHz	768
Clock48M	48 MHz	160
Clock24M	24 MHz	320
Clock32K	32.768 kHz	234,375

Why the LCM approach? All hardware timer events can be scheduled in base ticks with pure integer arithmetic: no floating-point, no rounding errors, no drift. A timer ticking at 32.768 kHz fires every 234,375 base ticks, and 234,375 × 32,768 = 7,680,000,000 exactly. This design was ported directly from CEmu's schedule.c.
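The arithmetic behind the table can be checked with a few lines of integer math. The helper names below are hypothetical; the clock rates are the ones listed above.

```rust
pub fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

pub fn lcm(a: u64, b: u64) -> u64 {
    a / gcd(a, b) * b
}

pub const BASE_HZ: u64 = 7_680_000_000;

/// Base ticks that elapse per tick of a clock running at `clock_hz`.
pub fn base_ticks_per_tick(clock_hz: u64) -> u64 {
    // Every hardware clock must divide the base clock with no remainder.
    assert_eq!(BASE_HZ % clock_hz, 0, "clock does not divide the base clock");
    BASE_HZ / clock_hz
}

/// LCM of the 48 MHz, 10 MHz, and 32.768 kHz clocks from the table.
pub fn base_clock_from_table() -> u64 {
    lcm(lcm(48_000_000, 10_000_000), 32_768)
}
```

The slower CPU speeds (24, 12, and 6 MHz) divide 48 MHz, so they do not change the result.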

9 event types: RTC, SPI, TimerDelay, Timer0, Timer1, Timer2, OsTimer, LCD, LcdDma.

Overflow prevention: The process_second() method subtracts one second (7,680,000,000 base ticks) from all timestamps whenever the base counter crosses a second boundary. This prevents u64 overflow while maintaining relative timing. The INACTIVE_FLAG (bit 63) marks disabled events — since timestamps never reach bit 63 due to the one-second normalization, this bit is safely repurposed.

HALT fast-forward: When the CPU enters HALT, instead of spinning cycle-by-cycle, the emulator calls scheduler.cycles_until_next_event() and jumps forward to the next event. This is critical for performance — boot involves many HALT periods.
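The normalization and fast-forward ideas from the two paragraphs above fit in a small sketch. The field layout here is an assumption (scheduler.rs is not reproduced), and for simplicity this version of cycles_until_next_event() counts in base ticks rather than CPU cycles.

```rust
pub const TICKS_PER_SECOND: u64 = 7_680_000_000;
pub const INACTIVE_FLAG: u64 = 1 << 63;

pub struct Scheduler {
    pub now: u64,         // base ticks since the last normalization
    pub events: Vec<u64>, // fire times; INACTIVE_FLAG marks disabled slots
}

impl Scheduler {
    /// Pulls every active timestamp back by one second once `now` crosses
    /// a second boundary, so values never grow toward bit 63.
    pub fn process_second(&mut self) {
        if self.now < TICKS_PER_SECOND {
            return;
        }
        self.now -= TICKS_PER_SECOND;
        for t in self.events.iter_mut() {
            if (*t & INACTIVE_FLAG) == 0 {
                *t = t.saturating_sub(TICKS_PER_SECOND);
            }
        }
    }

    /// Distance to the earliest active event, if any is scheduled.
    pub fn cycles_until_next_event(&self) -> Option<u64> {
        self.events
            .iter()
            .filter(|&&t| (t & INACTIVE_FLAG) == 0)
            .map(|&t| t.saturating_sub(self.now))
            .min()
    }

    /// HALT fast-forward: jump straight to the next event instead of spinning.
    pub fn skip_to_next_event(&mut self) -> u64 {
        let skipped = self.cycles_until_next_event().unwrap_or(0);
        self.now += skipped;
        skipped
    }
}
```

Because relative distances are unchanged by process_second(), an event scheduled 160 ticks ahead is still 160 ticks ahead after normalization.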

3.4 Peripherals: 13 Hardware Modules

LCD Controller (`lcd.rs`, 1302 lines)

The LCD runs a 5-state DMA engine:

FRONT_PORCH → SYNC → LNBU → BACK_PORCH → ACTIVE_VIDEO → (repeat)

FRONT_PORCH: Idle period before sync
SYNC: Horizontal/vertical sync pulses. Timing registers are parsed here.
LNBU: Line Buffer Update. Prefills a 256-byte FIFO before active video begins.
BACK_PORCH: Idle after sync. DMA is scheduled during this phase.
ACTIVE_VIDEO: Actual pixel DMA from VRAM at 0xD40000. The UPCURR register increments as pixels are transferred.

Two separate clock domains: LCD state machine events on CLOCK_24M (24 MHz), DMA transfers on CLOCK_48M (48 MHz). The process_dma() method handles pixel-by-pixel DMA advancement, while fast_forward_dma_events() provides O(1) bulk skip for performance.

DMA cycle stealing (~7.7% overhead): The LCD DMA and CPU contend for the memory bus. Rather than explicitly scheduling CPU wait cycles, the emulator tracks dma_last_mem_timestamp and calculates elapsed DMA cycles on each CPU memory access via process_dma_stealing(). This adds ~13M cycles to the 168M-cycle boot, matching CEmu exactly.

Palette: 256 entries stored as 1555 ARGB, converted to both BGR565 and RGB565 on write. The 8bpp mode (used by DOOM) indexes into this palette for each pixel.

Cursor image RAM (0xE30800–0xE30BFF, 1024 bytes): A discovery made during DOOM support — LibLoad (a CE C library loader) uses this LCD hardware register space as scratch storage. Without implementing it, CE C programs crash.

Timer System (`timer.rs`, 671 lines)

Three General Purpose Timers with a shared 32-bit control register (3 bits per timer: enable, clock source, overflow enable) and a 2-cycle interrupt delay pipeline:

Cycle 0: Timer match/overflow detected → delay_status bit set
Cycle 1: Status becomes visible → delay_intrpt bit set
Cycle 2: Interrupt actually fires

This pipeline is implemented via the TimerDelay scheduler event and process_delay() state machine. Getting this wrong caused the graphing hang bug — timer interrupts fired too early, and the ISR looped infinitely because the status bits weren't visible yet.
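The three-cycle sequence above can be modeled as two staging masks that shift forward once per timer cycle. This is a simplified sketch: delay_status and delay_intrpt are named in the text, while the struct, the status and irq fields, and the detect() helper are assumptions for illustration.

```rust
#[derive(Default)]
pub struct TimerDelay {
    pub delay_status: u32, // detected this cycle, not yet visible
    pub delay_intrpt: u32, // visible in status, interrupt not yet raised
    pub status: u32,       // what the CPU reads back
    pub irq: u32,          // interrupts actually asserted
}

impl TimerDelay {
    /// Cycle 0: record a match or overflow without exposing it yet.
    pub fn detect(&mut self, bits: u32) {
        self.delay_status |= bits;
    }

    /// Advances the pipeline by one timer cycle.
    pub fn process_delay(&mut self) {
        // Stage 2: bits that became visible last cycle now raise the interrupt.
        self.irq |= self.delay_intrpt;
        // Stage 1: freshly detected bits become visible and queue an interrupt.
        self.delay_intrpt = self.delay_status;
        self.status |= self.delay_status;
        self.delay_status = 0;
    }
}
```

With this ordering the status bit is always readable one cycle before the interrupt fires, which is the property the ISR in the graphing hang depended on.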

RTC (`rtc.rs`, 674 lines)

Real-time clock with a 3-state machine:

TICK: Normal time counting (sec → min → hour → day with rollovers)
LATCH: Time latched for reading (prevents tearing)
LOAD_LATCH: Loading new time from load registers (bit-level transfer with write masks)

The load process is particularly complex: writing control bit 6 triggers a load that takes 51 ticks at 32.768 kHz to complete. A status register at offset 0x40 returns a bitmask showing which fields (sec/min/hour/day) have been transferred. The ROM polls this register during boot, and getting the timing wrong caused the emulator to boot into "Classic" mode instead of "MathPrint" mode.

Constants: LATCH_TICK_OFFSET = 16429 ticks (~0.5 seconds at 32.768 kHz). The RTC is scheduled at this offset from the start of each second.

OS Timer (in `peripherals/mod.rs`)

A 4th timer source not found in standard Z80 documentation, running off a 32.768 kHz crystal. Tick intervals vary by CPU speed: 73 ticks at 6 MHz (~449 Hz), 153 at 12 MHz (~214 Hz), 217 at 24 MHz (~151 Hz), 313 at 48 MHz (~105 Hz). The ROM enables OS Timer interrupt bit 4 and waits; without it, boot stalls indefinitely.
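The quoted frequencies follow from dividing the crystal rate by each interval. In the sketch below the function names are hypothetical, and indexing by the CPU speed setting (0 = 6 MHz through 3 = 48 MHz) is an assumption based on the speed table in the scheduler section.

```rust
pub const CRYSTAL_HZ: f64 = 32_768.0;

/// OS Timer interval in 32.768 kHz ticks for a CPU speed setting
/// (0 = 6 MHz, 1 = 12 MHz, 2 = 24 MHz, 3 = 48 MHz).
pub fn os_timer_interval(speed: u8) -> Option<u32> {
    match speed {
        0 => Some(73),
        1 => Some(153),
        2 => Some(217),
        3 => Some(313),
        _ => None,
    }
}

/// Resulting OS Timer rate in Hz for that speed setting.
pub fn os_timer_hz(speed: u8) -> Option<f64> {
    os_timer_interval(speed).map(|ticks| CRYSTAL_HZ / f64::from(ticks))
}
```

Rounding each result to the nearest integer reproduces the approximate rates given in the text.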

The OS Timer has a subtle ordering requirement matching CEmu: the interrupt state is set to the OLD value before toggling. Getting this wrong caused the OS Timer to fire at the wrong phase.

SPI Controller (`spi.rs`, 627 lines)

16-deep FIFO (not 4 as initially assumed), with a critical RX-only mode: when CR0 bit 11 (FLASH) is set and the TX FIFO is empty, transfers continue automatically filling the RX FIFO. Without this, the ROM's second SPI polling loop exits early at step ~699,910.

The SPI controller drives an ST7789V LCD panel stub (panel.rs, 223 lines) that absorbs 9-bit SPI frames and parses panel initialization commands (MADCTL, COLMOD, CASET, RASET, RAMWR).

Interrupt Controller (`interrupt.rs`, 448 lines)

Two banks of 32-bit interrupt registers (status, enabled, latched, inverted). The set_source() method handles edge/level/inverted semantics: if (set XOR (inverted & mask)) then set status bit, else clear it (preserving latched). The pulse() method creates proper edges for inverted+latched signals like WAKE.

Interrupt sources: ON_KEY(0), TIMER1(1), TIMER2(2), TIMER3(3), OSTIMER(4), KEYPAD(10), LCD(11), PWR(15), WAKE(19).
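The XOR rule in set_source() can be sketched for a single bank as follows. The four register names and the source indices come from the text above; the struct name, the irq_pending() helper, and the exact way latched bits are preserved are assumptions.

```rust
pub const ON_KEY: u32 = 0;
pub const OSTIMER: u32 = 4;
pub const WAKE: u32 = 19;

#[derive(Default)]
pub struct IntBank {
    pub status: u32,
    pub enabled: u32,
    pub latched: u32,
    pub inverted: u32,
}

impl IntBank {
    pub fn set_source(&mut self, source: u32, set: bool) {
        let mask = 1u32 << source;
        // An inverted source is active when its input line is low.
        let active = set ^ ((self.inverted & mask) != 0);
        if active {
            self.status |= mask;
        } else {
            // Latched bits stay set until software acknowledges them.
            self.status &= !(mask & !self.latched);
        }
    }

    /// True when any enabled source is currently flagged.
    pub fn irq_pending(&self) -> bool {
        (self.status & self.enabled) != 0
    }
}
```

A level-triggered source such as OSTIMER follows its input directly, while a latched source such as WAKE keeps its status bit after the input drops.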

3.5 Execution Loop

The main execution loop (emu.rs, run_cycles()) is the heart of the emulator:

while cycle budget remains:
    1. Record opcode in execution history (64-entry ring buffer for crash diagnostics)
    2. Handle any_key_wake (clears HALT if any key pressed)
    3. Execute one CPU instruction via cpu.step(bus)
    4. Check armed instruction trace (for debugging)
    5. Advance scheduler by elapsed CPU cycles
    6. Handle CPU speed changes (port 0x01 writes require cycle conversion)
    7. Process all pending scheduler events (RTC, SPI, timers, LCD, DMA)
    8. Process DMA cycle stealing
    9. Schedule SPI transfers if needed
    10. Check NMI flag (memory protection violations)
    11. Tick peripherals (timers with delay pipeline, keypad, OS Timer)
    12. HALT fast-forward (batch up to 10,000 cycles, cap at scheduler second boundary)
    13. Periodic diagnostic logging (every 60 frames)

Frame rendering is separate from execution: render_frame() dispatches to either render_frame_8bpp() (palette lookup using BGR565) or render_frame_16bpp() (direct RGB565 from VRAM) based on the LCD's BPP mode.
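For the 16bpp path, the per-pixel work is an RGB565 to ARGB8888 expansion. The sketch below is not the project's render_frame_16bpp(): the function names are hypothetical, and both the bit-replication scaling and the little-endian pixel layout in VRAM are assumptions.

```rust
/// Expands one RGB565 pixel to opaque ARGB8888.
pub fn rgb565_to_argb8888(px: u16) -> u32 {
    let r5 = u32::from(px >> 11) & 0x1F;
    let g6 = u32::from(px >> 5) & 0x3F;
    let b5 = u32::from(px) & 0x1F;
    // Replicate the high bits into the low bits so full scale maps to 255.
    let r = (r5 << 3) | (r5 >> 2);
    let g = (g6 << 2) | (g6 >> 4);
    let b = (b5 << 3) | (b5 >> 2);
    0xFF00_0000 | (r << 16) | (g << 8) | b
}

/// Converts little-endian 16bpp VRAM bytes into an ARGB framebuffer.
/// Stops at whichever of the two buffers runs out first.
pub fn render_16bpp(vram: &[u8], out: &mut [u32]) {
    for (dst, src) in out.iter_mut().zip(vram.chunks_exact(2)) {
        *dst = rgb565_to_argb8888(u16::from_le_bytes([src[0], src[1]]));
    }
}
```

The 8bpp path would differ only in where the 16-bit value comes from: a palette lookup per pixel instead of a direct VRAM read.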

4. Cross-Platform Frontends

4.1 Shared Design

All three platforms use identical percentage-based layout constants:

LCD position: left 11.53%, top 6.92%, width 76.74%, height 24.92% of calculator body
Keypad area: top 34.57%, height 65.43% of calculator body
D-pad region: left 63.97%, top 13.72%, width 22.01%, height 14.74% of keypad area
Body aspect ratio: 963/2239 (from the calculator body image)
49 button regions with identical percentage coordinates across all platforms, derived from a real TI-84 CE photograph via scripts/extract_buttons.py

All platforms use the same emulation timing: 800,000 cycles per frame at 60 FPS (= 48 MHz real-time), with non-linear speed steps from 0.25× to 20×.

State persistence uses SHA-256 of the ROM data, truncated to 16 hex chars, as a key. States are namespaced by backend type ("rust:<hash>" or "cemu:<hash>").
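The key format can be illustrated with a small helper that takes an already-computed digest. state_key is a hypothetical name, the lowercase hex encoding is an assumption, and the SHA-256 itself is left to the platform (CryptoKit on iOS, for example) so the sketch needs no external crate.

```rust
/// Builds "<backend>:<first 16 hex chars of the ROM's SHA-256>".
pub fn state_key(backend: &str, rom_sha256: &[u8; 32]) -> String {
    let mut key = String::with_capacity(backend.len() + 17);
    key.push_str(backend);
    key.push(':');
    // 8 bytes of digest give the 16 hex characters mentioned above.
    for byte in &rom_sha256[..8] {
        key.push_str(&format!("{:02x}", byte));
    }
    key
}
```

Namespacing by backend keeps a Rust-core snapshot from ever being handed to the CEmu backend, whose state layout is unrelated.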

4.2 Android (`android/`, Kotlin + Jetpack Compose)

MainActivity.kt (~2100 lines, monolithic): Compose UI with EmulatorScreen composable. Emulation loop runs on Dispatchers.Default coroutine. Framebuffer copied to Bitmap via setPixels().
EmulatorBridge.kt: Loads libemu_jni.so via System.loadLibrary, which dynamically loads backend .so files. 18 JNI method declarations.
jni_loader.cpp (~630 lines): BackendInterface struct with function pointer table. loadBackend() uses dlopen() to load libemu_<name>.so. Thread-safe log callback deque (max 200 entries) forwarded to Android logcat.
cemu_adapter.c (~595 lines): Wraps CEmu's global-state API into instance-based interface. Singleton pattern. State save/load uses temp files because CEmu only supports FILE* API.
StateManager.kt: Thread-safe singleton, SHA-256 ROM hashing, auto-delete corrupted states.
Image keypad: ImageKeyButton composables with 2dp travel on press, brightness darkening via ColorMatrix(0.82).
D-pad: Canvas-drawn circular D-pad with arc segments, arrow indicators, hit-testing via angle/radius calculation.

4.3 iOS (`ios/`, Swift + SwiftUI)

ContentView.swift: EmulatorState class (ObservableObject) manages emulation lifecycle. Emulation loop runs on Task.detached(priority: .userInitiated).
EmulatorBridge.swift: NSLock-protected access to opaque C pointer. makeImage() creates CGImage from framebuffer using CGContext with premultipliedFirst | byteOrder32Little.
backend_bridge.c (~309 lines): Static linking variant. When HAS_RUST_BACKEND defined, declares rust_emu_* extern functions. current_backend pointer switches between rust_backend and cemu_backend const structs.
3 Xcode build configurations: Backend-Rust.xcconfig (links -lemu_rust), Backend-CEmu.xcconfig (links -lemu_cemu), Backend-Both.xcconfig (both).
ImageKeypadView.swift: Same 49 button regions, DragGesture(minimumDistance: 0) for press detection.
AppState: Singleton monitoring scenePhase for auto-save on background.

4.4 Web (`web/`, React + TypeScript + Vite)

Calculator.tsx (~800 lines): Main component with requestAnimationFrame loop, time accumulator for frame pacing, safety cap of 4 frames per rAF tick (30 in turbo mode).
RustBackend.ts: Wraps wasm-bindgen WasmEmu class. State persistence uses WASM memory snapshots — dumps the entire linear memory (~29MB via memcpy, ~4ms) rather than field-by-field serialization. On restore, grows memory if needed and copies back. Custom binary format with "WM01" header.
CEmuBackend.ts: Wraps Emscripten module. Sets up global stubs (emul_is_inited, emul_is_paused). ARGB→RGBA pixel conversion on every frame.
Chess auto-launch: Polls canvas pixel (310, 10) for green battery icon to detect homescreen ready, then fires Prgm→Prgm→2→Enter key sequence.
PWA: vite-plugin-pwa with versioned ROM/WASM caching, update banner, offline support. ROM manifest system for tracking content hashes.
Drag-and-drop: Supports dragging both ROM files and .8xp/.8xv programs onto the calculator.
Keyboard mapping: 50+ key bindings including F1-F5 for function row, Shift for 2nd (solo-press detection), V for sqrt (2nd + x² combo), Ctrl+R for resend programs.

5. Technical Tradeoffs & Decisions

5.1 Cycle-Accurate Scheduler via LCM Base Clock

Decision: Use a single 7.68 GHz base clock instead of separate tick counters per hardware clock.

Tradeoff: Large tick values (u64 required) but zero floating-point error. All hardware timers divide evenly into the base clock. The process_second() overflow prevention method keeps values bounded.

Alternative considered: Per-clock tick counters with conversion functions. Rejected because fractional cycles accumulate rounding errors over millions of instructions, causing drift that's impossible to debug.

5.2 Prefetch Pipeline Emulation

Decision: Implement a single-byte prefetch buffer that charges memory access cycles during the current instruction.

Tradeoff: Added complexity to every instruction fetch path (every call to fetch_byte() must handle the prefetch buffer). But without it, cycle counts were ~50% of CEmu's (10 cycles instead of 20 for flash reads). This was non-negotiable for parity.
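A single-byte prefetch buffer of the kind described here might look like the sketch below. The Bus and Cpu types are stripped-down stand-ins, prefetch_at() is a hypothetical helper, and the flat 10-cycle read cost is only there to make the cycle accounting visible.

```rust
pub struct Bus {
    pub mem: Vec<u8>,
    pub cycles: u64,
    pub read_cost: u64, // cycles charged per memory read
}

impl Bus {
    pub fn read(&mut self, addr: u32) -> u8 {
        self.cycles += self.read_cost;
        self.mem.get(addr as usize).copied().unwrap_or(0)
    }
}

pub struct Cpu {
    pub pc: u32,      // address of the byte held in `prefetch`
    pub prefetch: u8, // byte that the next fetch_byte() will return
}

impl Cpu {
    /// Primes the buffer, as would happen after reset or a jump.
    pub fn prefetch_at(&mut self, bus: &mut Bus, addr: u32) {
        self.pc = addr & 0xFF_FFFF;
        self.prefetch = bus.read(self.pc);
    }

    /// Returns the buffered byte and immediately reads the following one,
    /// so that read's cost lands in the current instruction.
    pub fn fetch_byte(&mut self, bus: &mut Bus) -> u8 {
        let current = self.prefetch;
        self.pc = (self.pc + 1) & 0xFF_FFFF;
        self.prefetch = bus.read(self.pc);
        current
    }
}
```

The important property is that every fetch pays for a memory read it has not consumed yet, which is where the extra cycles relative to a naive fetch come from.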

5.3 Manual Serialization vs. Serde

Decision: Custom to_bytes()/from_bytes() for state snapshots, with a STATE_VERSION byte.

Tradeoff: Precise control over byte layout and smaller snapshots, but maintenance burden — STATE_VERSION was bumped 8 times as peripherals were added, and missed fields caused "RAM Cleared" bugs.
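A hand-rolled snapshot with a leading version byte can be sketched as below. TimerState and its two fields are invented for illustration, and the STATE_VERSION value is a placeholder; only the to_bytes()/from_bytes() shape and the version check reflect the approach described above.

```rust
pub const STATE_VERSION: u8 = 1; // placeholder value for this sketch

#[derive(Debug, PartialEq)]
pub struct TimerState {
    pub counter: u32,
    pub control: u8,
}

impl TimerState {
    /// Layout: [version][counter, little-endian, 4 bytes][control].
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = vec![STATE_VERSION];
        out.extend_from_slice(&self.counter.to_le_bytes());
        out.push(self.control);
        out
    }

    pub fn from_bytes(data: &[u8]) -> Result<Self, String> {
        if data.len() != 6 {
            return Err(format!("expected 6 bytes, got {}", data.len()));
        }
        // Reject snapshots written with a different field layout.
        if data[0] != STATE_VERSION {
            return Err(format!("unsupported state version {}", data[0]));
        }
        let counter = u32::from_le_bytes([data[1], data[2], data[3], data[4]]);
        Ok(TimerState { counter, control: data[5] })
    }
}
```

The maintenance cost is visible even at this size: adding a field means touching to_bytes(), from_bytes(), the length check, and the version constant together.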

Alternative explored: Serde+bincode in a separate worktree (calc-serde branch). Required custom handling for types like [u8; 4194304] (flash) that don't implement serde traits. Not merged.

Web's approach: Bypassed the problem entirely by snapshotting the entire WASM linear memory (~29MB memcpy in ~4ms). This eliminates all serialization bugs at the cost of larger save files.

5.4 Image-Based Keypad vs. Programmatic Buttons

Decision: Pivot from programmatic gradient buttons to a photograph-based overlay with percentage-based hit regions.

Tradeoff: More realistic appearance and perfect cross-platform consistency (same coordinates file used everywhere), but harder to modify and larger asset size. The extract_buttons.py script crops individual button images from a high-res TI-84 CE photo and generates button_regions.json.
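
A small sketch of percentage-based hit testing, assuming each region in button_regions.json carries a name plus x, y, width and height as percentages of the photo. The field names and the `hit_test` function are assumptions, not the project's actual schema.

```rust
/// One button's hit box, expressed as percentages (0.0 to 100.0) of the image size.
struct ButtonRegion {
    name: &'static str,
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

/// Map a tap at pixel (px, py) on an image of img_w x img_h pixels to a button name.
/// Working in percentages makes the result independent of the rendered size.
fn hit_test(
    regions: &[ButtonRegion],
    px: f32,
    py: f32,
    img_w: f32,
    img_h: f32,
) -> Option<&'static str> {
    let xp = px / img_w * 100.0;
    let yp = py / img_h * 100.0;
    regions
        .iter()
        .find(|r| xp >= r.x && xp < r.x + r.w && yp >= r.y && yp < r.y + r.h)
        .map(|r| r.name)
}
```

The same region list resolves a tap at (50, 100) on a 200x400 rendering and a tap at (100, 200) on a 400x800 rendering to the same button, which is what lets one coordinates file serve every platform.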

5.5 Visual Polling for Auto-Launch

Decision: Detect homescreen readiness by checking a single canvas pixel (310, 10) for the green battery indicator, rather than using fixed timing delays.

History: The chess auto-launch went through 6 iterations: fixed 3.5s delay → 2s delay → 1.2s delay → center pixel check → status bar check → single battery pixel check.

Tradeoff: Adapts to actual boot speed (different ROMs boot at different speeds), but fragile if the OS skin changes.
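
A sketch of the single-pixel readiness check, written against a raw RGBA8 frame rather than a browser canvas. The color thresholds for "green" are assumptions, as is the function name; only the (310, 10) coordinate comes from the text.

```rust
/// Pixel to sample, from the text: coordinate (310, 10).
const PROBE_X: usize = 310;
const PROBE_Y: usize = 10;

/// Returns true when the probed pixel looks green. `rgba` is a tightly packed
/// RGBA8 frame with `width` pixels per row. Thresholds are illustrative guesses.
fn homescreen_ready(rgba: &[u8], width: usize) -> bool {
    let idx = (PROBE_Y * width + PROBE_X) * 4;
    match rgba.get(idx..idx + 3) {
        Some(&[r, g, b]) => g > 128 && r < 100 && b < 100,
        _ => false, // frame too small to contain the probe pixel
    }
}
```

A caller would poll this once per rendered frame and start the launch key sequence the first time it returns true.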

5.6 DMA Cycle Stealing via Timestamp Tracking

Decision: Track dma_last_mem_timestamp and calculate stolen DMA cycles lazily on each CPU memory access, rather than explicitly scheduling CPU wait states.

Tradeoff: Simpler code (no explicit wait state scheduling), but harder to debug because the timing effect is implicit. The ~7.7% cycle overhead (13M of 168M boot cycles) emerges from the interaction between DMA and CPU memory access patterns.

5.7 Debug Port Interception via Bus Cold Path

Decision: Intercept writes to 0xFB0000-0xFDFFFF (CE toolchain debug ports) in the bus's unmapped MMIO cold path.

Critical discovery: sprintf writes to SEQUENTIAL addresses (0xFB0000, 0xFB0001, ...), NOT repeated writes to the same address. A null byte at 0xFB0000 exactly means program exit; a null at any other offset means flush the buffer. This took multiple debugging sessions to understand.
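
The rule above can be sketched as a small write handler. `DebugPort` and `DebugEvent` are hypothetical names, and appending bytes in write order (rather than indexing by offset) is an assumption; the address range and the two meanings of a null byte come from the text.

```rust
const DBG_BASE: u32 = 0xFB0000;
const DBG_END: u32 = 0xFDFFFF;

#[derive(Debug, PartialEq)]
enum DebugEvent {
    None,
    Flush(String),
    Exit,
}

struct DebugPort {
    buf: Vec<u8>,
}

impl DebugPort {
    /// Handle one byte written by the program to the debug port range.
    fn write(&mut self, addr: u32, value: u8) -> DebugEvent {
        if !(DBG_BASE..=DBG_END).contains(&addr) {
            return DebugEvent::None; // not a debug port address
        }
        if value != 0 {
            // Characters arrive at sequential addresses; collect them in order.
            self.buf.push(value);
            return DebugEvent::None;
        }
        if addr == DBG_BASE {
            // A null written exactly at the base address signals program exit.
            return DebugEvent::Exit;
        }
        // A null at any other offset terminates the string: flush it.
        let text = String::from_utf8_lossy(&self.buf).into_owned();
        self.buf.clear();
        DebugEvent::Flush(text)
    }
}
```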

5.8 Strategic / "why do this at all" tradeoffs

Why build this when CEmu already exists — CEmu is the gold-standard desktop emulator. The project's reason to exist is "CEmu runs on my laptop; I want to run my calculator from my phone and browser." Reframing the goal from "emulate the TI-84 CE" to "make the TI-84 CE portable" is what forced the tri-platform + WASM bet. A pure-desktop project would have had no reason not to just fork CEmu. Inferred.
Tri-platform (iOS + Android + web) for a project with effectively zero users — indie emulators typically pick one platform. Tri-platform multiplies ops complexity 3× but was the whole point — if the project ran on one platform there'd be no cross-platform architecture story. The no_std core + narrow C ABI decisions only pay off if you actually land on all three. Inferred.
Full-system OS emulation (boot the real ROM) over running user apps against a stubbed OS — apps-only would have been 10× easier; you could stub interrupts, LCD timing, and the OS event loop. Booting the 5.8.2 ROM is what surfaced the undocumented quirks (IM2=IM1, OS Timer, SPS/SPL) because the firmware exercises hardware the way no app alone does. "It boots real firmware" is the credibility claim that demanded this scope. Evidenced in Section 6 parity campaign.
Cycle-accurate from the start, not retrofitted — functional emulation would have booted the OS in days, not weeks. Choosing cycle-accurate upfront made CEmu traces the executable spec; anything off by a few cycles breaks interrupt timing and the boot hangs. A functional emulator retrofitted for timing would have meant rewriting the core mid-project. Evidenced — firmware is sensitive to timer interrupts, LCD refresh, RTC ticks.
25 days of obsessive cycle-level work on a no-user project — the defensible framing is that the project is the interview answer. The trace-diff methodology, 8-agent audit, 472 subagents, and CEmu parity campaign are the actual deliverable — the working emulator is the artifact proving the methodology. A simpler emulator shipped in 3 days would be a weaker portfolio signal. Inferred.

5.9 Additional architectural tradeoffs worth naming

Rust core + narrow C ABI over pure C or Rust-native bindings per platform — 15 extern "C" functions let CEmu slot in as a swappable backend and let the same .a drop into JNI / static-link on iOS / compile to WASM. A pure-C core (matching CEmu) would have made FFI trivial but given up memory safety and a zero-dependency WASM target; Rust-native bindings (UniFFI, wasm-bindgen) would have made each platform nicer but broken the dual-backend story. Inferred.
no_std-style core: no std::fs, no threads, no allocator for WASM — framebuffer is *const u32, ROMs are &[u8], save state is Vec<u8>. Platform owns frame pacing, persistence, logging. Makes the core trivially deterministic (precondition for trace-diffing) and drops into WASM without a runtime; cost is pushing the same concerns into three separate platform codebases. Inferred.
Android dlopen vs iOS static-link-with-symbol-prefix — asymmetric backend switching. iOS App Store policy forbids dlopen of non-system dylibs, forcing static linking of both cores and two prefixing schemes to avoid symbol collisions: Cargo's ios_prefixed feature renames Rust exports to rust_emu_*, and IOS_PREFIXED=1 macro-rewrites CEmu adapter exports to cemu_*. Android has no such constraint, so runtime dlopen with RTLD_LOCAL is strictly better there (smaller per-backend APKs, parallel builds, zero prefixing needed). Policy-dictated, not preference. Inferred.
Trace-diff against CEmu as the correctness oracle, not unit tests: 11 custom Rust examples + 3 Python scripts + a patched CEmu trace_gen became the parity toolchain; 178 unit tests are allowed to fail. eZ80 quirks (IM2≡IM1, SPS/SPL split, suffix atomicity, undocumented OS Timer) aren't in any published spec, so CEmu's behavior is the executable spec. Unit tests can't express "boots to MathPrint instead of Classic." Evidenced by my own words in chat: "genuinely i dont give a shit if tests fail, they weren't catching shit before."
Big-bang subagent audit (8 parallel) for the parity roadmap, not sequential bisection — one session dispatched 8 agents to read CEmu source in parallel and produce the ~150-item roadmap that became the 7-phase plan. Pure trace-bisection would have surfaced dependencies emergently (CPU → bus → peripherals → timing) and wasted weeks on symptoms. Cost: ~20% of "critical" items needed human triage because subagents couldn't verify against traces. Evidenced in pr_comparison_report.md.

5.10 Code-level tradeoffs visible in the source

Core crate exposes a C ABI even for the web build, not a wasm-bindgen Rust API — core/Cargo.toml sets crate-type = ["staticlib", "rlib", "cdylib"] with ~15 C functions (emu_create, emu_run_cycles, emu_framebuffer, …) in emu.h. An idiomatic Rust build would give each platform its own binding style (wasm-bindgen for web, uniffi/swift-bridge for mobile). Picking C as the lowest common denominator means one header drives Swift's bridging header, Android's JNI wrapper, and a CEmu-compatible backend interface with identical signatures — at the cost of a global Mutex<Emu> on every FFI call. Evidenced.
Swappable backend ABI shapes the whole cross-platform layer, not just the test harness — web/src/emulator/types.ts defines a 20-method EmulatorBackend TypeScript interface implemented by both RustBackend and CEmuBackend; Android compiles two separate .so files via backend_wrapper.cpp with BACKEND_NAME set at compile time. Shipping only the Rust core and deleting the CEmu path would have been simpler — but keeping CEmu runtime-selectable means divergence bugs can be A/B-tested on-device, which is what makes the parity campaign mean anything. Evidenced.
Monolithic cpu/execute.rs (2,646 lines) with nested match y/z/p/q decoding, not a 256-entry dispatch table — the classic Z80 bitfield decomposition in giant nested matches; no opcode table of function pointers. Rust's match on small integers compiles to a jump table anyway, so the perf argument is moot — keeping the structured decomposition lets DD/FD/ED/CB prefix variants share helpers without a second dispatch indirection. Cost: one very large file, harder to diff against per-opcode tests. Evidenced.
Peripherals as sibling concrete modules keyed by MMIO base address, no MmioDevice trait — peripherals/mod.rs declares 12 concrete modules (flash, interrupt, keypad, lcd, …) and the bus dispatches by address range. A trait MmioDevice { fn read(&mut self, off) -> u8 } + Vec<Box<dyn _>> would be more "Rust-like." The peripheral set is hardware-fixed so no plugin story is needed, dynamic dispatch would block inlining on hot paths (LCD palette reads), and each peripheral keeps peripheral-specific public methods (e.g. KeypadController::set_key) directly callable from the FFI layer without an enum wrapper. Evidenced.
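
For readers unfamiliar with the y/z/p/q scheme, here is a minimal sketch of the standard Z80 opcode bitfield split and a nested match over it. The `classify` function covers only a handful of opcodes for illustration and is not the emulator's execute.rs; it also ignores the eZ80 suffix opcodes (0x40, 0x49, 0x52, 0x5B), which reuse slots in the x=1 group.

```rust
/// Bitfields of an unprefixed opcode: xx yyy zzz, with y further split into p (2 bits) and q (1 bit).
#[derive(Debug, PartialEq)]
struct Fields {
    x: u8,
    y: u8,
    z: u8,
    p: u8,
    q: u8,
}

fn decode(op: u8) -> Fields {
    let y = (op >> 3) & 7;
    Fields { x: op >> 6, y, z: op & 7, p: y >> 1, q: y & 1 }
}

/// Nested match on the fields. Each arm would call a shared helper in a real core.
fn classify(op: u8) -> &'static str {
    let f = decode(op);
    match f.x {
        0 => match f.z {
            0 if f.y == 0 => "NOP",
            1 if f.q == 0 => "LD rp[p],nn",
            6 => "LD r[y],n",
            _ => "other x=0",
        },
        1 if op == 0x76 => "HALT",
        1 => "LD r[y],r[z]",
        2 => "ALU[y] A,r[z]",
        _ => "x=3 group",
    }
}
```

Because the register index comes straight out of y and z, one helper can serve a whole row of opcodes, and the DD/FD variants only need to swap which register table the helper indexes.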

6. The CEmu Parity Campaign

6.1 Overview

The parity campaign was the single largest engineering effort in the project — a systematic 7-phase overhaul to make the Rust emulator match CEmu's behavior at the instruction level. The campaign was driven by an extensive toolchain of trace generation, comparison, and analysis tools.

6.2 The Parity Toolchain

Trace generation (Rust side):

cargo run --example debug -- trace [steps] — Generates space-separated trace files with step, cycles, PC, SP, AF, BC, DE, HL, IX, IY, ADL, IFF1, IFF2, IM, HALT, opcode
cargo run --example debug -- fulltrace [steps] — Generates comprehensive JSON traces including all I/O operations per instruction (RAM reads/writes, MMIO port access)

Trace generation (CEmu side):

tools/cemu-test/trace_gen.c — Links against CEmu's libcemucore.a, generates traces in the exact same format as the Rust side
tools/cemu-test/parity_check.c — Checks CEmu state at 14 cycle milestones (1M through 60M)

Comparison tools:

scripts/compare_traces.py — PC-synced comparison with prefix lookahead (CEmu counts DD/FD prefixes as separate instructions)
scripts/find_first_divergence.py — JSON fulltrace comparison with I/O operation matching
core/examples/dense_compare.rs — PC-aligned comparison using HashMap-based lookup with 5-step lookahead
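
A simplified sketch of PC-synced comparison with lookahead, written in Rust for consistency with the core even though compare_traces.py is a Python script. It assumes the CEmu trace may contain extra entries (for example one per DD/FD prefix) and that skipping a few CEmu entries is enough to resynchronize. The `Entry` struct carries only two fields for brevity; all names are hypothetical.

```rust
/// A reduced trace entry: program counter plus one register pair.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Entry {
    pc: u32,
    af: u16,
}

/// Walk both traces in lockstep. Returns the (cemu_index, rust_index) of the
/// first divergence, or None if the overlapping part of the traces agrees.
fn first_divergence(cemu: &[Entry], rust: &[Entry], lookahead: usize) -> Option<(usize, usize)> {
    let (mut i, mut j) = (0usize, 0usize);
    while i < cemu.len() && j < rust.len() {
        if cemu[i].pc == rust[j].pc {
            if cemu[i] != rust[j] {
                return Some((i, j)); // same PC, different register state
            }
            i += 1;
            j += 1;
            continue;
        }
        // PCs differ: try to resync by skipping up to `lookahead` extra CEmu entries.
        let end = (i + 1 + lookahead).min(cemu.len());
        match (i + 1..end).find(|&k| cemu[k].pc == rust[j].pc) {
            Some(k) => i = k,
            None => return Some((i, j)), // control flow really diverged
        }
    }
    None
}
```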

Targeted investigation tools (11 specialized Rust examples):

find_divergence.rs — Tracks PC at known CEmu cycle checkpoints
scheduler_debug.rs — Monitors RTC event timing in scheduler
rtc_timing_compare.rs — Compares RTC load timing between implementations
check_0072fa.rs — Single-steps 70M cycles checking specific poll loop address
mathprint_check.rs — Monitors MathPrint flag at cycle checkpoints
Plus 6 more specialized analysis tools

6.3 The 7 Phases

Phase 1: CPU Instruction Correctness (Effort: L)

Fixed RETI IFF1 restore
Fixed register pair mapping (ED x=0 z=7 p=3: IY→IX)
Added missing eZ80 instructions: LD I,HL (ED C7), LD HL,I (ED D7), LEA IY,IX+d (ED 55)
Implemented block I/O (all Z80 + eZ80-specific variants)
Fixed EX DE,HL L-mode masking
Fixed block BC decrement (preserve BCU in Z80 mode)
Verification: Boot 132.79M cycles, PC=085B80. 250/436 tests passing.

Phase 2: Bus & Address Decoding (Effort: M)

Flash routing for 0x400000–0xBFFFFF
MMIO unmapped holes
Port range 0xF routing
SPI in memory-mapped path
Backlight routing
Verification: Boot 132.79M, 251/436 tests.

Phase 3: Peripheral Register Layout Rewrites (Effort: XL — the largest phase)

Timer rewrite: Replaced 3 separate Timer structs with unified GeneralTimers. Shared 32-bit control register (3 bits per timer), status register, mask register.
Keypad register packing: Single 32-bit control register with mode (2 bits) + rowWait (14 bits) + scanWait (16 bits). 16 data registers. GPIO enable. Reset mask 0xFFFF.
Watchdog offset fix: Counter and load value offsets were SWAPPED. Revision corrected to 0x00010602.
Verification: Boot 156.10M, 272/457 tests.
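
A sketch of the packed keypad control register described above. The field widths (2 + 14 + 16 bits) come from the text; placing mode in the lowest bits, rowWait next, and scanWait in the upper half is an assumption, as are the struct and function names.

```rust
/// Unpacked view of the 32-bit keypad control register.
#[derive(Debug, PartialEq, Clone, Copy)]
struct KeypadControl {
    mode: u8,       // 2 bits
    row_wait: u16,  // 14 bits
    scan_wait: u16, // 16 bits
}

/// Assumed layout, low to high: mode[1:0], rowWait[15:2], scanWait[31:16].
fn unpack(raw: u32) -> KeypadControl {
    KeypadControl {
        mode: (raw & 0x3) as u8,
        row_wait: ((raw >> 2) & 0x3FFF) as u16,
        scan_wait: (raw >> 16) as u16,
    }
}

fn pack(c: KeypadControl) -> u32 {
    ((c.mode & 0x3) as u32)
        | (((c.row_wait & 0x3FFF) as u32) << 2)
        | ((c.scan_wait as u32) << 16)
}
```

Storing the register packed and decoding on use means a 32-bit read returns exactly what was written, which is the property byte-level trace comparison depends on.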

Phase 4: Scheduler & Timing (Effort: L)

SCHED_SECOND overflow prevention
CPU speed change event conversion
Panel clock rate 60Hz → 10,000,000 Hz
OS Timer interrupt phase fix (set state to OLD value before toggling)
Timer 32 kHz clock source
Timer 2-cycle interrupt delay pipeline: Match detected → status visible → interrupt fires. Required TimerDelay scheduler event, process_delay() state machine, delay_status: u32 and delay_intrpt: u16 fields.
Verification: Boot 156.10M, 272/457 tests.

Phase 5: RTC, SHA256, Control Ports (Effort: M)

RTC 3-state machine with time counting, latching, and load data transfer
SHA256 64-round compression function
Control port masks (port 0x01: & 0x13, port 0x29: & 1)
Flash size_config reset (0x07 → 0x00)
INT_PWR interrupt on reset
Verification: Boot 108.78M, 259/437 tests.

Phase 6: LCD & SPI Enhancements (Effort: L)

LCD DMA 5-state event machine with dual clock domains
256-entry palette storage
SPI panel stub (ST7789V)
LCD ICR, MIS, UPCURR, LPCURR registers
Verification: Boot 156.10M, 277/455 tests.

Phase 7: CPU Advanced & Bus Protection (Effort: XL, Risk: High)

Separate SPS/SPL stack pointers (CPU SNAPSHOT_SIZE 64 → 67)
Mixed-mode CALL/RET/RST with MADL|ADL flag byte
Memory protection (stack limit NMI, protected range, flash privilege)
DMA scheduling with cycle stealing (~7.7%)
HALT fast-forward, interrupt prefetch_discard, R rotation
Verification: Boot 168.14M cycles, 277/455 tests. ALL PHASES COMPLETE.

6.4 The Comparison Report

PR #56's supporting document (docs/pr_comparison_report.md) was generated by 8 parallel analysis agents, each comparing a different subsystem of the Rust emulator against CEmu. It identified ~150 specific discrepancies organized into three tiers:

20 critical issues (CPU missing instructions, register layout mismatches, entirely missing peripherals)
19 high issues (timing differences, routing bugs, missing clock sources)
25+ medium issues (edge cases, missing features, reset value differences)

This document served as the roadmap for all 7 phases.

7. AI Agent Orchestration for Testing & Debugging

7.1 Overview of AI's Role

The project was ~80% AI co-authored (262/332 commits). Claude Code was used in 37+ main conversation sessions with 472 subagent invocations across those sessions, generating ~73 MB of conversation data (~9 MB in main sessions, ~64 MB in subagent sessions). The heaviest sessions had 89 subagent files (session 13baee3d, Feb 8) and 76 subagent files (session dc8b876a, Jan 30-31).

I directed the architecture, defined the parity methodology, tested on real devices, identified bugs, and corrected the AI when it went wrong. Claude served as implementation workhorse — writing code, running traces, and deploying subagents per my direction. The division of labor was:

I (direction & decisions)	AI (execution & research)
Designed dual-backend architecture	Deployed up to 89 subagents in a single session for research
Defined parity methodology & toolchain	Read and compared CEmu C source code against Rust
Tested on real Android/iOS devices	Wrote implementation code per my direction
Identified bugs via device testing	Generated traces and analyzed divergences
Corrected wrong AI approaches	Created PRs with descriptions
Decided what to build/cut/revert	Rebaked ROMs (17 consecutive times in one session)
Set quality bar ("I want perfect")	Iterated until my bar was met

7.2 Subagent Usage Patterns

Subagents were the primary mechanism for scaling the AI's research and implementation capacity. The project used 472 subagent sessions across the 37 main sessions, with distinct usage patterns:

Research subagents (the most common pattern): These were deployed in parallel to read and analyze source code. The most dramatic example was the 8-agent peripheral audit (PR #56), where 8 subagents simultaneously compared each peripheral module in the Rust emulator against its CEmu C counterpart. Each agent would:

Read the CEmu C source file (e.g., cemu-ref/core/timers.c)
Read the corresponding Rust source file (e.g., core/src/peripherals/timer.rs)
Produce a detailed report of every register offset, default value, timing behavior, and control flow difference

This pattern was repeated at smaller scale throughout the project. When investigating a bug, 2-3 research subagents might be deployed simultaneously — one reading the CEmu source for a specific subsystem, one reading the Rust implementation, and one analyzing trace divergences.

Implementation subagents: For larger feature work (e.g., the 7-phase parity overhaul), subagents were used to implement specific fixes within a phase while the main agent coordinated. A typical pattern:

Main agent defines the fix needed (e.g., "rewrite timer register layout to match CEmu's packed 32-bit format")
Subagent reads both implementations, writes the new Rust code
Main agent integrates, runs tests, verifies parity

Worktree subagents: Claude Code's worktree feature was used for parallel feature development. Subagents worked in isolated git worktrees (e.g., calc-worktrees/rtc-ticking, calc-worktrees/doom, calc-worktrees/image-keypad) to develop features without blocking the main branch. The image keypad worktree had its own memory file with 6 subagent sessions documenting SwiftUI pitfalls and button region coordinate systems.

Session intensity distribution:

2 sessions with 75+ subagents (the boot-to-homescreen push on Jan 30-31, and the image keypad + parity work on Feb 8)
~5 sessions with 20-40 subagents (major feature work)
~10 sessions with 5-15 subagents (focused debugging or feature implementation)
~20 sessions with 0-5 subagents (quick fixes, documentation, interview prep)

Cross-session knowledge transfer: Because subagent context is lost when a session ends, the project relied on persistent files for continuity:

MEMORY.md (68 lines of distilled learnings in .claude/projects/)
CLAUDE.md (7.6KB of workflow docs, memory map, trace formats in repo root)
docs/findings.md (15KB of hardware discoveries)
docs/milestones.md (7.1KB phase tracker)

Each new session's first action was typically reading these files to rebuild context. When a session hit context limits (which happened frequently during the parity campaign), the findings from that session were written to findings.md or milestones.md before ending, ensuring the next session could continue.

7.3 The Parity Testing/Debugging Loop

The CEmu parity campaign was my core methodology for achieving correctness: systematically compare every subsystem against the reference implementation, prioritize fixes by dependency order, and verify each fix pushes the divergence point further. The AI executed this methodology at scale. Here's how it worked:

Phase 1: Systematic Comparison Research

I directed a comprehensive audit of every peripheral subsystem against CEmu's C source. Claude deployed 8 parallel subagents simultaneously, each assigned a different subsystem:

Agent comparing CPU execution (cpu.c vs execute.rs)
Agent comparing bus/memory (bus.c/mem.c vs bus.rs/memory.rs)
Agent comparing LCD/backlight (lcd.c vs lcd.rs)
Agent comparing timers (timers.c vs timer.rs)
Agent comparing interrupt controller (interrupt.c vs interrupt.rs)
Agent comparing keypad (keypad.c vs keypad.rs)
Agent comparing RTC/SHA256 (realclock.c/sha256.c vs rtc.rs/sha256.rs)
Agent comparing scheduler/control (schedule.c/control.c vs scheduler.rs/control.rs)

Each agent read both the Rust and C source files in full and produced a detailed report of every discrepancy — register layout differences, timing behavior differences, missing features, wrong default values. The reports were merged into docs/pr_comparison_report.md (~150 issues).

Phase 2: Prioritized Fix Implementation

I organized the ~150 issues into 7 phases based on dependency ordering (CPU first, then bus, then peripherals, then timing). Each phase had:

Clear deliverables (specific registers to fix, specific behaviors to match)
A verification checkpoint (boot cycle count + test count)
Effort and risk ratings

The AI implemented fixes per the phase plan, then verified by generating a trace and comparing against CEmu:

cargo run --example debug -- trace 100000    # Generate Rust trace
(cd tools/cemu-test && ./trace_gen ../../ROM -n 100000)  # Generate CEmu trace (subshell, so the next command still runs from the repo root)
python scripts/compare_traces.py cemu.log rust.log     # Compare

Phase 3: Divergence Bisection

When traces diverged, I directed the AI to use a binary-search approach to find the exact instruction:

Generate a long trace (100K+ steps)
Find the first PC where Rust ≠ CEmu
Read the surrounding instructions for context
Look up the divergent instruction in CEmu's source
Compare the implementation in Rust
Fix the discrepancy
Re-run and verify the fix pushed the divergence further out

This loop was repeated hundreds of times. Key milestones: 40K steps → 700K steps → 3.2M steps → full boot (3.6M steps pre-DMA; 168M cycles once DMA cycle stealing was modeled).

Phase 4: Targeted Investigation Tools

When standard trace comparison was insufficient, I directed the AI to build specialized investigation tools targeting specific subsystems. 11 custom Rust examples were created during the parity campaign:

rtc_timing_compare.rs: Compared RTC load timing between Rust and CEmu at 12 checkpoints from 0 to 50M cycles. Finding: "Our RTC load status returns 0x00 (complete) too early. This timing difference causes the poll loop at 0x0072FA to exit earlier."
scheduler_debug.rs: Monitored RTC event scheduling. Finding: at 48 MHz, 1 CPU cycle = 160 base ticks; the RTC event fires after 16,429 ticks of the 32 kHz clock, which is about 0.5 s, or ~24M CPU cycles of delay.
check_0072fa.rs: Single-stepped 70M cycles checking one specific poll loop address that CEmu visits but Rust didn't. Finding: Different control flow due to RTC timing.
mathprint_check.rs: Monitored the MathPrint flag (0xD000C4 bit 5) at 8 cycle checkpoints. Finding: The flag was never set to 0x20 (MathPrint mode) because the RTC timing caused a different code path.
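
The ~24M figure from scheduler_debug.rs can be reproduced with integer arithmetic. This sketch assumes the "32 kHz" clock is exactly 32,768 Hz and reuses the 7.68 GHz base clock; the function name is hypothetical.

```rust
const BASE_HZ: u64 = 7_680_000_000;
const CPU_HZ: u64 = 48_000_000;
const RTC_HZ: u64 = 32_768; // assumption: the "32 kHz" clock is a 32,768 Hz crystal

/// Convert a number of RTC clock ticks into CPU cycles by going through base ticks.
fn rtc_ticks_to_cpu_cycles(rtc_ticks: u64) -> u64 {
    let base_ticks = rtc_ticks * (BASE_HZ / RTC_HZ); // 234,375 base ticks per RTC tick
    base_ticks / (BASE_HZ / CPU_HZ) // 160 base ticks per CPU cycle
}
```

Under that assumption, 16,429 RTC ticks come to 3,850,546,875 base ticks, or 24,065,917 whole CPU cycles, which lines up with the ~24M cycle delay observed in CEmu and is far from the ~75K cycles the Rust core originally took.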

Phase 5: Multi-Session Continuity

The parity campaign spanned multiple conversation sessions that hit context limits. To maintain continuity:

The MEMORY.md file in .claude/projects/ stored distilled learnings (68 lines)
The CLAUDE.md file in the repo documented workflow, key addresses, trace formats
docs/findings.md (15KB) captured every hardware discovery
docs/milestones.md (7.1KB) tracked phase completion status

When a new session started, Claude would read these files to rebuild context, then continue from where the last session left off.

7.4 Specific Debugging Stories Driven by AI Agents

The SPI Divergence Hunt (699,900 → 3.2M steps):

Divergence at step 699,900: After SPI STATUS read, CEmu A=0x20 (2 transfers pending), Rust A=0x00 (all complete)
AI agents analyzed CEmu's scheduler-driven SPI: sched_set(SCHED_SPI, ticks), transfer duration = bitCount * ((cr1 & 0xFFFF) + 1) ticks at 24MHz
The initial "complete ALL transfers on first read" approach worked at step 418K (3 transfers) but failed at 699K (6 transfers, only 4 should complete)
Solution: Implement event-driven scheduler for SPI. This pushed parity to 3.2M steps.
Next divergence at 3,216,456: RTC load status timing. This took 4 more dedicated investigation tools to diagnose.
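
The transfer-duration formula quoted above, plus a sketch of why "complete everything on first read" fails. The formula is from the text; modeling queued transfers as completing back to back, and the function names, are assumptions made for illustration.

```rust
/// Duration of one SPI transfer in 24 MHz ticks:
/// bitCount * ((cr1 & 0xFFFF) + 1), as quoted from CEmu's behavior in the text.
fn spi_transfer_ticks(bit_count: u32, cr1: u32) -> u64 {
    bit_count as u64 * ((cr1 & 0xFFFF) as u64 + 1)
}

/// How many of `queued` transfers have finished after `elapsed` ticks,
/// assuming they run back to back with identical durations.
fn completed_transfers(elapsed: u64, per_transfer: u64, queued: u32) -> u32 {
    if per_transfer == 0 {
        return queued;
    }
    (elapsed / per_transfer).min(queued as u64) as u32
}
```

With 6 queued transfers of 32 ticks each and 130 ticks elapsed, only 4 have completed, so a status read at that moment must still report 2 pending. That is the shape of the 699K divergence.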

The 8-Agent Peripheral Audit (Feb 5-6):

8 agents deployed simultaneously, each comparing a subsystem
Total: ~150 issues identified across all peripherals
Produced a 35KB comparison report that became the 7-phase implementation roadmap
The agent findings revealed that some peripheral implementations were fundamentally wrong (e.g., timer register layout was completely different, keypad control register packing was wrong, watchdog offsets were swapped)

The MathPrint vs Classic Mode Investigation (4 custom tools):

Problem: Emulator boots into Classic mode, CEmu boots into MathPrint mode
mathprint_check.rs: Found the flag is never set to 0x20
check_0072fa.rs: Found the poll loop at 0x0072FA behaves differently
rtc_timing_compare.rs: Found RTC load completes in ~75K cycles (Rust) vs ~24M cycles (CEmu)
scheduler_debug.rs: Found the RTC event offset of 16,429 ticks was correct but the load processing timing was wrong
Resolution: Fixed the RTC load state machine to process at the correct 32 kHz rate

7.5 Human-AI Interaction Patterns

My direction and quality enforcement:

Set the parity standard: "i dont want close, i want perfect"
Demanded correct approaches: "make sure this is the correct way to do things, i dont want to use hacky workarounds"
Redirected flailing AI: "stop making random guesses and use comprehensive logging"
Maintained project knowledge when AI lost context: "are you forgetting the things you learned in your findings?"
Prioritized correctness over test coverage: "genuinely i dont give a shit if tests fail, they weren't catching shit before"

My corrections to the AI:

AI focused on backlight brightness for the power-off bug when the real issue was the LCD enable bit — I identified the correct root cause and redirected
AI frequently forgot previous session findings across context boundaries — I pointed it back to findings.md and milestones.md
AI's interview prep contained unverifiable claims (e.g., calling a first-time-correct implementation a "redesign") — I caught the fabrications and demanded accuracy
AI over-engineered the PWA implementation — I reverted the work and directed a simpler approach
AI made the emulator boot into Classic mode instead of MathPrint — I identified the symptom on device and directed the multi-tool investigation

What the AI executed well (when properly directed):

Deploying parallel subagents for research before writing code
Building specialized investigation tools for specific subsystems
The trace → compare → fix → verify loop (my methodology, AI's execution)
ROM baking automation (17 consecutive rebakes at 4-5 AM)
Cross-file refactoring (updating all 3 platforms simultaneously)

8. Claude Code Usage Statistics

8.1 Session & Token Data

Metric	Value
Main conversation sessions	37 session directories (24 indexed in sessions-index.json)
Main JSONL conversation files	4 files, 9.2 MB total
Subagent invocations	472 subagent JSONL files, 64 MB total
Total conversation data	~73 MB
PRs created from sessions	13
Unique branches worked on	13
Related worktree projects	2 (calc-web, calc-worktrees-image-keypad)

8.2 Token Usage (extracted from main JSONL files)

Category	Tokens
Input tokens	1,017,775
Output tokens	222,085
Cache read tokens	462,457,150
Cache creation tokens	73,134,709
Total	~537M tokens

Note: These counts are from the 4 main JSONL files only. The 472 subagent sessions (64 MB of data) would add significantly more tokens — likely bringing the true total well above 1B tokens for the project.

8.3 Estimated Cost (Opus pay-per-token pricing)

Category	Rate	Est. Cost
Input	$15/M tokens	$15
Output	$75/M tokens	$17
Cache reads	$1.50/M tokens	$694
Cache creation	$18.75/M tokens	$1,371
Main sessions total		~$2,097

Note: This estimate uses Claude Opus 4 pricing. The actual cost depends on which model was used per session (Opus 4.5/4.6 vs Sonnet 4.5). Max plan subscribers pay a flat rate, so actual billing may differ. The subagent sessions' token costs are not included in this estimate.
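
The token total and cost estimate can be recomputed from the two tables. This is a plain arithmetic check using the listed rates; the constant and function names are hypothetical.

```rust
/// (token count, USD per million tokens), copied from the tables above.
const USAGE: [(u64, f64); 4] = [
    (1_017_775, 15.0),   // input
    (222_085, 75.0),     // output
    (462_457_150, 1.50), // cache reads
    (73_134_709, 18.75), // cache creation
];

fn total_tokens() -> u64 {
    USAGE.iter().map(|&(tokens, _)| tokens).sum()
}

fn total_cost_usd() -> f64 {
    USAGE
        .iter()
        .map(|&(tokens, rate)| tokens as f64 / 1_000_000.0 * rate)
        .sum()
}
```

The sum is 536,831,719 tokens, and the unrounded cost is about $2,096.88, which rounds to the $2,097 shown for the main sessions.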

8.4 Session Breakdown by JSONL File

File	Size	Responses	Content
`e262789c.jsonl`	4.88 MB	96	Interview prep, chat export generation
`534ad7db.jsonl`	3.82 MB	538	Chess mode, auto-launch, ROM baking, Shift+R
`f814347a.jsonl`	285 KB	22	Test framework investigation
`e910f4c6.jsonl`	189 KB	35	Web frontend test setup (Vitest, 44 keypad tests)

8.5 AI Model Distribution Across Commits

Model	Commits	Notes
Claude Opus 4.5	215	Primary model through mid-February
Claude Opus 4.6	74	Adopted mid-February
Claude Sonnet 4.5	15	Used sporadically
Codex	1	Single worktree snapshot
No AI attribution	70	Merge commits, manual fixes, config tweaks

Note: The four per-model rows sum to 305, more than the 262 AI-attributed commits, presumably because some commits carry more than one co-author trailer.

9. Major Bugs & Debugging Stories

9.1 The Magnitude Error (10⁹ bug)

Symptom: 6+7 showed 1300000000, and 99*99 showed a wrong answer.

Investigation: Spanned many sessions. Traced BCD floating-point operations, checked OP1-OP6 register addresses, examined TI-OS format buffer.

Root cause: Using self.adl instead of self.l for data register wrapping. L mode controls data addressing width (16-bit vs 24-bit), while ADL controls instruction/PC width. LDIR/LDDR were using 24-bit addressing when they should have used 16-bit for data operations.

False leads: Initially suspected display formatting (decimal point not written), then keypad timing, then OS Timer frequency.
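
A minimal sketch of the distinction behind this bug: the width used to wrap a data pointer depends on the L flag, not on ADL. The function names are hypothetical, and preserving the upper byte in 16-bit mode is modeled on the "preserve BCU in Z80 mode" fix listed in the parity campaign.

```rust
/// Mask a data address according to the L mode flag (true = 24-bit, false = 16-bit).
fn mask_data_addr(value: u32, l: bool) -> u32 {
    if l { value & 0xFF_FFFF } else { value & 0xFFFF }
}

/// Increment a 24-bit register used as a data pointer (as LDIR does with HL and DE).
/// In 16-bit mode only the low 16 bits wrap; the upper byte is left untouched.
fn inc_data_ptr(reg: u32, l: bool) -> u32 {
    if l {
        reg.wrapping_add(1) & 0xFF_FFFF
    } else {
        (reg & 0xFF_0000) | (reg.wrapping_add(1) & 0xFFFF)
    }
}
```

Passing the ADL flag where L belongs makes the two branches swap whenever a suffix or mixed-mode call sets them differently, so a block copy can walk into the wrong 64 KB page.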

9.2 The Graphing Hang

Symptom: Screen freezes after graphing, loader stops spinning.

Root cause: Timer raw bits accumulating and causing infinite ISR loops. Timer interrupt clearing in tick_peripherals was not properly clearing the raw status bits before re-evaluating.

Fix: Proper implementation of the 2-cycle timer interrupt delay pipeline.

9.3 The "Done" Bug

Symptom: First calculation after boot shows "Done" instead of numeric result.

Root cause: TI-OS expression parser not initialized. The OS expects an ENTER key to have been processed before the first calculation.

Fix: Auto-inject ENTER key on first user interaction (in Rust core, cross-platform).

9.4 ON Key Wake From Sleep

Symptom: ON button doesn't work after power-off (APD or 2nd+ON).

Investigation: Multiple sessions tracing CEmu's keypad_on_check(), control.off flag, power state.

Root causes (multiple): Battery status port returning hardcoded 0 instead of 0xFE (the OS rejected wakes because it thought the battery was dead). on_key_wake was one-shot instead of persistent. The APD (Automatic Power Down) disable wasn't clearing the right flag at 0xD00088.

9.5 The 50× Performance Regression

Caused by: Phase 4 scheduler changes (timer interrupt delay pipeline).

Root cause: Scheduler base_cycles_offset u64 underflow — subtracting a larger value from a smaller one wrapped around to near-maximum u64.

Fix: Added next_event_ticks cache to avoid scanning all events, fixed the underflow.
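
A small illustration of the underflow and of the two fixes named above. The function names are hypothetical and the event list is reduced to a slice of timestamps; only the wrap-around behavior of u64 subtraction is the point.

```rust
/// Buggy shape: if `now` has already passed `event_at`, the result wraps
/// around to a value near u64::MAX instead of going negative.
fn ticks_until_wrapping(event_at: u64, now: u64) -> u64 {
    event_at.wrapping_sub(now)
}

/// Fixed shape: an event that is already due is reported as 0 ticks away.
fn ticks_until(event_at: u64, now: u64) -> u64 {
    event_at.saturating_sub(now)
}

/// Cache candidate: the earliest pending event, computed once when the event
/// set changes rather than by scanning on every instruction.
fn next_event_ticks(events: &[u64]) -> Option<u64> {
    events.iter().copied().min()
}
```

In release builds plain `-` on u64 wraps silently, which is how a single late event can turn into a near-infinite wait or, as here, a large slowdown.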

9.6 State Restore "RAM Cleared"

Symptom: Restoring a saved state showed "RAM cleared" message.

Root cause: Multiple peripheral fields not included in state snapshots: SPI controller state, watchdog state, cursor registers, needs_lcd_event, needs_lcd_clear, memory protection registers (stack limit, protected range). The OS detected inconsistency on restore.

Fix: Multiple STATE_VERSION bumps (reached version 8) to add missing fields. Spawned multiple investigation subagents to systematically audit what was and wasn't saved.

9.7 Chess Opening Books Not Loading

Symptom: Chess shows "BK:NONE" (file open fails) vs CEmu showing "BK:32768".

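Root cause: fileioc (CE C library) stores curr_slot at LCD register address 0xE30C11 and resize_amount at 0xE30C0C. These are LCD offsets 0xC11 and 0xC0C, among the cursor registers just above the cursor image RAM window (0x800-0xBFF), in a part of the LCD address space that wasn't implemented.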

How found: Added breakpoint mechanism to Emu struct, used disasm command, traced through _ChkFindSym bcall. Multiple subagent sessions to understand fileioc's internals.

10. Running Real Software: DOOM & Chess

10.1 DOOM Support (PR #68)

Getting DOOM running required several subsystem additions:

8bpp LCD rendering: The calculator normally uses 16bpp RGB565, but DOOM uses 8bpp indexed color with a 256-entry palette. Added render_frame_8bpp() with palette lookup.
.8xp/.8xv file parser (ti_file.rs): Parse TI file format headers, checksums, variable entries. Supports programs, protected programs, AppVars.
Flash archive injection: inject_archive_entry() writes flag byte 0xFC, 2-byte size, type, version, self-referential address, name, data into flash. find_archive_free_addr() scans sectors 0x0C0000-0x3B0000.
SendKey mechanism: Pokes OS RAM directly at 0xD0058C (kbdKey), 0xD0058E (keyExtend), bit 5 of 0xD0009F (keyReady). Uses bus.poke_byte() to bypass memory protection.
Launch sequence: ENTER → CLEAR → Asm( → prgm → D,O,O,M → ENTER
LCD cursor image RAM: Extended LCD address space to include 0x800-0xBFF (1024 bytes) used by LibLoad as scratch storage.
Keypad range extension: Extended KEYPAD_END from 0x150048 to 0x151000 (full 4KB page) because DOOM needed the full range.
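
A sketch of the SendKey poke sequence against a mock bus. The three addresses and the bit number come from the text; `MockBus`, `send_key`, and the split of a key into a code byte plus an extension byte are assumptions for illustration.

```rust
use std::collections::HashMap;

const KBD_KEY: u32 = 0xD0058C;
const KEY_EXTEND: u32 = 0xD0058E;
const KEY_FLAGS: u32 = 0xD0009F;
const KEY_READY_BIT: u8 = 1 << 5;

/// Stand-in for the emulator bus: sparse RAM with unchecked byte access.
#[derive(Default)]
struct MockBus {
    ram: HashMap<u32, u8>,
}

impl MockBus {
    /// Write a byte directly, bypassing any memory protection.
    fn poke_byte(&mut self, addr: u32, value: u8) {
        self.ram.insert(addr, value);
    }

    fn peek_byte(&self, addr: u32) -> u8 {
        *self.ram.get(&addr).unwrap_or(&0)
    }
}

/// Hand a key to the OS by writing its key variables and raising keyReady.
fn send_key(bus: &mut MockBus, key: u8, extend: u8) {
    bus.poke_byte(KBD_KEY, key);
    bus.poke_byte(KEY_EXTEND, extend);
    // Set only bit 5 so the other flags in that byte are preserved.
    let flags = bus.peek_byte(KEY_FLAGS);
    bus.poke_byte(KEY_FLAGS, flags | KEY_READY_BIT);
}
```

A launch sequence like the one above would call this once per key and wait for the OS to clear keyReady before sending the next one; that wait is not shown here.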

10.2 Chess Integration

The chess engine (from the ce-games submodule) is a fully-featured chess program running on the eZ80:

Alpha-beta negamax with PVS, aspiration windows, null-move pruning (R=2), LMR, futility pruning
Texel-tuned PeSTO piece-square tables
4096-entry transposition table (always-replace)
Polyglot opening book split across AppVars (TI-OS 64KB limit), up to 131K entries
~154K cycles/node, ~2000 Elo at Expert difficulty (15s/move)
Automated tournament system (emu_tournament.py) running eZ80 engine vs Stockfish
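
The "always-replace" policy named in the list above is simple enough to sketch. This is a Rust illustration of the policy only, not the engine's code (the engine itself runs on the eZ80). The entry fields, the 64-bit key, and the names `TtEntry` and `TranspositionTable` are assumptions made for the example. Only the 4096-entry size and the replacement policy come from the text.

```rust
/// Number of slots; a power of two so the index is a simple bit mask.
pub const TT_SIZE: usize = 4096;

/// Hypothetical entry layout: position hash, search depth, and score.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TtEntry {
    pub key: u64,
    pub depth: u8,
    pub score: i16,
}

pub struct TranspositionTable {
    slots: Vec<Option<TtEntry>>,
}

impl TranspositionTable {
    pub fn new() -> Self {
        Self { slots: vec![None; TT_SIZE] }
    }

    /// Map a position hash to a slot using its low 12 bits.
    fn index(key: u64) -> usize {
        (key & (TT_SIZE as u64 - 1)) as usize
    }

    /// Always-replace: the new entry overwrites whatever occupied the slot,
    /// with no comparison of depth or age.
    pub fn store(&mut self, entry: TtEntry) {
        self.slots[Self::index(entry.key)] = Some(entry);
    }

    /// Return the entry only if the full key matches, so a colliding
    /// position that shares the slot is never mistaken for a hit.
    pub fn probe(&self, key: u64) -> Option<TtEntry> {
        self.slots[Self::index(key)].filter(|e| e.key == key)
    }
}
```

Always-replace trades some hit rate for minimal bookkeeping per store, which is a common choice when the table is small and memory is tight.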

Web chess mode (/chess route): Fetches chess.bin (1.9MB gzipped ROM), auto-launches at 5× speed using visual polling for boot detection.

11. Development Timeline & Velocity

Date	Day	PRs	Key Achievement
Jan 27	1	#1-#4	CPU + memory + peripherals. Full eZ80 in one afternoon.
Jan 28	2	#6-#9	40K step parity. IM2 fix. OS Timer.
Jan 29	3	#10-#13	OS boots to home screen (3.6M steps). Scheduler implemented.
Jan 30-31	4-5	#14-#20	Keypad working. Magnitude error fixed (L vs ADL modes).
Feb 1	6	#21-#33	CEmu backend and iOS app. 13 PRs in one day.
Feb 2	7	#34-#44	Runtime backend switching. State persistence. Web app.
Feb 5-6	10-11	#51-#58	7-phase CEmu parity overhaul (PR #56). WASM optimization.
Feb 7-9	12-14	#59-#68	DOOM runs. Image keypad. ON key wake.
Feb 10-14	15-19	#69-#78	Live file send. Debug port interception. Sudoku.
Feb 16-20	21-25	#81-#86	Chess mode. PWA offline. Shift+R dev shortcut.

Key velocity facts:

Boot-to-homescreen achieved in 3 days from initial commit
Full eZ80 CPU (3,675 lines, 124 tests) implemented in one afternoon
Tri-platform support (Android + iOS + Web) in 6 days
From "first instruction runs" to "DOOM runs" in 13 days

12. PR & Commit History Analysis

12.1 Merge Patterns

Squash merges: ~43 (64%) — used for most feature branches from PR #16 onward
Merge commits: ~24 (36%) — used for earlier PRs and larger merges
No rebases on main; squash merges keep most of the history linear

12.2 AI Co-Authorship

262/332 commits (79%) have AI co-author attribution
Claude Opus 4.5: 215 commits (primary model through mid-Feb)
Claude Opus 4.6: 74 commits (adopted mid-February)
Claude Sonnet 4.5: 15 commits (sporadic)
Codex: 1 commit (worktree snapshot)
70 commits (21%) have no AI attribution (merge commits, manual fixes, config tweaks)

12.3 Reverts

One revert: commit 645aeb1 reverted PR #2's CPU implementation immediately after its squash-merge; the work was then re-merged from a different branch topology. This was a merge strategy correction, not a code quality issue.

12.4 Closed PRs (8 total)

PRs #46-#50 (5 individual CEmu parity features): All absorbed into monolithic PR #56
PR #35 (unified state persistence): Superseded by per-platform PRs #39, #41, #43
PR #63 (state restore perf): Incorporated into later PRs
PR #85 (chess + Shift+R): Split into PRs #84 and #86

12.5 PR Dependency Chains

Core emulation: #1→#2→#4→#6→#7→#8→#9→#10→#11→#12→#13 (boot achieved)
CEmu parity: #22→#23→#30→#45→#51→#56→#59
Gaming: #61→#68→#70→#71→#74→#77→#78→#82→#84
State persistence: #39→#41→#42→#43→#53→#57→#81

13. Key Files Reference

Core Emulator

File	Lines	Purpose
`core/src/emu.rs`	3168	Main orchestrator, execution loop, frame rendering
`core/src/cpu/execute.rs`	2646	All instruction execution (largest file)
`core/src/bus.rs`	1929	Memory routing, flash unlock, debug ports
`core/src/disasm.rs`	1544	Full eZ80 disassembler
`core/src/peripherals/lcd.rs`	1302	LCD controller + 5-state DMA engine
`core/src/peripherals/keypad.rs`	899	8×7 key matrix, scan modes, edge detection
`core/src/cpu/helpers.rs`	858	ALU, register access, prefetch
`core/src/peripherals/control.rs`	838	CPU speed, battery FSM, memory protection
`core/src/scheduler.rs`	702	Event scheduler, 7.68 GHz base clock
`core/src/peripherals/rtc.rs`	674	RTC 3-state machine
`core/src/peripherals/timer.rs`	671	3× GPT with delay pipeline
`core/src/peripherals/spi.rs`	627	SPI + 16-deep FIFO
`core/src/peripherals/mod.rs`	599	Port routing, tick orchestration
`core/src/memory.rs`	587	Flash + RAM + NOR commands
`core/src/peripherals/flash.rs`	576	Flash controller registers
`core/src/peripherals/interrupt.rs`	448	2-bank interrupt controller
`core/src/lib.rs`	432	C ABI exports, SyncEmu
`core/src/ti_file.rs`	379	.8xp/.8xv parser
`core/src/wasm.rs`	325	WASM bindings
`core/src/peripherals/sha256.rs`	307	SHA-256 compression
`core/include/emu.h`	52	C ABI contract

Debug Tools

File	Purpose
`core/examples/debug.rs` (~2900 lines)	Swiss Army knife CLI: boot, trace, fulltrace, screen, vram, calc, sendfile, bakerom, run, rundoom
`tools/cemu-test/trace_gen.c`	CEmu trace generator in matching format
`tools/cemu-test/parity_check.c`	CEmu state checker at cycle milestones
`scripts/compare_traces.py`	PC-synced trace comparison
`scripts/find_first_divergence.py`	JSON fulltrace comparison with I/O matching

Documentation

File	Purpose
`docs/findings.md` (15KB)	All hardware discoveries, bug findings, lessons
`docs/milestones.md` (7.1KB)	7-phase parity roadmap (all complete)
`docs/pr_comparison_report.md`	8-agent CEmu comparison (~150 issues)
`CLAUDE.md` (7.6KB)	Claude Code workflow, memory map, trace format
`README.md` (17KB)	Comprehensive project docs
`outline_v2.md`	Blog post draft with narrative framing

Platform Frontends

File	Lines	Purpose
`android/.../MainActivity.kt`	~2100	Monolithic Android UI
`android/.../jni_loader.cpp`	~630	JNI + dynamic backend loading
`android/.../cemu_adapter.c`	~595	CEmu wrapper for Android
`web/src/Calculator.tsx`	~800	Main web component
`web/src/emulator/RustBackend.ts`	—	WASM memory snapshot save/load
`ios/Calc/Bridge/EmulatorBridge.swift`	—	Swift FFI bridge
`ios/Calc/Bridge/backend_bridge.c`	~309	iOS static backend switching
