TI-84 Plus CE Emulator — Deep Technical Profile
Build timeline — 25 active days across 4 phases (Jan 27 – Feb 20, 2026)
- Scaffold + eZ80 CPU (1 day) — memory, bus, instruction set, early tests
- Peripherals → boot to OS (5 days) — flash controller, ON key wake, interrupts, 3.2M+ instruction trace parity, boot to home screen
- Multi-platform + CEmu parity campaign (7 days) — iOS/Android/web backends, dual-backend switching, WASM, 7-phase cycle-accurate trace comparison push, DMA/SPI/LCD fidelity
- Games + polish (13 days) — image keypad, .8xp/.8xv loading, DOOM support, baked-in Sudoku/chess ROMs, PWA service worker, save-state fixes, speed slider
Table of Contents
- Architecture
- The Rust Emulator Core
- Cross-Platform Frontends
- Technical Tradeoffs & Decisions
- The CEmu Parity Campaign
- AI Agent Orchestration for Testing & Debugging
- Claude Code Usage Statistics
- Major Bugs & Debugging Stories
- Running Real Software: DOOM & Chess
- Development Timeline & Velocity
- PR & Commit History Analysis
- Key Files Reference
1. Project Overview
A cycle-accurate TI-84 Plus CE graphing calculator emulator built from scratch in Rust, with native frontends for Android (Kotlin/Jetpack Compose), iOS (Swift/SwiftUI), and Web (React/TypeScript/WASM). The emulator faithfully reproduces the Zilog eZ80 processor, all 13 hardware peripherals, and the TI-OS operating system — achieving instruction-level behavioral parity with CEmu, the established open-source reference emulator.
By the numbers:
- ~15,000 lines of Rust core, ~38,000 total across all platforms
- 332 commits across 80+ branches, 67 merged PRs
- Built in 25 days (January 27 – February 20, 2026)
- 168,140,000 cycles to boot TI-OS (verified against CEmu)
- 277/455 unit tests passing (178 failures from pre-existing prefetch initialization, not regressions)
- WASM binary: 148KB uncompressed, 96KB gzipped
- ~80% AI co-authored (262/332 commits have Claude co-author attribution)
Libraries & Frameworks
Rust core (core/Cargo.toml)
- wasm-bindgen / js-sys / web-sys — generate the JS glue layer that wraps the Rust emulator's C ABI for the web build; gated behind the wasm feature.
- console_error_panic_hook — routes Rust panics in the WASM build to the browser console so crashes aren't silent.
- chrono — dev-only, used by test fixtures that need deterministic date/time values.
- No other runtime crates — the core is intentionally no_std-style (no std::fs, no threads, no network).
Android frontend (android/)
- Jetpack Compose (BOM 2023.10.01) + Material 3 — the entire UI (calculator, keypad, D-pad, settings) is Compose.
- kotlinx-coroutines-android — emulation loop runs on Dispatchers.Default; UI state flows back on the main dispatcher.
- androidx.activity-compose / lifecycle-runtime-ktx / core-ktx — standard Compose+lifecycle glue for the single-Activity app.
- Android NDK + CMake 3.22.1 — compiles jni_loader.cpp, cemu_adapter.c, and the CEmu C sources into the per-backend .so files.
- cargo-ndk — cross-compiles the Rust core to arm64-v8a / armeabi-v7a / x86_64 / x86 as static libs that get linked into libemu_rust.so.
- JUnit 4 + AndroidX Test + Espresso — unit and instrumentation test harness.
iOS frontend (ios/)
- SwiftUI — all UI (calculator view, keypad, settings sheets) is SwiftUI, driven by an ObservableObject EmulatorState.
- Foundation / CoreGraphics — NSLock guards the FFI pointer; CGContext + CGImage turn the raw ARGB framebuffer into something Image can render.
- CryptoKit — SHA-256 hashes ROMs for save-state association.
- UniformTypeIdentifiers — file-type UTIs for the ROM/.8xp document picker.
- os.log — structured logging with per-subsystem categories.
- Xcode build configs + .xcconfig files — Backend-Rust / Backend-CEmu / Backend-Both control which static libs link in.
Web frontend (web/)
- React 19 + react-dom — UI for the browser emulator.
- Vite 7 + @vitejs/plugin-react — dev server, HMR, production bundling.
- vite-plugin-pwa (Workbox) — service-worker generation for offline install/launch.
- TypeScript 5.9 — types for the backend interface, the two backend wrappers, and UI code.
- wasm-pack / wasm-bindgen — produces emu_core.wasm + JS bindings from the Rust core.
- Emscripten (emcc) — compiles CEmu C sources to WebCEmu.wasm (skips the C adapter; CEmuBackend.ts adapts in TS).
- Vitest + jsdom + @testing-library/{react,jest-dom,user-event} — test suite for TS backends and components.
- ESLint 9 + typescript-eslint + React hooks/refresh plugins — lint config.
CEmu reference (cemu-ref/)
- Pure C against stdlib only (stdlib.h, stdio.h, string.h, stdint.h, time.h, etc.) — no third-party C deps. emscripten.h is included only when compiling the WASM target.
Build orchestration
- Cargo for Rust, CMake + NDK for Android native, Xcode for iOS, Emscripten + Vite for web, top-level Make targets (make android, make ios, make web) to kick each off.
2. Architecture
2.1 Dual-Backend Design
The project's most distinctive architectural decision is its dual-backend system. Both the custom Rust emulator and the CEmu C reference emulator conform to the same 15-function contract in core/include/emu.h — but each platform implements its own bridge to wire that contract up to its UI runtime. The bridges are not shared: Android uses dlopen, iOS uses static linking, and web skips C ABI entirely and goes through a TypeScript interface.
┌────────────────────────────────────────────────────────────────────┐
│ Platform UI Layer │
│ Android Compose │ iOS SwiftUI │ Web React │
└──────────┬───────────────┴────────┬──────────┴─────────┬───────────┘
│ │ │
┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐
│ Android Bridge │ │ iOS Bridge │ │ Web Bridge │
│ │ │ │ │ │
│ JNI + │ │ Swift + │ │ TS factory + │
│ jni_loader.cpp │ │ backend_ │ │ EmulatorBack- │
│ (dlopen .so) │ │ bridge.c │ │ end interface │
│ │ │ (static link) │ │ (dyn import) │
└───────┬────────┘ └───────┬────────┘ └────────┬───────┘
│ │ │
└────────────────┬───────┴─────────────────────┘
│ C ABI: 15 extern "C" emu_* fns (emu.h, 52 lines)
│ (on web: wasm-bindgen / Emscripten wrap it into TS)
┌───────────────────┴────────────────────┐
│ │
┌───────▼────────────────────────┐ ┌──────────▼─────────────────────┐
│ Rust Emulator Core │ │ CEmu Reference Emulator │
│ (core/, ~15,000 lines) │ │ (upstream C in cemu-ref/) │
│ │ │ │
│ eZ80 CPU │ │ Full eZ80 system: │
│ • execute.rs (2,646 lines) │ │ CPU + MMU, memory, flash, │
│ • step, flags, ALU, helpers │ │ LCD, timers, keypad, │
│ Memory bus (1,929 lines) │ │ SPI, RTC, interrupts, │
│ Flash / RAM / VRAM (587) │ │ SHA-256, watchdog, … │
│ Event scheduler (702) │ │ │
│ 13 peripheral modules: │ │ Uses global/singleton state, │
│ LCD, Timer, Keypad, SPI, │ │ FILE*-based save/load. │
│ RTC, Flash, SHA, Interrupt, │ │ │
│ Backlight, Watchdog, Panel, │ │ cemu_adapter.c (~595 lines, │
│ Control, (Peripherals root) │ │ shared Android+iOS) wraps it │
│ │ │ into instance-based emu_* C │
│ Exports: #[no_mangle] │ │ ABI matching emu.h. │
│ extern "C" fn emu_* │ │ │
│ │ │ Web has no adapter: Emscripten│
│ │ │ export list is called direct │
│ │ │ from CEmuBackend.ts. │
│ │ │ │
│ Artifacts per platform: │ │ Artifacts per platform: │
│ Android: libemu_rust.so │ │ Android: libemu_cemu.so │
│ iOS: libemu_core.a │ │ iOS: libcemu_adapter.a │
│ Web: emu_core.wasm │ │ Web: WebCEmu.wasm │
└────────────────────────────────┘ └────────────────────────────────┘
The shared C ABI contract (emu.h, 52 lines) defines 15 functions both backends must export:
- Lifecycle: emu_create(), emu_destroy()
- ROM: emu_load_rom(), emu_send_file()
- Execution: emu_reset(), emu_power_on(), emu_run_cycles()
- Display: emu_framebuffer() (ARGB8888, 320×240), emu_is_lcd_on()
- Input: emu_set_key(row, col, down) (8×7 matrix)
- State: emu_save_state_size(), emu_save_state(), emu_load_state()
- Misc: emu_get_backlight(), emu_set_log_callback()
What each platform bridge actually does:
- Android (android/app/src/main/cpp/jni_loader.cpp, ~630 lines) — JNI layer loads libemu_jni.so, which dlopen()s a backend .so (libemu_rust.so or libemu_cemu.so) and resolves backend_*-prefixed symbols via dlsym() into a local BackendInterface dispatch table (17 fields: handle, name, the 15 ABI functions, plus an optional set_temp_dir). Backends live in separate .so files, so symbol collisions are impossible — RTLD_LOCAL gives each its own namespace.
- iOS (ios/Calc/Bridge/backend_bridge.c, ~309 lines + EmulatorBridge.swift) — App Store policy forbids dlopen of non-system dylibs, so both backends are statically linked. Collision is avoided by building the Rust core with the ios_prefixed Cargo feature, which renames its exports to rust_emu_*, and by compiling the CEmu adapter with IOS_PREFIXED=1, which macro-rewrites its exports to cemu_*. backend_bridge.c declares both prefixed symbol sets extern, populates two const BackendInterface structs (16 fields: name + 15 fn ptrs), and emu_backend_set(name) swings a current_backend pointer between them. Three xcconfigs (Backend-Rust, Backend-CEmu, Backend-Both) control which .a files link in.
- Web (web/src/emulator/) — No C ABI at the binding seam. A TypeScript interface EmulatorBackend (types.ts) is implemented by RustBackend.ts (wrapping wasm-bindgen-generated bindings around emu_core.wasm) and CEmuBackend.ts (wrapping an Emscripten module around WebCEmu.wasm). createBackend(type) in index.ts is a plain TS factory; each .wasm module lives in its own instance, so there is no shared symbol space to collide in.
The CEmu adapter itself — cemu_adapter.c (~595 lines, under android/.../cpp/cemu/) — is shared between Android and iOS (iOS CMake reuses the same source, just with IOS_PREFIXED=1). Web has no equivalent: CEmuBackend.ts calls Emscripten exports directly. So the emulator cores are shared across platforms; the bridges that host them are not.
Why dual backends? The primary motivation was parity-driven development. Having CEmu as a runtime-swappable reference meant:
- Any behavior difference could be observed in real-time on the same device
- The Rust implementation could be validated against CEmu at every stage
- Users had a working fallback while the Rust core was incomplete
- A/B comparison screenshots could be generated instantly
The tradeoff: doubled integration surface area, two build systems (Cargo + CMake/Emscripten), and per-platform collision handling — dlopen + RTLD_LOCAL on Android, dual prefixing (ios_prefixed + IOS_PREFIXED=1) on iOS, separate .wasm instances on web.
2.2 Core Module Architecture
The Rust emulator core (core/src/, ~15,000 lines) is organized into 5 main modules and 13 peripheral modules:
core/src/
├── lib.rs (432 lines) — C ABI exports, SyncEmu thread-safe wrapper
├── emu.rs (3168 lines) — Emulator orchestrator, execution loop, frame rendering
├── bus.rs (1929 lines) — Memory bus, address decoding, flash unlock, debug ports
├── memory.rs (587 lines) — Flash (4MB), RAM (256KB+VRAM), NOR flash commands
├── scheduler.rs (702 lines) — Event scheduler, 7.68 GHz base clock, 9 event types
├── disasm.rs (1544 lines) — Full eZ80 disassembler
├── ti_file.rs (379 lines) — .8xp/.8xv TI file format parser
├── wasm.rs (325 lines) — WASM FFI bindings
├── cpu/
│ ├── mod.rs (653 lines) — CPU state, step(), interrupt handling
│ ├── execute.rs (2646 lines) — All instruction execution (largest file)
│ ├── helpers.rs (858 lines) — ALU ops, register access, fetch/prefetch
│ ├── flags.rs (25 lines) — Flag bit constants (C, N, PV, H, Z, S, F3, F5)
│ └── tests/ (5420 lines) — instructions, modes, parity tests
└── peripherals/
├── mod.rs (599 lines) — Port routing, tick orchestration, state persistence
├── control.rs (838 lines) — CPU speed, battery FSM, flash unlock, memory protection
├── lcd.rs (1302 lines) — LCD controller, 5-state DMA engine, palette
├── timer.rs (671 lines) — 3× GPT with 2-cycle interrupt delay pipeline
├── rtc.rs (674 lines) — Real-time clock, 3-state machine, 6 interrupt types
├── keypad.rs (899 lines) — 8×7 matrix, scan modes, edge detection
├── spi.rs (627 lines) — SPI controller, 16-deep FIFO, panel stub
├── interrupt.rs(448 lines) — 2-bank interrupt controller with inversion/latching
├── flash.rs (576 lines) — Flash controller registers, wait state management
├── sha256.rs (307 lines) — SHA-256 block compression (64-round)
├── panel.rs (223 lines) — ST7789V LCD panel stub (SPI target)
├── backlight.rs(72 lines) — PWM backlight brightness
    └── watchdog.rs (248 lines) — Watchdog timer stub
2.3 Design Principles
No platform dependencies in the core. The Rust core has no std::fs, no std::io, no std::net. It doesn't know about files, logging, threading, or what platform it's running on. Everything flows through byte buffers via the C ABI. The only external crates are wasm-bindgen / js-sys / web-sys (optional, gated behind the wasm feature flag), and chrono (dev-only, for tests).
Stable C ABI. All exports use extern "C" with #[no_mangle]. The SyncEmu wrapper wraps Emu in a Mutex<Emu> for thread safety. Raw pointers are used at the FFI boundary, with Box::into_raw / Box::from_raw for lifecycle management.
Single-threaded deterministic core. The emulator is purely deterministic — given the same ROM and inputs, it produces the same outputs every time. Threading is the platform's responsibility.
Buffer-based I/O. ROM loaded as &[u8], framebuffer exposed as *const u32 (ARGB8888), save states serialized to Vec<u8>. No file handles cross the FFI boundary.
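The ownership hand-off described under "Stable C ABI" can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's lib.rs: the function names come from emu.h, but the signatures, the i32 cycle count, and the fields of Emu are hypothetical, since emu.h itself is not reproduced here.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the real emulator state.
pub struct Emu {
    pub cycles_run: u64,
}

/// Thread-safe wrapper in the spirit of SyncEmu: the FFI handle points at this.
pub struct SyncEmu {
    inner: Mutex<Emu>,
}

// The real exports would also carry #[no_mangle]; it is left out here so the
// sketch compiles unchanged under any Rust edition.

/// Allocate an emulator and hand ownership to the caller as a raw pointer.
pub extern "C" fn emu_create() -> *mut SyncEmu {
    let emu = SyncEmu { inner: Mutex::new(Emu { cycles_run: 0 }) };
    Box::into_raw(Box::new(emu))
}

/// Run a cycle budget. Returns the cycles executed, or 0 for a null handle
/// or a non-positive budget (assumed error convention).
pub extern "C" fn emu_run_cycles(handle: *mut SyncEmu, cycles: i32) -> i32 {
    if handle.is_null() || cycles <= 0 {
        return 0;
    }
    // SAFETY: the caller must pass a pointer obtained from emu_create()
    // that has not yet been passed to emu_destroy().
    let sync = unsafe { &*handle };
    let mut emu = sync.inner.lock().unwrap();
    emu.cycles_run += cycles as u64;
    cycles
}

/// Reclaim ownership and drop the emulator. A null handle is ignored.
pub extern "C" fn emu_destroy(handle: *mut SyncEmu) {
    if !handle.is_null() {
        // SAFETY: the pointer came from Box::into_raw in emu_create().
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

Because the handle is opaque to the caller, the platform bridges only ever store and pass back a pointer; all locking stays on the Rust side.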
3. The Rust Emulator Core
3.1 CPU: The eZ80 Processor
The TI-84 Plus CE uses a Zilog eZ80 processor — an extended Z80 with 24-bit addressing (ADL mode). This is NOT a standard Z80; it has critical differences that are poorly documented and caused numerous bugs during development.
CPU State (Cpu struct, cpu/mod.rs):
- Main registers: a: u8, f: u8, bc/de/hl: u32 (24-bit)
- Shadow registers: a_prime, f_prime, bc_prime, de_prime, hl_prime
- Index registers: ix, iy (24-bit)
- Dual stack pointers: sps (Z80 16-bit), spl (ADL 24-bit) — selected by L mode flag
- Special: pc: u32, i: u16, r: u8, mbase: u8
- State flags: iff1, iff2, im: InterruptMode, adl, halted
- Per-instruction mode: l, il, suffix, madl, prefix, prefetch
- ei_delay: u8 — 2-step delayed interrupt enable (EI enables interrupts after the NEXT instruction)
Instruction Execution (execute.rs, 2646 lines — the largest file):
The CPU uses x-y-z-p-q opcode decomposition. Each instruction goes through:
1. Prefetch: Return the previously-prefetched byte, read the next byte into the prefetch buffer. This mirrors CEmu's hardware prefetch and is critical for cycle accuracy — without it, cycle counts were ~50% too low.
2. Suffix detection loop: Opcodes 0x40, 0x49, 0x52, 0x5B are suffix opcodes (.SIS, .LIS, .SIL, .LIL) that modify the next instruction's L/IL modes. They execute atomically with the following instruction — a single step() call, not two. Getting this wrong caused trace count mismatches with CEmu.
3. Dispatch: Based on the x field (bits 7:6): x=0 → execute_x0, x=1 → LD r,r' or HALT, x=2 → ALU A,r, x=3 → execute_x3. Prefixed instructions (CB/ED/DD/FD) have their own dispatch tables.
4. Interrupt check: If iff1 && irq_pending, push return address and jump to 0x0038.
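The decomposition and suffix handling in the steps above can be sketched in isolation. This is an illustrative sketch, not the code in execute.rs: the names Decoded, decompose, decode_suffix, and skip_suffixes are hypothetical, and the (l, il) mapping is read off the suffix mnemonics (first letter is the data width, second the instruction width).

```rust
/// Fields of the x-y-z-p-q opcode decomposition.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Decoded {
    pub x: u8, // bits 7:6
    pub y: u8, // bits 5:3
    pub z: u8, // bits 2:0
    pub p: u8, // y >> 1
    pub q: u8, // y & 1
}

pub fn decompose(opcode: u8) -> Decoded {
    let y = (opcode >> 3) & 0x07;
    Decoded { x: opcode >> 6, y, z: opcode & 0x07, p: y >> 1, q: y & 1 }
}

/// Returns Some((l, il)) when `opcode` is one of the four suffix opcodes.
pub fn decode_suffix(opcode: u8) -> Option<(bool, bool)> {
    match opcode {
        0x40 => Some((false, false)), // .SIS
        0x49 => Some((true, false)),  // .LIS
        0x52 => Some((false, true)),  // .SIL
        0x5B => Some((true, true)),   // .LIL
        _ => None,
    }
}

/// Consume leading suffix opcodes so that a suffix and the instruction it
/// modifies are handled within one step. Without a suffix, both modes
/// default to the current ADL mode.
/// Returns (l, il, index of the first non-suffix opcode).
pub fn skip_suffixes(code: &[u8], adl: bool) -> (bool, bool, usize) {
    let (mut l, mut il, mut i) = (adl, adl, 0);
    while let Some(&op) = code.get(i) {
        match decode_suffix(op) {
            Some((sl, sil)) => {
                l = sl;
                il = sil;
                i += 1;
            }
            None => break,
        }
    }
    (l, il, i)
}
```

Folding the suffix loop into the same call that executes the real opcode is what keeps the step count aligned with a reference trace that counts suffix plus instruction as one step.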
eZ80 Architectural Surprises (each of these caused boot failures):
| Discovery | Impact | How Found |
|---|---|---|
| IM2 = IM1 on eZ80 (ignores I register, jumps to 0x0038) | Implementing standard Z80 IM2 crashed the boot | Trace comparison at ~9K steps |
| Separate SPS/SPL stack pointers | Mixed-mode CALL/RET pushed wrong-width addresses | CEmu source reading + test failures |
| Suffix opcodes execute atomically | Step count mismatched CEmu traces | Trace comparison showed 2 steps vs 1 |
| R register rotation: LD R,A uses (A<<1) \| (A>>7) | R register diverged from CEmu (parity test failures) | Trace comparison |
| LD A,MB (ED 6E) — load memory base register | #1 boot blocker, not in Z80 docs | ROM disassembly |
| F3/F5 flags preserved from previous F in ALU ops | Flag divergence in SPI polling loops | 29 dedicated parity tests |
| ON key wakes from HALT even with DI | Boot sequence stalled at HALT | CEmu keypad_on_check() comparison |
| OS Timer is a 4th timer (32K crystal, not documented) | Boot hangs at ~20M cycles without it | ROM code analysis showed it waits for bit 4 |
| Block I/O instructions execute atomically (INI/IND + eZ80-specific variants) | Trace count mismatches | CEmu source comparison |
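Two rows of the table are small enough to show as code. The sketch below assumes the conventional Z80 bit positions for the undocumented flags (F3 is bit 3, F5 is bit 5); the function names are hypothetical and the project's helpers.rs may structure this differently.

```rust
/// Undocumented flag bits, assuming the conventional Z80 positions.
pub const F3: u8 = 0x08;
pub const F5: u8 = 0x20;

/// LD R,A as described in the table: R receives A rotated left by one bit.
pub fn ld_r_a(a: u8) -> u8 {
    (a << 1) | (a >> 7)
}

/// Combine freshly computed ALU flags with the previous F register so that
/// F3 and F5 keep their old values instead of being recomputed.
pub fn preserve_f3_f5(new_flags: u8, old_f: u8) -> u8 {
    (new_flags & !(F3 | F5)) | (old_f & (F3 | F5))
}
```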
3.2 Bus: Memory Routing and Address Decoding
The Bus struct (bus.rs, 1929 lines) handles all memory access for the emulator. The 24-bit address space is decoded as:
| Range | Region | Size | Wait States |
|---|---|---|---|
| 0x000000–0x3FFFFF | Flash (ROM) | 4MB | 10 cycles read |
| 0x400000–0xCFFFFF | Unmapped | — | LFSR pseudo-random on read |
| 0xD00000–0xD3FFFF | RAM | 256KB | 4 read / 2 write |
| 0xD40000–0xD657FF | VRAM | ~150KB | 4 read / 2 write |
| 0xD65800–0xDFFFFF | Unmapped | — | LFSR pseudo-random |
| 0xE00000–0xFFFFFF | MMIO Ports | — | 2-4 cycles per port |
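The table translates directly into a range match. A minimal sketch, assuming a hypothetical Region enum and that addresses are masked to 24 bits before decoding; wait states and the LFSR behaviour for unmapped reads are left out.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Region {
    Flash,
    Ram,
    Vram,
    Mmio,
    Unmapped,
}

/// Size of the VRAM window: one 320x240 frame at 16 bits per pixel.
pub const VRAM_SIZE: u32 = 0xD65800 - 0xD40000;

/// Decode a 24-bit address into the region that services it.
pub fn decode(addr: u32) -> Region {
    match addr & 0x00FF_FFFF {
        0x000000..=0x3FFFFF => Region::Flash,
        0xD00000..=0xD3FFFF => Region::Ram,
        0xD40000..=0xD657FF => Region::Vram,
        0xE00000..=0xFFFFFF => Region::Mmio,
        _ => Region::Unmapped,
    }
}
```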
Flash unlock detection: The bus monitors the fetch stream for a magic byte sequence (the FLASH_UNLOCK_SEQUENCE — DI; JR; DI; IM2; IM1; OUT0/IN0; BIT 2,A) that the ROM uses to unlock flash writes. When detected during privileged code fetch (PC in ROM range), flash write mode is enabled. This is a 16-byte or 17-byte pattern match on a 32-byte fetch ring buffer.
Memory protection: Three mechanisms enforced in write_byte():
- Stack limit (ports 0x3A-0x3C): SP below limit triggers NMI
- Protected range (ports 0x20-0x25): Writes to protected range from unprivileged code trigger NMI and are blocked
- Flash privilege (port 0x28): Only privileged code can write to flash
Debug port interception: The bus intercepts writes to special addresses used by the CE toolchain's dbg_printf:
- 0xFB0000–0xFBFFFF: stdout (sequential address writes, NOT repeated writes to 0xFB0000)
- 0xFC0000–0xFCFFFF: stderr
- 0xFD0000: control (write 1 = clear console)
- Null byte at 0xFB0000 exactly = program exit sentinel
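The four rules above can be expressed as a single classifier. This is a sketch with hypothetical names (DebugWrite, classify_debug_write); it only models the cases listed here, so a null byte at any other stdout offset is passed through as an ordinary byte.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DebugWrite {
    Stdout(u8),
    Stderr(u8),
    ClearConsole,
    ProgramExit,
    Ignored,
}

/// Classify a bus write that landed in the debug-port window.
pub fn classify_debug_write(addr: u32, value: u8) -> DebugWrite {
    match addr {
        // The exit sentinel must be checked before the general stdout range.
        0xFB0000 if value == 0 => DebugWrite::ProgramExit,
        0xFB0000..=0xFBFFFF => DebugWrite::Stdout(value),
        0xFC0000..=0xFCFFFF => DebugWrite::Stderr(value),
        0xFD0000 if value == 1 => DebugWrite::ClearConsole,
        _ => DebugWrite::Ignored,
    }
}
```

Matching on the whole 64 KB range rather than a single address is what makes sequential writes (0xFB0000, 0xFB0001, ...) land in the same stream.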
Flash cache model: A 2-way set-associative cache with 128 sets and 32-byte cache lines, returning 2 cycles (same line), 3 cycles (cache hit), or 197 cycles (cache miss). This matches CEmu's flash_cache implementation exactly.
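A rough sketch of such a cache model follows. The geometry (2 ways, 128 sets, 32-byte lines) and the three cycle costs come from the paragraph above; the least-recently-used replacement policy, the use of the line number as the tag, and the FlashCache type are assumptions made for illustration and may not match CEmu's flash_cache.

```rust
const SETS: usize = 128;
const LINE_SHIFT: u32 = 5; // 32-byte cache lines
const EMPTY: u32 = u32::MAX; // marks a way that holds no line

pub struct FlashCache {
    /// ways[set][0] is the most recently used line in that set.
    ways: [[u32; 2]; SETS],
    /// Line touched by the previous access.
    last_line: u32,
}

impl FlashCache {
    pub fn new() -> Self {
        Self { ways: [[EMPTY; 2]; SETS], last_line: EMPTY }
    }

    /// Returns the cycle cost of a flash read at `addr`.
    pub fn access(&mut self, addr: u32) -> u32 {
        let line = addr >> LINE_SHIFT;
        if line == self.last_line {
            return 2; // same line as the previous access
        }
        self.last_line = line;
        let set = &mut self.ways[(line as usize) % SETS];
        if set[0] == line {
            3 // hit in the most recently used way
        } else if set[1] == line {
            set.swap(0, 1); // hit in the other way; promote it
            3
        } else {
            set[1] = set[0]; // miss: evict the older way (assumed LRU)
            set[0] = line;
            197
        }
    }
}
```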
3.3 Scheduler: The 7.68 GHz Base Clock
The scheduler (scheduler.rs, 702 lines) is the timing backbone of the emulator. Rather than maintaining separate tick counters for each hardware clock, it uses a single base clock at 7,680,000,000 Hz — the LCM of all hardware clocks:
| Clock | Rate | Base ticks per tick |
|---|---|---|
| CPU (speed 3) | 48 MHz | 160 |
| CPU (speed 2) | 24 MHz | 320 |
| CPU (speed 1) | 12 MHz | 640 |
| CPU (speed 0) | 6 MHz | 1280 |
| Panel | 10 MHz | 768 |
| Clock48M | 48 MHz | 160 |
| Clock24M | 24 MHz | 320 |
| Clock32K | 32.768 KHz | 234,375 |
Why the LCM approach? All hardware timer events can be scheduled in base ticks with pure integer arithmetic — no floating-point, no rounding errors, no drift. A timer ticking at 32.768 KHz fires every 234,375 base ticks, which divides evenly into the base clock rate. This design was ported directly from CEmu's schedule.c.
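The arithmetic behind the table can be checked with a few lines of integer code. A minimal sketch; the helper names are hypothetical and not taken from scheduler.rs.

```rust
/// Base clock rate: 7.68 GHz.
pub const BASE_HZ: u64 = 7_680_000_000;

/// Base ticks per tick of a clock running at `hz`, or None when the rate
/// does not divide the base clock evenly (which would reintroduce rounding).
pub fn base_ticks_per_tick(hz: u64) -> Option<u64> {
    if hz != 0 && BASE_HZ % hz == 0 {
        Some(BASE_HZ / hz)
    } else {
        None
    }
}

/// Convert a CPU cycle count at `cpu_hz` into base ticks using integers only.
pub fn cpu_cycles_to_base(cycles: u64, cpu_hz: u64) -> Option<u64> {
    base_ticks_per_tick(cpu_hz)?.checked_mul(cycles)
}
```

For example, the 168,140,000-cycle boot at 48 MHz corresponds to 26,902,400,000 base ticks, a little over three and a half emulated seconds.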
9 event types: RTC, SPI, TimerDelay, Timer0, Timer1, Timer2, OsTimer, LCD, LcdDma.
Overflow prevention: The process_second() method subtracts one second (7,680,000,000 base ticks) from all timestamps whenever the base counter crosses a second boundary. This prevents u64 overflow while maintaining relative timing. The INACTIVE_FLAG (bit 63) marks disabled events — since timestamps never reach bit 63 due to the one-second normalization, this bit is safely repurposed.
HALT fast-forward: When the CPU enters HALT, instead of spinning cycle-by-cycle, the emulator calls scheduler.cycles_until_next_event() and jumps forward to the next event. This is critical for performance — boot involves many HALT periods.
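The second-boundary normalization and the next-event lookup used by HALT fast-forward can be sketched together. This is a simplified model under assumptions: the Scheduler layout, the schedule and advance methods, and the clamping of overdue timestamps to zero are illustrative choices; only process_second(), INACTIVE_FLAG, and the nine event slots come from the description above.

```rust
pub const TICKS_PER_SECOND: u64 = 7_680_000_000;
/// Bit 63 marks a disabled event; live timestamps never reach it.
pub const INACTIVE_FLAG: u64 = 1 << 63;

pub struct Scheduler {
    pub now: u64,
    /// One timestamp per event type (nine in total).
    pub events: [u64; 9],
}

impl Scheduler {
    pub fn new() -> Self {
        Self { now: 0, events: [INACTIVE_FLAG; 9] }
    }

    pub fn schedule(&mut self, id: usize, delay: u64) {
        self.events[id] = self.now + delay;
    }

    pub fn advance(&mut self, ticks: u64) {
        self.now += ticks;
        while self.now >= TICKS_PER_SECOND {
            self.process_second();
        }
    }

    /// Shift every live timestamp back by one second so values stay bounded.
    fn process_second(&mut self) {
        self.now -= TICKS_PER_SECOND;
        for ts in self.events.iter_mut() {
            if *ts & INACTIVE_FLAG == 0 {
                *ts = ts.saturating_sub(TICKS_PER_SECOND);
            }
        }
    }

    /// Distance to the nearest active event, used to skip ahead during HALT.
    pub fn ticks_until_next_event(&self) -> Option<u64> {
        self.events
            .iter()
            .filter(|ts| **ts & INACTIVE_FLAG == 0)
            .map(|ts| ts.saturating_sub(self.now))
            .min()
    }
}
```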
3.4 Peripherals: 13 Hardware Modules
LCD Controller (lcd.rs, 1302 lines)
The LCD runs a 5-state DMA engine:
FRONT_PORCH → SYNC → LNBU → BACK_PORCH → ACTIVE_VIDEO → (repeat)
- FRONT_PORCH: Idle period before sync
- SYNC: Horizontal/vertical sync pulses. Timing registers are parsed here.
- LNBU: Line Buffer Update. Prefills a 256-byte FIFO before active video begins.
- BACK_PORCH: Idle after sync. DMA is scheduled during this phase.
- ACTIVE_VIDEO: Actual pixel DMA from VRAM at 0xD40000. The UPCURR register increments as pixels are transferred.
Two separate clock domains: LCD state machine events on CLOCK_24M (24 MHz), DMA transfers on CLOCK_48M (48 MHz). The process_dma() method handles pixel-by-pixel DMA advancement, while fast_forward_dma_events() provides O(1) bulk skip for performance.
DMA cycle stealing (~7.7% overhead): The LCD DMA and CPU contend for the memory bus. Rather than explicitly scheduling CPU wait cycles, the emulator tracks dma_last_mem_timestamp and calculates elapsed DMA cycles on each CPU memory access via process_dma_stealing(). This adds ~13M cycles to the 168M boot (~7.7% overhead, matching CEmu exactly).
Palette: 256 entries stored as 1555 ARGB, converted to both BGR565 and RGB565 on write. The 8bpp mode (used by DOOM) indexes into this palette for each pixel.
Cursor image RAM (0xE30800–0xE30BFF, 1024 bytes): A discovery made during DOOM support — LibLoad (a CE C library loader) uses this LCD hardware register space as scratch storage. Without implementing it, CE C programs crash.
Timer System (timer.rs, 671 lines)
Three General Purpose Timers with a shared 32-bit control register (3 bits per timer: enable, clock source, overflow enable) and a 2-cycle interrupt delay pipeline:
- Cycle 0: Timer match/overflow detected → delay_status bit set
- Cycle 1: Status becomes visible → delay_intrpt bit set
- Cycle 2: Interrupt actually fires
This pipeline is implemented via the TimerDelay scheduler event and process_delay() state machine. Getting this wrong caused the graphing hang bug — timer interrupts fired too early, and the ISR looped infinitely because the status bits weren't visible yet.
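The three-stage pipeline can be modelled as two pending bitmasks that advance one stage per delay event. A minimal sketch, assuming a hypothetical TimerDelay struct; interrupt enable masks and the scheduler wiring are omitted.

```rust
#[derive(Default)]
pub struct TimerDelay {
    /// Matches detected this cycle, not yet visible to software.
    delay_status: u32,
    /// Status bits that became visible last cycle and will raise the IRQ next.
    delay_intrpt: u32,
    /// Status register as seen by the CPU.
    pub status: u32,
    /// Whether the timer interrupt line has fired.
    pub irq: bool,
}

impl TimerDelay {
    /// Cycle 0: a match or overflow was detected.
    pub fn on_match(&mut self, bits: u32) {
        self.delay_status |= bits;
    }

    /// Advance the pipeline by one cycle.
    pub fn process_delay(&mut self) {
        // Stage 2: bits queued on the previous cycle fire the interrupt now.
        if self.delay_intrpt != 0 {
            self.irq = true;
            self.delay_intrpt = 0;
        }
        // Stage 1: newly matched bits become visible and queue the interrupt.
        self.status |= self.delay_status;
        self.delay_intrpt |= self.delay_status;
        self.delay_status = 0;
    }
}
```

The ordering inside process_delay() is the point: the status bit is always readable one cycle before the interrupt fires, which is the invariant the ISR relies on.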
RTC (rtc.rs, 674 lines)
Real-time clock with a 3-state machine:
- TICK: Normal time counting (sec → min → hour → day with rollovers)
- LATCH: Time latched for reading (prevents tearing)
- LOAD_LATCH: Loading new time from load registers (bit-level transfer with write masks)
The load process is particularly complex: writing control bit 6 triggers a load that takes 51 ticks at 32 KHz to complete. A status register at offset 0x40 returns a bitmask showing which fields (sec/min/hour/day) have been transferred. The ROM polls this register during boot, and getting the timing wrong caused the emulator to boot into "Classic" mode instead of "MathPrint" mode.
Constants: LATCH_TICK_OFFSET = 16429 ticks (~0.5 seconds at 32 KHz). The RTC is scheduled at this offset from the start of each second.
OS Timer (in peripherals/mod.rs)
A 4th timer source not found in standard Z80 documentation, running off a 32.768 KHz crystal. Tick intervals vary by CPU speed: 73 ticks at 6 MHz (~449 Hz), 153 at 12 MHz (~214 Hz), 217 at 24 MHz (~151 Hz), 313 at 48 MHz (~105 Hz). The ROM enables OS Timer interrupt bit 4 and waits — without it, boot stalls indefinitely.
The OS Timer has a subtle ordering requirement matching CEmu: the interrupt state is set to the OLD value before toggling. Getting this wrong caused the OS Timer to fire at the wrong phase.
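The interval table and the ordering rule can be sketched as follows. The speed index mapping (0 = 6 MHz through 3 = 48 MHz) follows the scheduler clock table; the OsTimer struct and the expire method are hypothetical names for illustration.

```rust
pub const CRYSTAL_HZ: u32 = 32_768;

/// Crystal ticks between OS Timer expirations for each CPU speed setting.
pub fn os_timer_ticks(speed: u8) -> u32 {
    match speed & 0x03 {
        0 => 73,  // 6 MHz
        1 => 153, // 12 MHz
        2 => 217, // 24 MHz
        _ => 313, // 48 MHz
    }
}

/// Resulting expiration rate in Hz.
pub fn os_timer_hz(speed: u8) -> f64 {
    f64::from(CRYSTAL_HZ) / f64::from(os_timer_ticks(speed))
}

#[derive(Default)]
pub struct OsTimer {
    state: bool,
}

impl OsTimer {
    /// On expiry, return the level to drive on the interrupt line.
    /// The OLD state is sampled first, and only then is the state toggled.
    pub fn expire(&mut self) -> bool {
        let old = self.state;
        self.state = !old;
        old
    }
}
```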
SPI Controller (spi.rs, 627 lines)
16-deep FIFO (not 4 as initially assumed), with a critical RX-only mode: when CR0 bit 11 (FLASH) is set and the TX FIFO is empty, transfers continue automatically filling the RX FIFO. Without this, the ROM's second SPI polling loop exits early at step ~699,910.
The SPI controller drives a ST7789V LCD panel stub (panel.rs, 223 lines) that absorbs 9-bit SPI frames and parses panel initialization commands (MADCTL, COLMOD, CASET, RASET, RAMWR).
Interrupt Controller (interrupt.rs, 448 lines)
Two banks of 32-bit interrupt registers (status, enabled, latched, inverted). The set_source() method handles edge/level/inverted semantics: if (set XOR (inverted & mask)) then set status bit, else clear it (preserving latched). The pulse() method creates proper edges for inverted+latched signals like WAKE.
Interrupt sources: ON_KEY(0), TIMER1(1), TIMER2(2), TIMER3(3), OSTIMER(4), KEYPAD(10), LCD(11), PWR(15), WAKE(19).
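The set_source() rule can be sketched for a single bank. This is an illustration of the XOR and latch logic only, with a hypothetical IntrBank struct; the second bank, pulse(), and register access are not modelled.

```rust
pub const ON_KEY: u32 = 0;
pub const WAKE: u32 = 19;

#[derive(Default)]
pub struct IntrBank {
    pub status: u32,
    pub enabled: u32,
    pub latched: u32,
    pub inverted: u32,
}

impl IntrBank {
    /// Update one source line, honouring inversion and latching.
    pub fn set_source(&mut self, source: u32, set: bool) {
        let mask = 1u32 << source;
        let inverted = self.inverted & mask != 0;
        if set ^ inverted {
            self.status |= mask;
        } else {
            // Clear the bit only if it is not latched.
            self.status &= !mask | self.latched;
        }
    }

    /// True when any enabled source is currently asserted.
    pub fn irq_pending(&self) -> bool {
        self.status & self.enabled != 0
    }
}
```

A latched source therefore stays asserted after its input drops, until software acknowledges it through a path not shown here.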
3.5 Execution Loop
The main execution loop (emu.rs, run_cycles()) is the heart of the emulator:
for each cycle budget:
1. Record opcode in execution history (64-entry ring buffer for crash diagnostics)
2. Handle any_key_wake (clears HALT if any key pressed)
3. Execute one CPU instruction via cpu.step(bus)
4. Check armed instruction trace (for debugging)
5. Advance scheduler by elapsed CPU cycles
6. Handle CPU speed changes (port 0x01 writes require cycle conversion)
7. Process all pending scheduler events (RTC, SPI, timers, LCD, DMA)
8. Process DMA cycle stealing
9. Schedule SPI transfers if needed
10. Check NMI flag (memory protection violations)
11. Tick peripherals (timers with delay pipeline, keypad, OS Timer)
12. HALT fast-forward (batch up to 10,000 cycles, cap at scheduler second boundary)
13. Periodic diagnostic logging (every 60 frames)
Frame rendering is separate from execution: render_frame() dispatches to either render_frame_8bpp() (palette lookup using BGR565) or render_frame_16bpp() (direct RGB565 from VRAM) based on the LCD's BPP mode.
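For the 16bpp path, the conversion from an RGB565 pixel to the ARGB8888 framebuffer format can be sketched as below. Two assumptions are made: pixels are stored little-endian in VRAM, and channels are widened by bit replication. The function names are hypothetical and the actual render_frame_16bpp() may differ.

```rust
/// Expand one RGB565 pixel to opaque ARGB8888 by replicating high bits
/// into the low bits, so full-scale input maps to full-scale output.
pub fn rgb565_to_argb8888(px: u16) -> u32 {
    let r5 = u32::from((px >> 11) & 0x1F);
    let g6 = u32::from((px >> 5) & 0x3F);
    let b5 = u32::from(px & 0x1F);
    let r = (r5 << 3) | (r5 >> 2);
    let g = (g6 << 2) | (g6 >> 4);
    let b = (b5 << 3) | (b5 >> 2);
    0xFF00_0000 | (r << 16) | (g << 8) | b
}

/// Convert a VRAM byte slice (two bytes per pixel) into the output buffer.
/// Stops at whichever of the two buffers runs out first.
pub fn render_16bpp(vram: &[u8], out: &mut [u32]) {
    for (dst, src) in out.iter_mut().zip(vram.chunks_exact(2)) {
        *dst = rgb565_to_argb8888(u16::from_le_bytes([src[0], src[1]]));
    }
}
```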
4. Cross-Platform Frontends
4.1 Shared Design
All three platforms use identical percentage-based layout constants:
- LCD position: left 11.53%, top 6.92%, width 76.74%, height 24.92% of calculator body
- Keypad area: top 34.57%, height 65.43% of calculator body
- D-pad region: left 63.97%, top 13.72%, width 22.01%, height 14.74% of keypad area
- Body aspect ratio: 963/2239 (from the calculator body image)
- 49 button regions with identical percentage coordinates across all platforms, derived from a real TI-84 CE photograph via
scripts/extract_buttons.py
All platforms use the same emulation timing: 800,000 cycles per frame at 60 FPS (= 48 MHz real-time), with non-linear speed steps from 0.25× to 20×.
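The frame budget follows from dividing the CPU rate by the frame rate. The sketch below also scales it by a speed multiplier; that scaling approach and the cycles_for_frame name are assumptions, since the text only gives the range of the speed steps.

```rust
pub const CPU_HZ: u64 = 48_000_000;
pub const FPS: u64 = 60;
/// 48 MHz / 60 FPS = 800,000 cycles per frame at real-time speed.
pub const CYCLES_PER_FRAME: u64 = CPU_HZ / FPS;

/// Cycle budget for one frame at the given speed multiplier,
/// clamped to the supported 0.25x to 20x range.
pub fn cycles_for_frame(speed: f64) -> u64 {
    (CYCLES_PER_FRAME as f64 * speed.clamp(0.25, 20.0)).round() as u64
}
```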
State persistence uses SHA-256 of the ROM data, truncated to 16 hex chars, as a key. States are namespaced by backend type ("rust:<hash>" or "cemu:<hash>").
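The key format can be sketched without committing to a particular SHA-256 implementation by taking the digest as input. The state_key function is hypothetical, and lowercase hex is an assumption; 16 hex characters correspond to the first 8 bytes of the digest.

```rust
/// Build a save-state key such as "rust:<hash>" from a backend name and
/// the 32-byte SHA-256 digest of the ROM.
pub fn state_key(backend: &str, rom_digest: &[u8; 32]) -> String {
    // First 8 bytes of the digest give 16 hex characters.
    let hex: String = rom_digest[..8].iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}:{}", backend, hex)
}
```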
4.2 Android (android/, Kotlin + Jetpack Compose)
- MainActivity.kt (~2100 lines, monolithic): Compose UI with EmulatorScreen composable. Emulation loop runs on a Dispatchers.Default coroutine. Framebuffer copied to Bitmap via setPixels().
- EmulatorBridge.kt: Loads libemu_jni.so via System.loadLibrary, which dynamically loads backend .so files. 18 JNI method declarations.
- jni_loader.cpp (~630 lines): BackendInterface struct with function pointer table. loadBackend() uses dlopen() to load libemu_<name>.so. Thread-safe log callback deque (max 200 entries) forwarded to Android logcat.
- cemu_adapter.c (~595 lines): Wraps CEmu's global-state API into instance-based interface. Singleton pattern. State save/load uses temp files because CEmu only supports FILE* API.
- StateManager.kt: Thread-safe singleton, SHA-256 ROM hashing, auto-delete corrupted states.
- Image keypad: ImageKeyButton composables with 2dp travel on press, brightness darkening via ColorMatrix (0.82).
- D-pad: Canvas-drawn circular D-pad with arc segments, arrow indicators, hit-testing via angle/radius calculation.
4.3 iOS (ios/, Swift + SwiftUI)
- ContentView.swift: EmulatorState class (ObservableObject) manages emulation lifecycle. Emulation loop runs on Task.detached(priority: .userInitiated).
- EmulatorBridge.swift: NSLock-protected access to opaque C pointer. makeImage() creates CGImage from framebuffer using CGContext with premultipliedFirst | byteOrder32Little.
- backend_bridge.c (~309 lines): Static linking variant. When HAS_RUST_BACKEND is defined, declares rust_emu_* extern functions. current_backend pointer switches between rust_backend and cemu_backend const structs.
- 3 Xcode build configurations: Backend-Rust.xcconfig (links -lemu_rust), Backend-CEmu.xcconfig (links -lemu_cemu), Backend-Both.xcconfig (both).
- ImageKeypadView.swift: Same 49 button regions, DragGesture(minimumDistance: 0) for press detection.
- AppState: Singleton monitoring scenePhase for auto-save on background.
4.4 Web (web/, React + TypeScript + Vite)
- Calculator.tsx (~800 lines): Main component with requestAnimationFrame loop, time accumulator for frame pacing, safety cap of 4 frames per rAF tick (30 in turbo mode).
- RustBackend.ts: Wraps wasm-bindgen WasmEmu class. State persistence uses WASM memory snapshots — dumps the entire linear memory (~29MB via memcpy, ~4ms) rather than field-by-field serialization. On restore, grows memory if needed and copies back. Custom binary format with "WM01" header.
- CEmuBackend.ts: Wraps Emscripten module. Sets up global stubs (emul_is_inited, emul_is_paused). ARGB→RGBA pixel conversion on every frame.
- Chess auto-launch: Polls canvas pixel (310, 10) for green battery icon to detect homescreen ready, then fires Prgm→Prgm→2→Enter key sequence.
- PWA: vite-plugin-pwa with versioned ROM/WASM caching, update banner, offline support. ROM manifest system for tracking content hashes.
- Drag-and-drop: Supports dragging both ROM files and .8xp/.8xv programs onto the calculator.
- Keyboard mapping: 50+ key bindings including F1-F5 for function row, Shift for 2nd (solo-press detection), V for sqrt (2nd + x² combo), Ctrl+R for resend programs.
5. Technical Tradeoffs & Decisions
5.1 Cycle-Accurate Scheduler via LCM Base Clock
Decision: Use a single 7.68 GHz base clock instead of separate tick counters per hardware clock.
Tradeoff: Large tick values (u64 required) but zero floating-point error. All hardware timers divide evenly into the base clock. The process_second() overflow prevention method keeps values bounded.
Alternative considered: Per-clock tick counters with conversion functions. Rejected because fractional cycles accumulate rounding errors over millions of instructions, causing drift that's impossible to debug.
5.2 Prefetch Pipeline Emulation
Decision: Implement a single-byte prefetch buffer that charges memory access cycles during the current instruction.
Tradeoff: Added complexity to every instruction fetch path (every call to fetch_byte() must handle the prefetch buffer). But without it, cycle counts were ~50% of CEmu's (10 cycles instead of 20 for flash reads). This was non-negotiable for parity.
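The prefetch behaviour can be sketched with a one-byte buffer in front of a memory slice. This is a simplified model under assumptions: the Fetcher type, the flat per-read cost, and returning 0 for out-of-range reads are illustrative, not the project's helpers.rs.

```rust
pub struct Fetcher<'a> {
    mem: &'a [u8],
    pub pc: usize,
    /// The byte at `pc`, already read and already paid for.
    prefetch: u8,
    pub cycles: u64,
    /// Cycles charged per memory read (hypothetical flat cost).
    read_cost: u64,
}

impl<'a> Fetcher<'a> {
    /// Prime the buffer with the byte at `pc`.
    pub fn new(mem: &'a [u8], pc: usize, read_cost: u64) -> Self {
        let mut f = Self { mem, pc, prefetch: 0, cycles: 0, read_cost };
        f.prefetch = f.read(pc);
        f
    }

    fn read(&mut self, addr: usize) -> u8 {
        self.cycles += self.read_cost;
        // Out-of-range reads return 0 in this sketch.
        self.mem.get(addr).copied().unwrap_or(0)
    }

    /// Return the previously prefetched byte and prefetch the next one,
    /// charging the memory access to the current instruction.
    pub fn fetch_byte(&mut self) -> u8 {
        let current = self.prefetch;
        self.pc += 1;
        self.prefetch = self.read(self.pc);
        current
    }
}
```

Every fetch_byte() call pays for the byte after the one it returns, so an instruction's cycle count includes a read it does not itself consume.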
5.3 Manual Serialization vs. Serde
Decision: Custom to_bytes()/from_bytes() for state snapshots, with a STATE_VERSION byte.
Tradeoff: Precise control over byte layout and smaller snapshots, but maintenance burden — STATE_VERSION was bumped 8 times as peripherals were added, and missed fields caused "RAM Cleared" bugs.
Alternative explored: Serde+bincode in a separate worktree (calc-serde branch). Required custom handling for types like [u8; 4194304] (flash) that don't implement serde traits. Not merged.
Web's approach: Bypassed the problem entirely by snapshotting the entire WASM linear memory (~29MB memcpy in ~4ms). This eliminates all serialization bugs at the cost of larger save files.
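The manual to_bytes()/from_bytes() approach can be illustrated on a single small struct. This is a sketch only: TimerState, its fields, its 10-byte layout, and the version value are hypothetical, chosen to show how a version byte guards against loading a snapshot with a different field set.

```rust
/// Hypothetical version value; bumped whenever the layout changes.
pub const STATE_VERSION: u8 = 8;
const ENCODED_LEN: usize = 10; // 1 version + 4 counter + 4 reload + 1 control

#[derive(Debug, PartialEq, Eq)]
pub struct TimerState {
    pub counter: u32,
    pub reload: u32,
    pub control: u8,
}

impl TimerState {
    /// Serialize with an explicit little-endian byte layout.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(ENCODED_LEN);
        out.push(STATE_VERSION);
        out.extend_from_slice(&self.counter.to_le_bytes());
        out.extend_from_slice(&self.reload.to_le_bytes());
        out.push(self.control);
        out
    }

    /// Reject snapshots with the wrong length or version instead of
    /// silently loading partial state.
    pub fn from_bytes(data: &[u8]) -> Option<Self> {
        if data.len() != ENCODED_LEN || data[0] != STATE_VERSION {
            return None;
        }
        Some(Self {
            counter: u32::from_le_bytes([data[1], data[2], data[3], data[4]]),
            reload: u32::from_le_bytes([data[5], data[6], data[7], data[8]]),
            control: data[9],
        })
    }
}
```

A field added to the struct but not to both functions compiles cleanly and round-trips as its default, which is the class of bug the tradeoff above describes.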
5.4 Image-Based Keypad vs. Programmatic Buttons
Decision: Pivot from programmatic gradient buttons to a photograph-based overlay with percentage-based hit regions.
Tradeoff: More realistic appearance and perfect cross-platform consistency (same coordinates file used everywhere), but harder to modify and larger asset size. The extract_buttons.py script crops individual button images from a high-res TI-84 CE photo and generates button_regions.json.
5.5 Visual Polling for Auto-Launch
Decision: Detect homescreen readiness by checking a single canvas pixel (310, 10) for the green battery indicator, rather than using fixed timing delays.
History: The chess auto-launch went through 6 iterations: fixed 3.5s delay → 2s delay → 1.2s delay → center pixel check → status bar check → single battery pixel check.
Tradeoff: Adapts to actual boot speed (different ROMs boot at different speeds), but fragile if the OS skin changes.
5.6 DMA Cycle Stealing via Timestamp Tracking
Decision: Track dma_last_mem_timestamp and calculate stolen DMA cycles lazily on each CPU memory access, rather than explicitly scheduling CPU wait states.
Tradeoff: Simpler code (no explicit wait state scheduling), but harder to debug because the timing effect is implicit. The ~7.7% cycle overhead (13M of 168M boot cycles) emerges from the interaction between DMA and CPU memory access patterns.
5.7 Debug Port Interception via Bus Cold Path
Decision: Intercept writes to 0xFB0000-0xFDFFFF (CE toolchain debug ports) in the bus's unmapped MMIO cold path.
Critical discovery: sprintf writes to SEQUENTIAL addresses (0xFB0000, 0xFB0001, ...), NOT repeated writes to the same address. A null byte at 0xFB0000 exactly means program exit; a null at any other offset means flush the buffer. This took multiple debugging sessions to understand.
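The sequential-address behavior can be sketched as follows. The DebugPort and DebugEvent types are hypothetical names for illustration, and the sketch assumes bytes arrive in address order so that appending to a buffer reconstructs the string.

```rust
const DEBUG_BASE: u32 = 0xFB0000;
const DEBUG_END: u32 = 0xFDFFFF;

/// Hypothetical result type for one intercepted write.
#[derive(Debug, PartialEq)]
enum DebugEvent {
    /// Byte buffered (or address outside the debug range).
    None,
    /// Null at a non-base offset: emit the buffered text.
    Flush(String),
    /// Null written exactly at DEBUG_BASE: the program is exiting.
    Exit,
}

/// Hypothetical accumulator for debug-port writes.
struct DebugPort {
    buf: Vec<u8>,
}

impl DebugPort {
    fn write(&mut self, addr: u32, value: u8) -> DebugEvent {
        if !(DEBUG_BASE..=DEBUG_END).contains(&addr) {
            return DebugEvent::None;
        }
        if value == 0 {
            if addr == DEBUG_BASE {
                return DebugEvent::Exit;
            }
            let text = String::from_utf8_lossy(&self.buf).into_owned();
            self.buf.clear();
            return DebugEvent::Flush(text);
        }
        // Assumes writes arrive at DEBUG_BASE, DEBUG_BASE + 1, ... in order.
        self.buf.push(value);
        DebugEvent::None
    }
}
```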
5.8 Strategic / "why do this at all" tradeoffs
- Why build this when CEmu already exists — CEmu is the gold-standard desktop emulator. The project's reason to exist is "CEmu runs on my laptop; I want to run my calculator from my phone and browser." Reframing the goal from "emulate the TI-84 CE" to "make the TI-84 CE portable" is what forced the tri-platform + WASM bet. A pure-desktop project would have had no reason not to just fork CEmu. Inferred.
- Tri-platform (iOS + Android + web) for a project with effectively zero users — indie emulators typically pick one platform. Tri-platform multiplies ops complexity 3× but was the whole point — if the project ran on one platform there'd be no cross-platform architecture story. The no_std core + narrow C ABI decisions only pay off if you actually land on all three. Inferred.
- Full-system OS emulation (boot the real ROM) over running user apps against a stubbed OS — apps-only would have been 10× easier; you could stub interrupts, LCD timing, and the OS event loop. Booting the 5.8.2 ROM is what surfaced the undocumented quirks (IM2=IM1, OS Timer, SPS/SPL) because the firmware exercises hardware the way no app alone does. "It boots real firmware" is the credibility claim that demanded this scope. Evidenced in Section 6 parity campaign.
- Cycle-accurate from the start, not retrofitted — functional emulation would have booted the OS in days, not weeks. Choosing cycle-accurate upfront made CEmu traces the executable spec; anything off by a few cycles breaks interrupt timing and the boot hangs. A functional emulator retrofitted for timing would have meant rewriting the core mid-project. Evidenced — firmware is sensitive to timer interrupts, LCD refresh, RTC ticks.
- 25 days of obsessive cycle-level work on a no-user project — the defensible framing is that the project is the interview answer. The trace-diff methodology, 8-agent audit, 472 subagents, and CEmu parity campaign are the actual deliverable — the working emulator is the artifact proving the methodology. A simpler emulator shipped in 3 days would be a weaker portfolio signal. Inferred.
5.9 Additional architectural tradeoffs worth naming
- Rust core + narrow C ABI over pure C or Rust-native bindings per platform: 15 extern "C" functions let CEmu slot in as a swappable backend and let the same .a drop into JNI / static-link on iOS / compile to WASM. A pure-C core (matching CEmu) would have made FFI trivial but given up memory safety and a zero-dependency WASM target; Rust-native bindings (UniFFI, wasm-bindgen) would have made each platform nicer but broken the dual-backend story. Inferred.
- no_std-style core (no std::fs, no threads, no allocator for WASM): framebuffer is *const u32, ROMs are &[u8], save state is Vec<u8>. Platform owns frame pacing, persistence, logging. Makes the core trivially deterministic (precondition for trace-diffing) and drops into WASM without a runtime; cost is pushing the same concerns into three separate platform codebases. Inferred.
- Android dlopen vs iOS static-link-with-symbol-prefix: asymmetric backend switching. iOS App Store policy forbids dlopen of non-system dylibs, forcing static linking of both cores and two prefixing schemes to avoid symbol collisions: Cargo's ios_prefixed feature renames Rust exports to rust_emu_*, and IOS_PREFIXED=1 macro-rewrites CEmu adapter exports to cemu_*. Android has no such constraint, so runtime dlopen with RTLD_LOCAL is strictly better there (smaller per-backend APKs, parallel builds, zero prefixing needed). Policy-dictated, not preference. Inferred.
- Trace-diff against CEmu as the correctness oracle, not unit tests: 11 custom Rust examples + 3 Python scripts + a patched CEmu trace_gen became the parity toolchain; 178 unit tests are allowed to fail. eZ80 quirks (IM2≡IM1, SPS/SPL split, suffix atomicity, undocumented OS Timer) aren't in any published spec; CEmu's behavior is the executable spec. Unit tests can't express "boots to MathPrint instead of Classic." Evidenced by my comment in chat: "genuinely i dont give a shit if tests fail, they weren't catching shit before."
- Big-bang subagent audit (8 parallel) for the parity roadmap, not sequential bisection: one session dispatched 8 agents to read CEmu source in parallel and produce the ~150-item roadmap that became the 7-phase plan. Pure trace-bisection would have surfaced dependencies emergently (CPU → bus → peripherals → timing) and wasted weeks on symptoms. Cost: ~20% of "critical" items needed human triage because subagents couldn't verify against traces. Evidenced in pr_comparison_report.md.
5.10 Code-level tradeoffs visible in the source
- Core crate exposes a C ABI even for the web build, not a wasm-bindgen Rust API: core/Cargo.toml sets crate-type = ["staticlib", "rlib", "cdylib"] with ~15 C functions (emu_create, emu_run_cycles, emu_framebuffer, …) in emu.h. An idiomatic Rust build would give each platform its own binding style (wasm-bindgen for web, uniffi/swift-bridge for mobile). Picking C as the lowest common denominator means one header drives Swift's bridging header, Android's JNI wrapper, and a CEmu-compatible backend interface with identical signatures, at the cost of a global Mutex<Emu> on every FFI call. Evidenced.
- Swappable backend ABI shapes the whole cross-platform layer, not just the test harness: web/src/emulator/types.ts defines a 20-method EmulatorBackend TypeScript interface implemented by both RustBackend and CEmuBackend; Android compiles two separate .so files via backend_wrapper.cpp with BACKEND_NAME set at compile time. Shipping only the Rust core and deleting the CEmu path would have been simpler, but keeping CEmu runtime-selectable means divergence bugs can be A/B-tested on-device, which is what makes the parity campaign mean anything. Evidenced.
- Monolithic cpu/execute.rs (2,646 lines) with nested match y/z/p/q decoding, not a 256-entry dispatch table: the classic Z80 bitfield decomposition in giant nested matches; no opcode table of function pointers. Rust's match on small integers compiles to a jump table anyway, so the perf argument is moot; keeping the structured decomposition lets DD/FD/ED/CB prefix variants share helpers without a second dispatch indirection. Cost: one very large file, harder to diff against per-opcode tests. Evidenced.
- Peripherals as sibling concrete modules keyed by MMIO base address, no MmioDevice trait: peripherals/mod.rs declares 12 concrete modules (flash, interrupt, keypad, lcd, …) and the bus dispatches by address range. A trait MmioDevice { fn read(&mut self, off) -> u8 } + Vec<Box<dyn _>> would be more "Rust-like." The peripheral set is hardware-fixed so no plugin story is needed, dynamic dispatch would block inlining on hot paths (LCD palette reads), and each peripheral keeps peripheral-specific public methods (e.g. KeypadController::set_key) directly callable from the FFI layer without an enum wrapper. Evidenced.
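The concrete-module dispatch described in the last bullet can be sketched as below. The two peripherals, their register sizes, the address ranges, and the "unmapped reads return 0" behavior are all illustrative assumptions and are not copied from peripherals/mod.rs.

```rust
/// Illustrative LCD register block (size is an assumption).
struct Lcd {
    regs: [u8; 0x100],
}

impl Lcd {
    fn read(&self, off: u32) -> u8 {
        self.regs[off as usize]
    }
}

/// Illustrative keypad with one byte of key bits per row.
struct Keypad {
    data: [u8; 16],
}

impl Keypad {
    fn read(&self, off: u32) -> u8 {
        self.data[off as usize]
    }

    /// Peripheral-specific method, callable directly without a trait object.
    fn set_key(&mut self, row: usize, col: u8, down: bool) {
        let bit = 1u8 << (col & 7);
        if down {
            self.data[row & 0xF] |= bit;
        } else {
            self.data[row & 0xF] &= !bit;
        }
    }
}

/// Concrete fields instead of Vec<Box<dyn MmioDevice>>.
struct Peripherals {
    lcd: Lcd,
    keypad: Keypad,
}

impl Peripherals {
    /// Static dispatch by address range; the ranges here are assumptions.
    fn read(&self, addr: u32) -> u8 {
        match addr {
            0xE3_0000..=0xE3_00FF => self.lcd.read(addr - 0xE3_0000),
            0xF5_0000..=0xF5_000F => self.keypad.read(addr - 0xF5_0000),
            _ => 0, // unmapped: value chosen for the sketch only
        }
    }
}
```

Because every arm names a concrete type, the compiler can inline the per-peripheral read, and a caller such as an FFI shim can reach keypad.set_key without downcasting.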
6. The CEmu Parity Campaign
6.1 Overview
The parity campaign was the single largest engineering effort in the project — a systematic 7-phase overhaul to make the Rust emulator match CEmu's behavior at the instruction level. The campaign was driven by an extensive toolchain of trace generation, comparison, and analysis tools.
6.2 The Parity Toolchain
Trace generation (Rust side):
- cargo run --example debug -- trace [steps]: Generates space-separated trace files with step, cycles, PC, SP, AF, BC, DE, HL, IX, IY, ADL, IFF1, IFF2, IM, HALT, opcode
- cargo run --example debug -- fulltrace [steps]: Generates comprehensive JSON traces including all I/O operations per instruction (RAM reads/writes, MMIO port access)
Trace generation (CEmu side):
- tools/cemu-test/trace_gen.c: Links against CEmu's libcemucore.a, generates traces in the exact same format as the Rust side
- tools/cemu-test/parity_check.c: Checks CEmu state at 14 cycle milestones (1M through 60M)
Comparison tools:
- scripts/compare_traces.py: PC-synced comparison with prefix lookahead (CEmu counts DD/FD prefixes as separate instructions)
- scripts/find_first_divergence.py: JSON fulltrace comparison with I/O operation matching
- core/examples/dense_compare.rs: PC-aligned comparison using HashMap-based lookup with 5-step lookahead
Targeted investigation tools (11 specialized Rust examples):
- find_divergence.rs: Tracks PC at known CEmu cycle checkpoints
- scheduler_debug.rs: Monitors RTC event timing in scheduler
- rtc_timing_compare.rs: Compares RTC load timing between implementations
- check_0072fa.rs: Single-steps 70M cycles checking specific poll loop address
- mathprint_check.rs: Monitors MathPrint flag at cycle checkpoints
- Plus 6 more specialized analysis tools
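The PC-synced comparison with lookahead can be sketched as below. This is my simplification rather than the logic of compare_traces.py or dense_compare.rs: traces are reduced to PC sequences, and only the CEmu side is allowed to contain extra entries (the separately counted prefix steps).

```rust
/// Compare two PC sequences, tolerating up to `lookahead` extra entries on the
/// CEmu side at any mismatch (e.g. DD/FD prefixes logged as their own steps).
/// Returns None if no divergence is found before either trace ends, or
/// Some((cemu_index, rust_index)) at the first mismatch that cannot be resynced.
fn first_divergence(cemu: &[u32], rust: &[u32], lookahead: usize) -> Option<(usize, usize)> {
    let (mut i, mut j) = (0usize, 0usize);
    while i < cemu.len() && j < rust.len() {
        if cemu[i] == rust[j] {
            i += 1;
            j += 1;
            continue;
        }
        // Try to resync by skipping a few CEmu entries.
        let skip = (1..=lookahead).find(|&k| i + k < cemu.len() && cemu[i + k] == rust[j]);
        match skip {
            Some(k) => i += k,
            None => return Some((i, j)),
        }
    }
    None
}
```

Reporting both indices matters because the step numbers of the two traces drift apart as soon as one prefix has been skipped.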
6.3 The 7 Phases
Phase 1: CPU Instruction Correctness (Effort: L)
- Fixed RETI IFF1 restore
- Fixed register pair mapping (ED x=0 z=7 p=3: IY→IX)
- Added missing eZ80 instructions: LD I,HL (ED C7), LD HL,I (ED D7), LEA IY,IX+d (ED 55)
- Implemented block I/O (all Z80 + eZ80-specific variants)
- Fixed EX DE,HL L-mode masking
- Fixed block BC decrement (preserve BCU in Z80 mode)
- Verification: Boot 132.79M cycles, PC=085B80. 250/436 tests passing.
Phase 2: Bus & Address Decoding (Effort: M)
- Flash routing for 0x400000–0xBFFFFF
- MMIO unmapped holes
- Port range 0xF routing
- SPI in memory-mapped path
- Backlight routing
- Verification: Boot 132.79M, 251/436 tests.
Phase 3: Peripheral Register Layout Rewrites (Effort: XL — the largest phase)
- Timer rewrite: Replaced 3 separate Timer structs with unified GeneralTimers. Shared 32-bit control register (3 bits per timer), status register, mask register.
- Keypad register packing: Single 32-bit control register with mode (2 bits) + rowWait (14 bits) + scanWait (16 bits). 16 data registers. GPIO enable. Reset mask 0xFFFF.
- Watchdog offset fix: Counter and load value offsets were SWAPPED. Revision corrected to 0x00010602.
- Verification: Boot 156.10M, 272/457 tests.
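The keypad control register packing from this phase can be sketched as below. The field widths come from the list above; the bit order (mode in the lowest bits, then rowWait, then scanWait in the upper half) is my assumption, and the struct and method names are hypothetical.

```rust
/// Decoded view of the 32-bit keypad control register.
/// Assumed layout: bits 0-1 mode, bits 2-15 rowWait, bits 16-31 scanWait.
#[derive(Debug, PartialEq, Clone, Copy)]
struct KeypadControl {
    mode: u8,
    row_wait: u16,
    scan_wait: u16,
}

impl KeypadControl {
    fn unpack(raw: u32) -> Self {
        Self {
            mode: (raw & 0x3) as u8,
            row_wait: ((raw >> 2) & 0x3FFF) as u16,
            scan_wait: (raw >> 16) as u16,
        }
    }

    fn pack(self) -> u32 {
        (u32::from(self.mode) & 0x3)
            | ((u32::from(self.row_wait) & 0x3FFF) << 2)
            | (u32::from(self.scan_wait) << 16)
    }
}
```

Storing the register as one packed u32 and decoding on demand keeps byte-wide MMIO reads and writes trivial, since they operate on the raw value.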
Phase 4: Scheduler & Timing (Effort: L)
- SCHED_SECOND overflow prevention
- CPU speed change event conversion
- Panel clock rate 60Hz → 10,000,000 Hz
- OS Timer interrupt phase fix (set state to OLD value before toggling)
- Timer 32 KHz clock source
- Timer 2-cycle interrupt delay pipeline: Match detected → status visible → interrupt fires. Required TimerDelay scheduler event, process_delay() state machine, delay_status: u32 and delay_intrpt: u16 fields.
- Verification: Boot 156.10M, 272/457 tests.
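One way to picture the match → status → interrupt pipeline is as a pair of shift registers advanced by process_delay(). The sketch below is an assumption-heavy illustration: the field names delay_status and delay_intrpt and their widths come from the list above, but the stage widths (9 status bits, 3 interrupt lines) and the shifting scheme are my guesses at a workable design, not a transcription of the Rust core or of CEmu.

```rust
/// Hypothetical two-stage delay pipeline for timer match events.
struct TimerDelay {
    /// Pending status bits; the low 9 bits become visible on the next tick.
    delay_status: u32,
    /// Pending interrupt lines, 3 bits per stage; the low 3 fire on the next tick.
    delay_intrpt: u16,
    /// Status register as seen by the CPU.
    status: u32,
    /// Interrupt lines currently asserted (one bit per timer).
    intrpt: u8,
}

impl TimerDelay {
    /// Record a match on `timer_index` (0..=2): status goes into stage 0,
    /// the interrupt into stage 1, so it fires one tick after the status appears.
    fn on_match(&mut self, status_bits: u32, timer_index: u8) {
        self.delay_status |= status_bits & 0x1FF;
        self.delay_intrpt |= 1 << (3 + timer_index);
    }

    /// Advance the pipeline by one tick.
    /// Returns true if another tick must be scheduled.
    fn process_delay(&mut self) -> bool {
        self.status |= self.delay_status & 0x1FF;
        self.intrpt |= (self.delay_intrpt & 0x7) as u8;
        self.delay_status >>= 9;
        self.delay_intrpt >>= 3;
        self.delay_status != 0 || self.delay_intrpt != 0
    }
}
```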
Phase 5: RTC, SHA256, Control Ports (Effort: M)
- RTC 3-state machine with time counting, latching, and load data transfer
- SHA256 64-round compression function
- Control port masks (port 0x01: & 0x13, port 0x29: & 1)
- Flash size_config reset (0x07 → 0x00)
- INT_PWR interrupt on reset
- Verification: Boot 108.78M, 259/437 tests.
Phase 6: LCD & SPI Enhancements (Effort: L)
- LCD DMA 5-state event machine with dual clock domains
- 256-entry palette storage
- SPI panel stub (ST7789V)
- LCD ICR, MIS, UPCURR, LPCURR registers
- Verification: Boot 156.10M, 277/455 tests.
Phase 7: CPU Advanced & Bus Protection (Effort: XL, Risk: High)
- Separate SPS/SPL stack pointers (CPU SNAPSHOT_SIZE 64 → 67)
- Mixed-mode CALL/RET/RST with MADL|ADL flag byte
- Memory protection (stack limit NMI, protected range, flash privilege)
- DMA scheduling with cycle stealing (~7.7%)
- HALT fast-forward, interrupt prefetch_discard, R rotation
- Verification: Boot 168.14M cycles, 277/455 tests. ALL PHASES COMPLETE.
6.4 The Comparison Report
PR #56's supporting document (docs/pr_comparison_report.md) was generated by 8 parallel analysis agents, each comparing a different subsystem of the Rust emulator against CEmu. It identified ~150 specific discrepancies organized into three tiers:
- 20 critical issues (CPU missing instructions, register layout mismatches, entirely missing peripherals)
- 19 high issues (timing differences, routing bugs, missing clock sources)
- 25+ medium issues (edge cases, missing features, reset value differences)
This document served as the roadmap for all 7 phases.
7. AI Agent Orchestration for Testing & Debugging
7.1 Overview of AI's Role
The project was ~80% AI co-authored (262/332 commits). Claude Code was used in 37+ main conversation sessions with 472 subagent invocations across those sessions, generating ~73 MB of conversation data (~9 MB in main sessions, ~64 MB in subagent sessions). The heaviest sessions had 89 subagent files (session 13baee3d, Feb 8) and 76 subagent files (session dc8b876a, Jan 30-31).
I directed the architecture, defined the parity methodology, tested on real devices, identified bugs, and corrected the AI when it went wrong. Claude served as implementation workhorse — writing code, running traces, and deploying subagents per my direction. The division of labor was:
| I (direction & decisions) | AI (execution & research) |
|---|---|
| Designed dual-backend architecture | Deployed up to 89 parallel subagents for research |
| Defined parity methodology & toolchain | Read and compared CEmu C source code against Rust |
| Tested on real Android/iOS devices | Wrote implementation code per my direction |
| Identified bugs via device testing | Generated traces and analyzed divergences |
| Corrected wrong AI approaches | Created PRs with descriptions |
| Decided what to build/cut/revert | Rebaked ROMs (17 consecutive times in one session) |
| Set quality bar ("I want perfect") | Iterated until my bar was met |
7.2 Subagent Usage Patterns
Subagents were the primary mechanism for scaling the AI's research and implementation capacity. The project used 472 subagent sessions across the 37 main sessions, with distinct usage patterns:
Research subagents (the most common pattern): These were deployed in parallel to read and analyze source code. The most dramatic example was the 8-agent peripheral audit (PR #56), where 8 subagents simultaneously compared each peripheral module in the Rust emulator against its CEmu C counterpart. Each agent would:
- Read the CEmu C source file (e.g., cemu-ref/core/timers.c)
- Read the corresponding Rust source file (e.g., core/src/peripherals/timer.rs)
- Produce a detailed report of every register offset, default value, timing behavior, and control flow difference
This pattern was repeated at smaller scale throughout the project. When investigating a bug, 2-3 research subagents might be deployed simultaneously — one reading the CEmu source for a specific subsystem, one reading the Rust implementation, and one analyzing trace divergences.
Implementation subagents: For larger feature work (e.g., the 7-phase parity overhaul), subagents were used to implement specific fixes within a phase while the main agent coordinated. A typical pattern:
- Main agent defines the fix needed (e.g., "rewrite timer register layout to match CEmu's packed 32-bit format")
- Subagent reads both implementations, writes the new Rust code
- Main agent integrates, runs tests, verifies parity
Worktree subagents: Claude Code's worktree feature was used for parallel feature development. Subagents worked in isolated git worktrees (e.g., calc-worktrees/rtc-ticking, calc-worktrees/doom, calc-worktrees/image-keypad) to develop features without blocking the main branch. The image keypad worktree had its own memory file with 6 subagent sessions documenting SwiftUI pitfalls and button region coordinate systems.
Session intensity distribution:
- 2 sessions with 75+ subagents (the boot-to-homescreen push on Jan 30-31, and the image keypad + parity work on Feb 8)
- ~5 sessions with 20-40 subagents (major feature work)
- ~10 sessions with 5-15 subagents (focused debugging or feature implementation)
- ~20 sessions with 0-5 subagents (quick fixes, documentation, interview prep)
Cross-session knowledge transfer: Because subagent context is lost when a session ends, the project relied on persistent files for continuity:
- MEMORY.md (68 lines of distilled learnings in .claude/projects/)
- CLAUDE.md (7.6KB of workflow docs, memory map, trace formats in repo root)
- docs/findings.md (15KB of hardware discoveries)
- docs/milestones.md (7.1KB phase tracker)
Each new session's first action was typically reading these files to rebuild context. When a session hit context limits (which happened frequently during the parity campaign), the findings from that session were written to findings.md or milestones.md before ending, ensuring the next session could continue.
7.3 The Parity Testing/Debugging Loop
The CEmu parity campaign was my core methodology for achieving correctness: systematically compare every subsystem against the reference implementation, prioritize fixes by dependency order, and verify each fix pushes the divergence point further. The AI executed this methodology at scale. Here's how it worked:
Phase 1: Systematic Comparison Research
I directed a comprehensive audit of every peripheral subsystem against CEmu's C source. Claude deployed 8 parallel subagents simultaneously, each assigned a different subsystem:
- Agent comparing CPU execution (cpu.c vs execute.rs)
- Agent comparing bus/memory (bus.c/mem.c vs bus.rs/memory.rs)
- Agent comparing LCD/backlight (lcd.c vs lcd.rs)
- Agent comparing timers (timers.c vs timer.rs)
- Agent comparing interrupt controller (interrupt.c vs interrupt.rs)
- Agent comparing keypad (keypad.c vs keypad.rs)
- Agent comparing RTC/SHA256 (realclock.c/sha256.c vs rtc.rs/sha256.rs)
- Agent comparing scheduler/control (schedule.c/control.c vs scheduler.rs/control.rs)
Each agent read both the Rust and C source files in full and produced a detailed report of every discrepancy — register layout differences, timing behavior differences, missing features, wrong default values. The reports were merged into docs/pr_comparison_report.md (~150 issues).
Phase 2: Prioritized Fix Implementation
I organized the ~150 issues into 7 phases based on dependency ordering (CPU first, then bus, then peripherals, then timing). Each phase had:
- Clear deliverables (specific registers to fix, specific behaviors to match)
- A verification checkpoint (boot cycle count + test count)
- Effort and risk ratings
The AI implemented fixes per the phase plan, then verified by generating a trace and comparing against CEmu:
cargo run --example debug -- trace 100000 # Generate Rust trace
cd tools/cemu-test && ./trace_gen ../../ROM -n 100000 # Generate CEmu trace
python scripts/compare_traces.py cemu.log rust.log # Compare
Phase 3: Divergence Bisection
When traces diverged, I directed the AI to use a binary-search approach to find the exact instruction:
- Generate a long trace (100K+ steps)
- Find the first PC where Rust ≠ CEmu
- Read the surrounding instructions for context
- Look up the divergent instruction in CEmu's source
- Compare the implementation in Rust
- Fix the discrepancy
- Re-run and verify the fix pushed the divergence further out
This loop was repeated hundreds of times. Key milestones: 40K steps → 700K steps → 3.2M steps → full boot (3.6M steps pre-DMA, 168M cycles with DMA).
Phase 4: Targeted Investigation Tools
When standard trace comparison was insufficient, I directed the AI to build specialized investigation tools targeting specific subsystems. 11 custom Rust examples were created during the parity campaign:
- rtc_timing_compare.rs: Compared RTC load timing between Rust and CEmu at 12 checkpoints from 0 to 50M cycles. Finding: "Our RTC load status returns 0x00 (complete) too early. This timing difference causes the poll loop at 0x0072FA to exit earlier."
- scheduler_debug.rs: Monitored RTC event scheduling. Finding: At 48 MHz, 1 CPU cycle = 160 base ticks; RTC fires every 16,429 ticks at 32 KHz = ~24M CPU cycles delay.
- check_0072fa.rs: Single-stepped 70M cycles checking one specific poll loop address that CEmu visits but Rust didn't. Finding: Different control flow due to RTC timing.
- mathprint_check.rs: Monitored the MathPrint flag (0xD000C4 bit 5) at 8 cycle checkpoints. Finding: The flag was never set to 0x20 (MathPrint mode) because the RTC timing caused a different code path.
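The scheduler_debug.rs finding is a clock-domain conversion, which can be checked with a few lines of arithmetic. The sketch assumes that "32 KHz" means a 32,768 Hz crystal; the constant and function names are mine.

```rust
const CPU_HZ: u64 = 48_000_000;
const BASE_TICKS_PER_CPU_CYCLE: u64 = 160;
/// Scheduler base clock implied by the two figures above: 7.68 GHz.
const BASE_CLOCK_HZ: u64 = CPU_HZ * BASE_TICKS_PER_CPU_CYCLE;
/// Assumption: the "32 KHz" clock is a 32,768 Hz crystal.
const RTC_HZ: u64 = 32_768;

/// Convert a duration in RTC clock ticks to CPU cycles at 48 MHz (rounds down).
fn rtc_ticks_to_cpu_cycles(ticks: u64) -> u64 {
    ticks * CPU_HZ / RTC_HZ
}

/// Convert a duration in RTC clock ticks to scheduler base ticks.
fn rtc_ticks_to_base_ticks(ticks: u64) -> u64 {
    ticks * (BASE_CLOCK_HZ / RTC_HZ)
}
```

Under that assumption, 16,429 RTC ticks is 16,429 / 32,768 ≈ 0.501 s, or roughly 24.07M CPU cycles, which matches the ~24M delay seen in CEmu and is more than 300 times longer than the ~75K cycles the Rust core originally took.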
Phase 5: Multi-Session Continuity
The parity campaign spanned multiple conversation sessions that hit context limits. To maintain continuity:
- The MEMORY.md file in .claude/projects/ stored distilled learnings (68 lines)
- The CLAUDE.md file in the repo documented workflow, key addresses, trace formats
- docs/findings.md (15KB) captured every hardware discovery
- docs/milestones.md (7.1KB) tracked phase completion status
When a new session started, Claude would read these files to rebuild context, then continue from where the last session left off.
7.4 Specific Debugging Stories Driven by AI Agents
The SPI Divergence Hunt (699,900 → 3.2M steps):
- Divergence at step 699,900: After SPI STATUS read, CEmu A=0x20 (2 transfers pending), Rust A=0x00 (all complete)
- AI agents analyzed CEmu's scheduler-driven SPI: sched_set(SCHED_SPI, ticks), transfer duration = bitCount * ((cr1 & 0xFFFF) + 1) ticks at 24MHz
- The initial "complete ALL transfers on first read" approach worked at step 418K (3 transfers) but failed at 699K (6 transfers, only 4 should complete)
- Solution: Implement event-driven scheduler for SPI. This pushed parity to 3.2M steps.
- Next divergence at 3,216,456: RTC load status timing. This took 4 more dedicated investigation tools to diagnose.
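The SPI timing formula and the reason the "complete everything on first read" shortcut fails can be sketched as below. The formula is the one quoted above; the function names and the completed_transfers helper are hypothetical, and the sketch treats all queued transfers as having the same duration.

```rust
/// Transfer duration in 24 MHz ticks: bitCount * ((cr1 & 0xFFFF) + 1).
fn spi_transfer_ticks(bit_count: u32, cr1: u32) -> u64 {
    u64::from(bit_count) * (u64::from(cr1 & 0xFFFF) + 1)
}

/// How many of `queued` back-to-back transfers have finished after `elapsed_ticks`.
/// A status read must report the remainder as still pending.
fn completed_transfers(elapsed_ticks: u64, ticks_per_transfer: u64, queued: u64) -> u64 {
    if ticks_per_transfer == 0 {
        return queued;
    }
    (elapsed_ticks / ticks_per_transfer).min(queued)
}
```

With illustrative numbers (8-bit transfers and a divider field of 3, so 32 ticks each), a status read 130 ticks after queuing 6 transfers should see 4 complete and 2 pending; completing all 6 at once is what produced the A=0x00 versus A=0x20 mismatch.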
The 8-Agent Peripheral Audit (Feb 5-6):
- 8 agents deployed simultaneously, each comparing a subsystem
- Total: ~150 issues identified across all peripherals
- Produced a 35KB comparison report that became the 7-phase implementation roadmap
- The agent findings revealed that some peripheral implementations were fundamentally wrong (e.g., timer register layout was completely different, keypad control register packing was wrong, watchdog offsets were swapped)
The MathPrint vs Classic Mode Investigation (5 custom tools):
- Problem: Emulator boots into Classic mode, CEmu boots into MathPrint mode
- mathprint_check.rs: Found the flag is never set to 0x20
- check_0072fa.rs: Found the poll loop at 0x0072FA behaves differently
- rtc_timing_compare.rs: Found RTC load completes in ~75K cycles (Rust) vs ~24M cycles (CEmu)
- scheduler_debug.rs: Found the RTC event offset of 16,429 ticks was correct but the load processing timing was wrong
- Resolution: Fix RTC load state machine to process at correct 32 KHz rate
7.5 Human-AI Interaction Patterns
My direction and quality enforcement:
- Set the parity standard: "i dont want close, i want perfect"
- Demanded correct approaches: "make sure this is the correct way to do things, i dont want to use hacky workarounds"
- Redirected flailing AI: "stop making random guesses and use comprehensive logging"
- Maintained project knowledge when AI lost context: "are you forgetting the things you learned in your findings?"
- Prioritized correctness over test coverage: "genuinely i dont give a shit if tests fail, they weren't catching shit before"
Where I corrected the AI:
- AI focused on backlight brightness for the power-off bug when the real issue was the LCD enable bit — I identified the correct root cause and redirected
- AI frequently forgot previous session findings across context boundaries — I pointed it back to findings.md and milestones.md
- AI's interview prep contained unverifiable claims (e.g., calling a first-time-correct implementation a "redesign") — I caught the fabrications and demanded accuracy
- AI over-engineered the PWA implementation — I reverted the work and directed a simpler approach
- AI made the emulator boot into Classic mode instead of MathPrint — I identified the symptom on device and directed the multi-tool investigation
What the AI executed well (when properly directed):
- Deploying parallel subagents for research before writing code
- Building specialized investigation tools for specific subsystems
- The trace → compare → fix → verify loop (my methodology, AI's execution)
- ROM baking automation (17 consecutive rebakes at 4-5 AM)
- Cross-file refactoring (updating all 3 platforms simultaneously)
8. Claude Code Usage Statistics
8.1 Session & Token Data
| Metric | Value |
|---|---|
| Main conversation sessions | 37 session directories (24 indexed in sessions-index.json) |
| Main JSONL conversation files | 4 files, 9.2 MB total |
| Subagent invocations | 472 subagent JSONL files, 64 MB total |
| Total conversation data | ~73 MB |
| PRs created from sessions | 13 |
| Unique branches worked on | 13 |
| Related worktree projects | 2 (calc-web, calc-worktrees-image-keypad) |
8.2 Token Usage (extracted from main JSONL files)
| Category | Tokens |
|---|---|
| Input tokens | 1,017,775 |
| Output tokens | 222,085 |
| Cache read tokens | 462,457,150 |
| Cache creation tokens | 73,134,709 |
| Total | ~537M tokens |
Note: These counts are from the 4 main JSONL files only. The 472 subagent sessions (64 MB of data) would add significantly more tokens — likely bringing the true total well above 1B tokens for the project.
8.3 Estimated Cost (Opus pay-per-token pricing)
| Category | Rate | Est. Cost |
|---|---|---|
| Input | $15/M tokens | $15 |
| Output | $75/M tokens | $17 |
| Cache reads | $1.50/M tokens | $694 |
| Cache creation | $18.75/M tokens | $1,371 |
| Main sessions total | | ~$2,097 |
Note: This estimate uses Claude Opus 4 pricing. The actual cost depends on which model was used per session (Opus 4.5/4.6 vs Sonnet 4.5). Max plan subscribers pay a flat rate, so actual billing may differ. The subagent sessions' token costs are not included in this estimate.
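The table totals can be reproduced from the token counts in 8.2 and the per-million rates in 8.3. The sketch below only redoes that arithmetic; the constant and function names are mine.

```rust
/// (category, tokens, USD per million tokens), copied from the tables above.
const USAGE: [(&str, u64, f64); 4] = [
    ("input", 1_017_775, 15.0),
    ("output", 222_085, 75.0),
    ("cache reads", 462_457_150, 1.5),
    ("cache creation", 73_134_709, 18.75),
];

/// Cost of one category in USD.
fn cost_usd(tokens: u64, usd_per_million: f64) -> f64 {
    tokens as f64 / 1_000_000.0 * usd_per_million
}

/// Sum of all categories in USD.
fn total_cost_usd() -> f64 {
    USAGE.iter().map(|&(_, tokens, rate)| cost_usd(tokens, rate)).sum()
}

/// Sum of all token counts.
fn total_tokens() -> u64 {
    USAGE.iter().map(|&(_, tokens, _)| tokens).sum()
}
```

Cache traffic dominates: cache reads and cache creation together account for about $2,065 of the roughly $2,097 total, while direct input and output tokens contribute about $32.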
8.4 Session Breakdown by JSONL File
| File | Size | Responses | Content |
|---|---|---|---|
| e262789c.jsonl | 4.88 MB | 96 | Interview prep, chat export generation |
| 534ad7db.jsonl | 3.82 MB | 538 | Chess mode, auto-launch, ROM baking, Shift+R |
| f814347a.jsonl | 285 KB | 22 | Test framework investigation |
| e910f4c6.jsonl | 189 KB | 35 | Web frontend test setup (Vitest, 44 keypad tests) |
8.5 AI Model Distribution Across Commits
| Model | Commits | Notes |
|---|---|---|
| Claude Opus 4.5 | 215 | Primary model through mid-February |
| Claude Opus 4.6 | 74 | Adopted mid-February |
| Claude Sonnet 4.5 | 15 | Used sporadically |
| Codex | 1 | Single worktree snapshot |
| No AI attribution | 70 | Merge commits, manual fixes, config tweaks |
9. Major Bugs & Debugging Stories
9.1 The Magnitude Error (10⁹ bug)
Symptom: 6+7 showed 1300000000, 99*99 showed wrong answer.
Investigation: Spanned many sessions. Traced BCD floating-point operations, checked OP1-OP6 register addresses, examined TI-OS format buffer.
Root cause: Using self.adl instead of self.l for data register wrapping. L mode controls data addressing width (16-bit vs 24-bit), while ADL controls instruction/PC width. LDIR/LDDR were using 24-bit addressing when they should have used 16-bit for data operations.
False leads: Initially suspected display formatting (decimal point not written), then keypad timing, then OS Timer frequency.
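The distinction at the heart of this bug can be shown with a pointer-increment helper. This is a sketch under assumptions: the function name is hypothetical, and preserving the upper byte in 16-bit mode is extrapolated from the "preserve BCU in Z80 mode" fix listed in Phase 1.

```rust
/// Increment a 24-bit data pointer register (e.g. HL or DE during LDIR).
/// `l_mode` is the data-width flag (L), not the instruction-width flag (ADL).
/// In 16-bit mode only the low 16 bits wrap; the upper byte is assumed preserved.
fn inc_data_ptr(value: u32, l_mode: bool) -> u32 {
    if l_mode {
        value.wrapping_add(1) & 0xFF_FFFF
    } else {
        (value & 0xFF_0000) | (value.wrapping_add(1) & 0xFFFF)
    }
}
```

Passing the ADL flag where the L flag belongs gives the right answer whenever the two agree, which is most of the time, so the error only shows up when a suffixed or mixed-mode instruction makes them differ and a pointer crosses a 64KB boundary.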
9.2 The Graphing Hang
Symptom: Screen freezes after graphing, loader stops spinning.
Root cause: Timer raw bits accumulating and causing infinite ISR loops. Timer interrupt clearing in tick_peripherals was not properly clearing the raw status bits before re-evaluating.
Fix: Proper implementation of the 2-cycle timer interrupt delay pipeline.
9.3 The "Done" Bug
Symptom: First calculation after boot shows "Done" instead of numeric result.
Root cause: TI-OS expression parser not initialized. The OS expects an ENTER key to have been processed before the first calculation.
Fix: Auto-inject ENTER key on first user interaction (in Rust core, cross-platform).
9.4 ON Key Wake From Sleep
Symptom: ON button doesn't work after power-off (APO or 2nd+ON).
Investigation: Multiple sessions tracing CEmu's keypad_on_check(), control.off flag, power state.
Root causes (multiple): Battery status port returning hardcoded 0 instead of 0xFE (the OS rejected wakes because it thought the battery was dead). on_key_wake was one-shot instead of persistent. The APD (Automatic Power Down) disable wasn't clearing the right flag at 0xD00088.
9.5 The 50× Performance Regression
Caused by: Phase 4 scheduler changes (timer interrupt delay pipeline).
Root cause: Scheduler base_cycles_offset u64 underflow — subtracting a larger value from a smaller one wrapped around to near-maximum u64.
Fix: Added next_event_ticks cache to avoid scanning all events, fixed the underflow.
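Both halves of the fix can be sketched together. Only the next_event_ticks name comes from the text; the Scheduler shape, the slot-based event list, and the method names are hypothetical.

```rust
/// Minimal scheduler sketch: one optional deadline per event slot,
/// plus a cached minimum so the hot path never scans the list.
struct Scheduler {
    events: Vec<Option<u64>>,
    next_event_ticks: Option<u64>,
}

impl Scheduler {
    fn new(slots: usize) -> Self {
        Self { events: vec![None; slots], next_event_ticks: None }
    }

    /// Scheduling is rare, so the cache is recomputed here by a full scan.
    fn schedule(&mut self, slot: usize, at: u64) {
        self.events[slot] = Some(at);
        self.next_event_ticks = self.events.iter().flatten().copied().min();
    }

    /// Hot path: ticks until the next event, clamped at zero.
    /// A plain `at - now` would panic in debug builds or wrap to a value
    /// near u64::MAX in release builds when the event is already overdue.
    fn ticks_until_next(&self, now: u64) -> Option<u64> {
        self.next_event_ticks.map(|at| at.saturating_sub(now))
    }
}
```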
9.6 State Restore "RAM Cleared"
Symptom: Restoring a saved state showed "RAM cleared" message.
Root cause: Multiple peripheral fields not included in state snapshots: SPI controller state, watchdog state, cursor registers, needs_lcd_event, needs_lcd_clear, memory protection registers (stack limit, protected range). The OS detected inconsistency on restore.
Fix: Multiple STATE_VERSION bumps (reached version 8) to add missing fields. Spawned multiple investigation subagents to systematically audit what was and wasn't saved.
9.7 Chess Opening Books Not Loading
Symptom: Chess shows "BK:NONE" (file open fails) vs CEmu showing "BK:32768".
Root cause: fileioc (CE C library) stores curr_slot at LCD register address 0xE30C11 and resize_amount at 0xE30C0C — in LCD cursor image RAM (0x800-0xBFF) which wasn't implemented.
How found: Added breakpoint mechanism to Emu struct, used disasm command, traced through _ChkFindSym bcall. Multiple subagent sessions to understand fileioc's internals.
10. Running Real Software: DOOM & Chess
10.1 DOOM Support (PR #68)
Getting DOOM running required several subsystem additions:
- 8bpp LCD rendering: The calculator normally uses 16bpp RGB565, but DOOM uses 8bpp indexed color with a 256-entry palette. Added render_frame_8bpp() with palette lookup.
- .8xp/.8xv file parser (ti_file.rs): Parse TI file format headers, checksums, variable entries. Supports programs, protected programs, AppVars.
- Flash archive injection: inject_archive_entry() writes flag byte 0xFC, 2-byte size, type, version, self-referential address, name, data into flash. find_archive_free_addr() scans sectors 0x0C0000-0x3B0000.
- SendKey mechanism: Pokes OS RAM directly at 0xD0058C (kbdKey), 0xD0058E (keyExtend), bit 5 of 0xD0009F (keyReady). Uses bus.poke_byte() to bypass memory protection.
- Launch sequence: ENTER → CLEAR → Asm( → prgm → D,O,O,M → ENTER
- LCD cursor image RAM: Extended LCD address space to include 0x800-0xBFF (1024 bytes) used by LibLoad as scratch storage.
- Keypad range extension: Extended KEYPAD_END from 0x150048 to 0x151000 (full 4KB page) because DOOM needed the full range.
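The SendKey mechanism amounts to three pokes into OS RAM. In the sketch below the three addresses and the bit position come from the list above; the Ram type, its poke_byte/peek_byte methods, and the send_key function are hypothetical stand-ins for the real bus API.

```rust
const RAM_BASE: u32 = 0xD0_0000;
const KBD_KEY: u32 = 0xD0_058C;
const KEY_EXTEND: u32 = 0xD0_058E;
const KEY_FLAGS: u32 = 0xD0_009F;
const KEY_READY_BIT: u8 = 1 << 5;

/// Hypothetical flat RAM model; the real bus also handles protection and MMIO.
struct Ram {
    bytes: Vec<u8>,
}

impl Ram {
    /// Write without any protection checks, mirroring the poke semantics above.
    fn poke_byte(&mut self, addr: u32, value: u8) {
        self.bytes[(addr - RAM_BASE) as usize] = value;
    }

    fn peek_byte(&self, addr: u32) -> u8 {
        self.bytes[(addr - RAM_BASE) as usize]
    }
}

/// Inject one OS key code as if the keyboard driver had produced it.
fn send_key(ram: &mut Ram, key: u8, extend: u8) {
    ram.poke_byte(KBD_KEY, key);
    ram.poke_byte(KEY_EXTEND, extend);
    // Set keyReady without disturbing the other flag bits in the same byte.
    let flags = ram.peek_byte(KEY_FLAGS);
    ram.poke_byte(KEY_FLAGS, flags | KEY_READY_BIT);
}
```

Injecting key codes this way skips the keypad matrix scan entirely, so tokens such as Asm( and prgm in the launch sequence can be sent without simulating menu navigation.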
10.2 Chess Integration
The chess engine (from the ce-games submodule) is a fully-featured chess program running on the eZ80:
- Alpha-beta negamax with PVS, aspiration windows, null-move pruning (R=2), LMR, futility pruning
- Texel-tuned PeSTO piece-square tables
- 4096-entry transposition table (always-replace)
- Polyglot opening book split across AppVars (TI-OS 64KB limit), up to 131K entries
- ~154K cycles/node, ~2000 Elo at Expert difficulty (15s/move)
- Automated tournament system (emu_tournament.py) running eZ80 engine vs Stockfish
Web chess mode (/chess route): Fetches chess.bin (1.9MB gzipped ROM), auto-launches at 5× speed using visual polling for boot detection.
11. Development Timeline & Velocity
| Date | Day | PRs | Key Achievement |
|---|---|---|---|
| Jan 27 | 1 | #1-#4 | CPU + memory + peripherals. Full eZ80 in one afternoon. |
| Jan 28 | 2 | #6-#9 | 40K step parity. IM2 fix. OS Timer. |
| Jan 29 | 3 | #10-#13 | OS boots to home screen (3.6M steps). Scheduler implemented. |
| Jan 30-31 | 4-5 | #14-#20 | Keypad working. Magnitude error fixed (L vs ADL modes). |
| Feb 1 | 6 | #21-#33 | CEmu backend + iOS app + 13 PRs in one day. |
| Feb 2 | 7 | #34-#44 | Runtime backend switching. State persistence. Web app. |
| Feb 5-6 | 10-11 | #51-#58 | 7-phase CEmu parity overhaul (PR #56). WASM optimization. |
| Feb 7-9 | 12-14 | #59-#68 | DOOM runs. Image keypad. ON key wake. |
| Feb 10-14 | 15-19 | #69-#78 | Live file send. Debug port interception. Sudoku. |
| Feb 16-20 | 21-25 | #81-#86 | Chess mode. PWA offline. Shift+R dev shortcut. |
Key velocity facts:
- Boot-to-homescreen achieved in 3 days from initial commit
- Full eZ80 CPU (3,675 lines, 124 tests) implemented in one afternoon
- Tri-platform support (Android + iOS + Web) in 6 days
- From "first instruction runs" to "DOOM runs" in 13 days
12. PR & Commit History Analysis
12.1 Merge Patterns
- Squash merges: ~43 (64%) — used for most feature branches from PR #16 onward
- Merge commits: ~24 (36%) — used for earlier PRs and larger merges
- No rebases on main; squash merges keep the history of most feature branches linear
12.2 AI Co-Authorship
- 262/332 commits (79%) have AI co-author attribution
- Claude Opus 4.5: 215 commits (primary model through mid-Feb)
- Claude Opus 4.6: 74 commits (adopted mid-February)
- Claude Sonnet 4.5: 15 commits (sporadic)
- Codex: 1 commit (worktree snapshot)
- 70 commits (21%) have no AI attribution (merge commits, manual fixes, config tweaks)
12.3 Reverts
One revert: commit 645aeb1 reverted PR #2's CPU implementation immediately after squash-merge, then re-merged from a different branch topology. This was a merge strategy correction, not a code quality issue.
12.4 Closed PRs (8 total)
- PRs #46-#50 (5 individual CEmu parity features): All absorbed into monolithic PR #56
- PR #35 (unified state persistence): Superseded by per-platform PRs #39, #41, #43
- PR #63 (state restore perf): Incorporated into later PRs
- PR #85 (chess + Shift+R): Split into PRs #84 and #86
12.5 PR Dependency Chains
- Core emulation: #1→#2→#4→#6→#7→#8→#9→#10→#11→#12→#13 (boot achieved)
- CEmu parity: #22→#23→#30→#45→#51→#56→#59
- Gaming: #61→#68→#70→#71→#74→#77→#78→#82→#84
- State persistence: #39→#41→#42→#43→#53→#57→#81
13. Key Files Reference
Core Emulator
| File | Lines | Purpose |
|---|---|---|
| core/src/emu.rs | 3168 | Main orchestrator, execution loop, frame rendering |
| core/src/cpu/execute.rs | 2646 | All instruction execution (largest file) |
| core/src/bus.rs | 1929 | Memory routing, flash unlock, debug ports |
| core/src/disasm.rs | 1544 | Full eZ80 disassembler |
| core/src/peripherals/lcd.rs | 1302 | LCD controller + 5-state DMA engine |
| core/src/peripherals/keypad.rs | 899 | 8×7 key matrix, scan modes, edge detection |
| core/src/cpu/helpers.rs | 858 | ALU, register access, prefetch |
| core/src/peripherals/control.rs | 838 | CPU speed, battery FSM, memory protection |
| core/src/scheduler.rs | 702 | Event scheduler, 7.68 GHz base clock |
| core/src/peripherals/rtc.rs | 674 | RTC 3-state machine |
| core/src/peripherals/timer.rs | 671 | 3× GPT with delay pipeline |
| core/src/peripherals/spi.rs | 627 | SPI + 16-deep FIFO |
| core/src/peripherals/mod.rs | 599 | Port routing, tick orchestration |
| core/src/memory.rs | 587 | Flash + RAM + NOR commands |
| core/src/peripherals/flash.rs | 576 | Flash controller registers |
| core/src/peripherals/interrupt.rs | 448 | 2-bank interrupt controller |
| core/src/lib.rs | 432 | C ABI exports, SyncEmu |
| core/src/ti_file.rs | 379 | .8xp/.8xv parser |
| core/src/wasm.rs | 325 | WASM bindings |
| core/src/peripherals/sha256.rs | 307 | SHA-256 compression |
| core/include/emu.h | 52 | C ABI contract |
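The scheduler row above mentions a 7.68 GHz base clock. One plausible reason for such a rate is that it divides evenly by each clock domain, so events from different domains can be ordered on a single integer timeline. The sketch below shows that arithmetic only; the function names are hypothetical, and the 48 MHz and 32,768 Hz rates in `main` are assumed example clock domains, not values taken from scheduler.rs.

```rust
// Base clock rate quoted in the file table.
const BASE_CLOCK_HZ: u64 = 7_680_000_000;

/// Number of base ticks per cycle of a clock, or None when the clock
/// does not divide the base rate evenly (or is zero).
fn base_ticks_per_cycle(clock_hz: u64) -> Option<u64> {
    if clock_hz != 0 && BASE_CLOCK_HZ % clock_hz == 0 {
        Some(BASE_CLOCK_HZ / clock_hz)
    } else {
        None
    }
}

/// Convert a cycle count from one clock domain to another by going
/// through base ticks. The result is truncated toward zero.
fn convert_cycles(cycles: u64, from_hz: u64, to_hz: u64) -> Option<u64> {
    let from = base_ticks_per_cycle(from_hz)?;
    let to = base_ticks_per_cycle(to_hz)?;
    Some(cycles * from / to)
}

fn main() {
    // Assumed example domains: a 48 MHz CPU clock and a 32,768 Hz crystal.
    println!("{:?}", base_ticks_per_cycle(48_000_000)); // Some(160)
    println!("{:?}", base_ticks_per_cycle(32_768)); // Some(234375)
    // One second of 48 MHz cycles expressed in 32,768 Hz ticks.
    println!("{:?}", convert_cycles(48_000_000, 48_000_000, 32_768)); // Some(32768)
}
```

Because every conversion is an integer multiply and divide, no floating-point drift accumulates between clock domains over long runs.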
Debug Tools
| File | Purpose |
|---|---|
| core/examples/debug.rs (~2900 lines) | Swiss Army knife CLI: boot, trace, fulltrace, screen, vram, calc, sendfile, bakerom, run, rundoom |
| tools/cemu-test/trace_gen.c | CEmu trace generator in matching format |
| tools/cemu-test/parity_check.c | CEmu state checker at cycle milestones |
| scripts/compare_traces.py | PC-synced trace comparison |
| scripts/find_first_divergence.py | JSON fulltrace comparison with I/O matching |
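The core idea behind the trace comparison tools listed above, finding the first step at which two traces disagree, can be sketched as below. This is an illustration in Rust rather than a port of the Python scripts: `TraceStep` and its fields are a hypothetical record shape, and the PC synchronization and I/O matching that the real scripts perform are not modelled.

```rust
/// Hypothetical trace record; the real trace format has more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TraceStep {
    pc: u32,
    a: u8,
    cycles: u64,
}

/// Index of the first step where the two traces differ. If one trace is
/// a strict prefix of the other, the divergence is reported at the end
/// of the shorter one. Returns None when the traces are identical.
fn first_divergence(ours: &[TraceStep], reference: &[TraceStep]) -> Option<usize> {
    let common = ours.len().min(reference.len());
    (0..common)
        .find(|&i| ours[i] != reference[i])
        .or(if ours.len() != reference.len() { Some(common) } else { None })
}

fn main() {
    let reference = vec![
        TraceStep { pc: 0x000000, a: 0x00, cycles: 0 },
        TraceStep { pc: 0x000001, a: 0x00, cycles: 4 },
        TraceStep { pc: 0x000005, a: 0x3F, cycles: 11 },
    ];
    let mut ours = reference.clone();
    ours[2].cycles = 12; // simulate a one-cycle timing error at step 2

    match first_divergence(&ours, &reference) {
        Some(i) => println!("diverged at step {}: {:?} vs {:?}", i, ours[i], reference[i]),
        None => println!("traces match"),
    }
}
```

Reporting only the first mismatch is the useful part: every later difference is usually a consequence of that one, so fixing it and re-running moves the divergence point forward.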
Documentation
| File | Purpose |
|---|---|
| docs/findings.md (15KB) | All hardware discoveries, bug findings, lessons |
| docs/milestones.md (7.1KB) | 7-phase parity roadmap (all complete) |
| docs/pr_comparison_report.md | 8-agent CEmu comparison (~150 issues) |
| CLAUDE.md (7.6KB) | Claude Code workflow, memory map, trace format |
| README.md (17KB) | Comprehensive project docs |
| outline_v2.md | Blog post draft with narrative framing |
Platform Frontends
| File | Lines | Purpose |
|---|---|---|
| android/.../MainActivity.kt | ~2100 | Monolithic Android UI |
| android/.../jni_loader.cpp | ~630 | JNI + dynamic backend loading |
| android/.../cemu_adapter.c | ~595 | CEmu wrapper for Android |
| web/src/Calculator.tsx | ~800 | Main web component |
| web/src/emulator/RustBackend.ts | — | WASM memory snapshot save/load |
| ios/Calc/Bridge/EmulatorBridge.swift | — | Swift FFI bridge |
| ios/Calc/Bridge/backend_bridge.c | ~309 | iOS static backend switching |