TI-84 Plus CE Emulator — Deep Technical Profile
Table of Contents
- Project Overview
- Architecture
- The Rust Emulator Core
- Cross-Platform Frontends
- Technical Tradeoffs & Decisions
- The CEmu Parity Campaign
- AI Agent Orchestration for Testing & Debugging
- Claude Code Usage Statistics
- Major Bugs & Debugging Stories
- Running Real Software: DOOM & Chess
- Development Timeline & Velocity
- PR & Commit History Analysis
- Key Files Reference
1. Project Overview
A cycle-accurate TI-84 Plus CE graphing calculator emulator built from scratch in Rust, with native frontends for Android (Kotlin/Jetpack Compose), iOS (Swift/SwiftUI), and Web (React/TypeScript/WASM). The emulator faithfully reproduces the Zilog eZ80 processor, all 13 hardware peripherals, and the TI-OS operating system — achieving instruction-level behavioral parity with CEmu, the established open-source reference emulator.
By the numbers:
- ~15,000 lines of Rust core, ~38,000 total across all platforms
- 332 commits across 80+ branches, 67 merged PRs
- Built in 25 days (January 27 – February 20, 2026)
- 168,140,000 cycles to boot TI-OS (verified against CEmu)
- 277/455 unit tests passing (178 failures from pre-existing prefetch initialization, not regressions)
- WASM binary: 148KB uncompressed, 96KB gzipped
- ~80% AI co-authored (262/332 commits have Claude co-author attribution)
2. Architecture
2.1 Dual-Backend Design
The project's most distinctive architectural decision is its dual-backend system. Both the custom Rust emulator and the CEmu C reference emulator implement the same C ABI contract (core/include/emu.h), allowing any frontend to use either backend without code changes:
┌────────────────────────────────────────────────┐
│ Platform UI Layer │
│ (Android Compose, iOS SwiftUI, Web React) │
└──────────────────┬─────────────────────────────┘
│ C ABI (emu.h)
┌──────────────────▼─────────────────────────────┐
│ FFI Bindings (JNI / Swift / WASM) │
└────┬───────────────────────────────┬────────────┘
│ │
┌────▼──────────────┐ ┌────────▼─────────────┐
│ Rust Core │ │ CEmu Adapter │
│ (libemu_core.a) │ │ (libcemu_adapter.a) │
│ (emu_core.wasm) │ │ (cemu.wasm) │
└────────────────────┘  └──────────────────────┘

The C ABI contract (emu.h, 52 lines) defines 15 functions:
- Lifecycle: `emu_create()`, `emu_destroy()`
- ROM: `emu_load_rom()`, `emu_send_file()`
- Execution: `emu_reset()`, `emu_power_on()`, `emu_run_cycles()`
- Display: `emu_framebuffer()` (ARGB8888, 320×240), `emu_is_lcd_on()`
- Input: `emu_set_key(row, col, down)` (8×7 matrix)
- State: `emu_save_state_size()`, `emu_save_state()`, `emu_load_state()`
- Misc: `emu_get_backlight()`, `emu_set_log_callback()`
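A minimal Rust sketch of how such exports can be wired up. The `Emu` body, the framebuffer handling, and the three-function subset are illustrative stand-ins, not the project's actual definitions:

```rust
// Sketch of C ABI exports from a Rust core. The Emu struct here is a
// stand-in; only the ownership pattern is the point.
use std::sync::Mutex;

struct Emu {
    framebuffer: Vec<u32>, // ARGB8888, 320x240
}

struct SyncEmu(Mutex<Emu>);

#[no_mangle]
pub extern "C" fn emu_create() -> *mut SyncEmu {
    Box::into_raw(Box::new(SyncEmu(Mutex::new(Emu {
        framebuffer: vec![0; 320 * 240],
    }))))
}

/// Safety: `emu` must be a live pointer returned by `emu_create`.
#[no_mangle]
pub unsafe extern "C" fn emu_framebuffer(emu: *mut SyncEmu) -> *const u32 {
    (*emu).0.lock().unwrap().framebuffer.as_ptr()
}

/// Safety: `emu` must come from `emu_create` and not be used afterwards.
#[no_mangle]
pub unsafe extern "C" fn emu_destroy(emu: *mut SyncEmu) {
    drop(Box::from_raw(emu)); // reclaim ownership and free
}

fn main() {
    let e = emu_create();
    unsafe {
        assert!(!emu_framebuffer(e).is_null());
        emu_destroy(e);
    }
}
```

The `Box::into_raw`/`Box::from_raw` pairing keeps ownership unambiguous across the FFI boundary: the platform side holds an opaque pointer, and Rust reclaims it exactly once in `emu_destroy()`.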
Backend switching per platform:
- Android: Dynamic loading via `dlopen()`/`dlsym()` in `jni_loader.cpp`. A `BackendInterface` struct holds 17 function pointers. Backends are separate `.so` libraries (`libemu_rust.so`, `libemu_cemu.so`) loaded at runtime.
- iOS: Static linking with symbol prefixing. When built with the `ios_prefixed` Cargo feature, Rust exports use `rust_emu_*` names. `backend_bridge.c` dispatches through a `BackendInterface` pointer.
- Web: TypeScript class polymorphism with a `createBackend(type)` factory function. `RustBackend` wraps wasm-bindgen, `CEmuBackend` wraps Emscripten.
Why dual backends? The primary motivation was parity-driven development. Having CEmu as a runtime-swappable reference meant:
- Any behavior difference could be observed in real-time on the same device
- The Rust implementation could be validated against CEmu at every stage
- Users had a working fallback while the Rust core was incomplete
- A/B comparison screenshots could be generated instantly
The tradeoff: doubled integration surface area, two build systems (Cargo + CMake/Emscripten), and symbol collision management. On iOS, this required the ios_prefixed feature flag to avoid linker errors when both backends are statically linked.
2.2 Core Module Architecture
The Rust emulator core (core/src/, ~15,000 lines) is organized into 5 main modules and 13 peripheral modules:
core/src/
├── lib.rs (432 lines) — C ABI exports, SyncEmu thread-safe wrapper
├── emu.rs (3168 lines) — Emulator orchestrator, execution loop, frame rendering
├── bus.rs (1929 lines) — Memory bus, address decoding, flash unlock, debug ports
├── memory.rs (587 lines) — Flash (4MB), RAM (256KB+VRAM), NOR flash commands
├── scheduler.rs (702 lines) — Event scheduler, 7.68 GHz base clock, 9 event types
├── disasm.rs (1544 lines) — Full eZ80 disassembler
├── ti_file.rs (379 lines) — .8xp/.8xv TI file format parser
├── wasm.rs (325 lines) — WASM FFI bindings
├── cpu/
│ ├── mod.rs (653 lines) — CPU state, step(), interrupt handling
│ ├── execute.rs (2646 lines) — All instruction execution (largest file)
│ ├── helpers.rs (858 lines) — ALU ops, register access, fetch/prefetch
│ ├── flags.rs (25 lines) — Flag bit constants (C, N, PV, H, Z, S, F3, F5)
│ └── tests/ (5420 lines) — instructions, modes, parity tests
└── peripherals/
    ├── mod.rs (599 lines) — Port routing, tick orchestration, state persistence
    ├── control.rs (838 lines) — CPU speed, battery FSM, flash unlock, memory protection
    ├── lcd.rs (1302 lines) — LCD controller, 5-state DMA engine, palette
    ├── timer.rs (671 lines) — 3× GPT with 2-cycle interrupt delay pipeline
    ├── rtc.rs (674 lines) — Real-time clock, 3-state machine, 6 interrupt types
    ├── keypad.rs (899 lines) — 8×7 matrix, scan modes, edge detection
    ├── spi.rs (627 lines) — SPI controller, 16-deep FIFO, panel stub
    ├── interrupt.rs (448 lines) — 2-bank interrupt controller with inversion/latching
    ├── flash.rs (576 lines) — Flash controller registers, wait state management
    ├── sha256.rs (307 lines) — SHA-256 block compression (64-round)
    ├── panel.rs (223 lines) — ST7789V LCD panel stub (SPI target)
    ├── backlight.rs (72 lines) — PWM backlight brightness
    └── watchdog.rs (248 lines) — Watchdog timer stub

2.3 Design Principles
No platform dependencies in the core. The Rust core has no `std::fs`, no `std::io`, no `std::net`. It doesn't know about files, logging, threading, or what platform it's running on. Everything flows through byte buffers via the C ABI. The only external crates are `wasm-bindgen`/`js-sys`/`web-sys` (optional, gated behind the `wasm` feature flag) and `chrono` (dev-only, for tests).

Stable C ABI. All exports use `extern "C"` with `#[no_mangle]`. The `SyncEmu` wrapper wraps `Emu` in a `Mutex<Emu>` for thread safety. Raw pointers are used at the FFI boundary, with `Box::into_raw`/`Box::from_raw` for lifecycle management.

Single-threaded deterministic core. The emulator is purely deterministic — given the same ROM and inputs, it produces the same outputs every time. Threading is the platform's responsibility.

Buffer-based I/O. ROM loaded as `&[u8]`, framebuffer exposed as `*const u32` (ARGB8888), save states serialized to `Vec<u8>`. No file handles cross the FFI boundary.
3. The Rust Emulator Core
3.1 CPU: The eZ80 Processor
The TI-84 Plus CE uses a Zilog eZ80 processor — an extended Z80 with 24-bit addressing (ADL mode). This is NOT a standard Z80; it has critical differences that are poorly documented and caused numerous bugs during development.
CPU State (Cpu struct, cpu/mod.rs):
- Main registers: `a: u8`, `f: u8`, `bc`/`de`/`hl: u32` (24-bit)
- Shadow registers: `a_prime`, `f_prime`, `bc_prime`, `de_prime`, `hl_prime`
- Index registers: `ix`, `iy` (24-bit)
- Dual stack pointers: `sps` (Z80 16-bit), `spl` (ADL 24-bit) — selected by the L mode flag
- Special: `pc: u32`, `i: u16`, `r: u8`, `mbase: u8`
- State flags: `iff1`, `iff2`, `im: InterruptMode`, `adl`, `halted`
- Per-instruction mode: `l`, `il`, `suffix`, `madl`, `prefix`, `prefetch`
- `ei_delay: u8` — 2-step delayed interrupt enable (EI enables interrupts after the NEXT instruction)
Instruction Execution (execute.rs, 2646 lines — the largest file):
The CPU uses x-y-z-p-q opcode decomposition. Each instruction goes through:
1. Prefetch: Return the previously-prefetched byte, read the next byte into the prefetch buffer. This mirrors CEmu's hardware prefetch and is critical for cycle accuracy — without it, cycle counts were ~50% too low.
2. Suffix detection loop: Opcodes 0x40, 0x49, 0x52, 0x5B are suffix opcodes (.SIS, .LIS, .SIL, .LIL) that modify the next instruction's L/IL modes. They execute atomically with the following instruction — a single `step()` call, not two. Getting this wrong caused trace count mismatches with CEmu.
3. Dispatch: Based on the x field (bits 7:6): x=0 → `execute_x0`, x=1 → LD r,r' or HALT, x=2 → ALU A,r, x=3 → `execute_x3`. Prefixed instructions (CB/ED/DD/FD) have their own dispatch tables.
4. Interrupt check: If `iff1 && irq_pending`, push the return address and jump to 0x0038.
eZ80 Architectural Surprises (each of these caused boot failures):
| Discovery | Impact | How Found |
|---|---|---|
| IM2 = IM1 on eZ80 (ignores I register, jumps to 0x0038) | Implementing standard Z80 IM2 crashed the boot | Trace comparison at ~9K steps |
| Separate SPS/SPL stack pointers | Mixed-mode CALL/RET pushed wrong-width addresses | CEmu source reading + test failures |
| Suffix opcodes execute atomically | Step count mismatched CEmu traces | Trace comparison showed 2 steps vs 1 |
| R register rotation: LD R,A stores (A<<1) \| (A>>7) | R register diverged from CEmu | Parity test failures |
| LD A,MB (ED 6E) — load memory base register | #1 boot blocker, not in Z80 docs | ROM disassembly |
| F3/F5 flags preserved from previous F in ALU ops | Flag divergence in SPI polling loops | 29 dedicated parity tests |
| ON key wakes from HALT even with DI | Boot sequence stalled at HALT | CEmu keypad_on_check() comparison |
| OS Timer is a 4th timer (32K crystal, not documented) | Boot hangs at ~20M cycles without it | ROM code analysis showed it waits for bit 4 |
| Block I/O instructions execute atomically (INI/IND + eZ80-specific variants) | Trace count mismatches | CEmu source comparison |
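The R-register quirk from the table is small enough to show directly, as a sketch with the register plumbing omitted:

```rust
// eZ80 LD R,A stores A rotated left by one bit, not the plain copy a
// standard Z80 performs.
fn ld_r_a(a: u8) -> u8 {
    (a << 1) | (a >> 7)
}

fn main() {
    assert_eq!(ld_r_a(0x81), 0x03); // high bit wraps around to bit 0
    assert_eq!(ld_r_a(0x40), 0x80);
}
```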
3.2 Bus: Memory Routing and Address Decoding
The Bus struct (bus.rs, 1929 lines) handles all memory access for the emulator. The 24-bit address space is decoded as:
| Range | Region | Size | Wait States |
|---|---|---|---|
| 0x000000–0x3FFFFF | Flash (ROM) | 4MB | 10 cycles read |
| 0x400000–0xCFFFFF | Unmapped | — | LFSR pseudo-random on read |
| 0xD00000–0xD3FFFF | RAM | 256KB | 4 read / 2 write |
| 0xD40000–0xD657FF | VRAM | ~150KB | 4 read / 2 write |
| 0xD65800–0xDFFFFF | Unmapped | — | LFSR pseudo-random |
| 0xE00000–0xFFFFFF | MMIO Ports | — | 2-4 cycles per port |
Flash unlock detection: The bus monitors the fetch stream for a magic byte sequence (the FLASH_UNLOCK_SEQUENCE — DI; JR; DI; IM2; IM1; OUT0/IN0; BIT 2,A) that the ROM uses to unlock flash writes. When detected during privileged code fetch (PC in ROM range), flash write mode is enabled. This is a 16-byte or 17-byte pattern match on a 32-byte fetch ring buffer.
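A hedged sketch of that ring-buffer matching follows. The actual FLASH_UNLOCK_SEQUENCE bytes are not reproduced here; the example uses 0xF3 (DI) and 0x18 (JR) purely as sample bytes:

```rust
// Sketch of fetch-stream pattern matching on a 32-byte ring buffer.
struct FetchRing {
    buf: [u8; 32],
    pos: usize, // total bytes pushed
}

impl FetchRing {
    fn new() -> Self {
        FetchRing { buf: [0; 32], pos: 0 }
    }

    fn push(&mut self, byte: u8) {
        self.buf[self.pos % 32] = byte;
        self.pos += 1;
    }

    /// True if the most recent fetches end with `pattern`.
    fn ends_with(&self, pattern: &[u8]) -> bool {
        if self.pos < pattern.len() {
            return false;
        }
        pattern
            .iter()
            .rev()
            .enumerate()
            .all(|(i, &p)| self.buf[(self.pos - 1 - i) % 32] == p)
    }
}

fn main() {
    let mut r = FetchRing::new();
    for b in [0xF3u8, 0x18, 0xF3] {
        r.push(b); // DI; JR; DI as example fetches
    }
    assert!(r.ends_with(&[0x18, 0xF3]));
    assert!(!r.ends_with(&[0xF3, 0x18]));
}
```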
Memory protection: Three mechanisms enforced in write_byte():
- Stack limit (ports 0x3A-0x3C): SP below limit triggers NMI
- Protected range (ports 0x20-0x25): Writes to protected range from unprivileged code trigger NMI and are blocked
- Flash privilege (port 0x28): Only privileged code can write to flash
Debug port interception: The bus intercepts writes to special addresses used by the CE toolchain's dbg_printf:
- 0xFB0000–0xFBFFFF: stdout (sequential address writes, NOT repeated writes to 0xFB0000)
- 0xFC0000–0xFCFFFF: stderr
- 0xFD0000: control (write 1 = clear console)
- Null byte at 0xFB0000 exactly = program exit sentinel
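Those rules can be sketched as a small state machine (stdout range only; the buffer type and event enum are illustrative, not the project's actual definitions):

```rust
// Sketch of the dbg_printf interception rules described above.
#[derive(Debug, PartialEq)]
enum DebugEvent {
    None,
    Flush(String), // null at a non-base offset: emit buffered text
    Exit,          // null at 0xFB0000 exactly: program exit sentinel
}

struct DebugPort {
    stdout: String,
}

impl DebugPort {
    fn write(&mut self, addr: u32, byte: u8) -> DebugEvent {
        if !(0xFB0000..=0xFBFFFF).contains(&addr) {
            return DebugEvent::None;
        }
        if byte == 0 {
            if addr == 0xFB0000 {
                return DebugEvent::Exit;
            }
            return DebugEvent::Flush(std::mem::take(&mut self.stdout));
        }
        self.stdout.push(byte as char); // sequential addresses, one byte each
        DebugEvent::None
    }
}

fn main() {
    let mut p = DebugPort { stdout: String::new() };
    p.write(0xFB0000, b'h');
    p.write(0xFB0001, b'i');
    assert_eq!(p.write(0xFB0002, 0), DebugEvent::Flush("hi".into()));
    assert_eq!(p.write(0xFB0000, 0), DebugEvent::Exit);
}
```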
Flash cache model: A 2-way set-associative cache with 128 sets and 32-byte cache lines, returning 2 cycles (same line), 3 cycles (cache hit), or 197 cycles (cache miss). This matches CEmu's flash_cache implementation exactly.
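A sketch of that timing model follows. The 2/3/197 cycle costs come from the description above, while the empty initial state and the alternating way-replacement policy are assumptions for illustration:

```rust
// Sketch of the flash cache timing model: 2-way set-associative,
// 128 sets, 32-byte lines.
struct FlashCache {
    tags: [[Option<u32>; 2]; 128], // tags[set][way], None = invalid
    next_way: [u8; 128],           // which way a miss fills next (assumed)
    last_line: Option<u32>,        // most recently accessed line
}

impl FlashCache {
    fn new() -> Self {
        FlashCache { tags: [[None; 2]; 128], next_way: [0; 128], last_line: None }
    }

    /// Returns the cycle cost of a flash read at `addr`.
    fn access(&mut self, addr: u32) -> u32 {
        let line = addr / 32;
        if self.last_line == Some(line) {
            return 2; // same cache line as the previous access
        }
        self.last_line = Some(line);
        let set = (line % 128) as usize;
        let tag = line / 128;
        if self.tags[set].iter().any(|&t| t == Some(tag)) {
            return 3; // cache hit
        }
        let way = self.next_way[set] as usize; // miss: fill a way
        self.tags[set][way] = Some(tag);
        self.next_way[set] ^= 1;
        197
    }
}

fn main() {
    let mut c = FlashCache::new();
    assert_eq!(c.access(0x0000), 197); // cold miss
    assert_eq!(c.access(0x0004), 2);   // same 32-byte line
    assert_eq!(c.access(0x1000), 197); // different line: miss
    assert_eq!(c.access(0x0000), 3);   // cached line again: hit
}
```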
3.3 Scheduler: The 7.68 GHz Base Clock
The scheduler (scheduler.rs, 702 lines) is the timing backbone of the emulator. Rather than maintaining separate tick counters for each hardware clock, it uses a single base clock at 7,680,000,000 Hz — the LCM of all hardware clocks:
| Clock | Rate | Base ticks per tick |
|---|---|---|
| CPU (speed 3) | 48 MHz | 160 |
| CPU (speed 2) | 24 MHz | 320 |
| CPU (speed 1) | 12 MHz | 640 |
| CPU (speed 0) | 6 MHz | 1280 |
| Panel | 10 MHz | 768 |
| Clock48M | 48 MHz | 160 |
| Clock24M | 24 MHz | 320 |
| Clock32K | 32.768 KHz | 234,375 |
Why the LCM approach? All hardware timer events can be scheduled in base ticks with pure integer arithmetic — no floating-point, no rounding errors, no drift. A timer ticking at 32.768 KHz fires every 234,375 base ticks, which divides evenly into the base clock rate. This design was ported directly from CEmu's schedule.c.
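The conversion is plain integer division, as a sketch:

```rust
// Base-clock arithmetic from the table: every hardware clock divides
// the 7.68 GHz base evenly, so conversions are exact integer division
// with no drift.
const BASE_HZ: u64 = 7_680_000_000;

fn base_ticks_per_tick(clock_hz: u64) -> u64 {
    assert_eq!(BASE_HZ % clock_hz, 0, "clock must divide the base clock");
    BASE_HZ / clock_hz
}

fn main() {
    assert_eq!(base_ticks_per_tick(48_000_000), 160); // CPU at speed 3
    assert_eq!(base_ticks_per_tick(32_768), 234_375); // 32K crystal
}
```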
9 event types: RTC, SPI, TimerDelay, Timer0, Timer1, Timer2, OsTimer, LCD, LcdDma.
Overflow prevention: The process_second() method subtracts one second (7,680,000,000 base ticks) from all timestamps whenever the base counter crosses a second boundary. This prevents u64 overflow while maintaining relative timing. The INACTIVE_FLAG (bit 63) marks disabled events — since timestamps never reach bit 63 due to the one-second normalization, this bit is safely repurposed.
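A minimal sketch of that normalization, assuming every active timestamp is at or past the second boundary when it runs (which holds if pending events are processed first):

```rust
// One-second normalization: shift the base counter and all active event
// timestamps back by one second; inactive events (bit 63 set) are
// left untouched.
const SECOND_TICKS: u64 = 7_680_000_000;
const INACTIVE_FLAG: u64 = 1 << 63;

fn process_second(now: &mut u64, timestamps: &mut [u64]) {
    *now -= SECOND_TICKS;
    for t in timestamps.iter_mut() {
        if *t & INACTIVE_FLAG == 0 {
            *t -= SECOND_TICKS; // active events shift with the counter
        }
    }
}

fn main() {
    let mut now = SECOND_TICKS + 42;
    let mut ts = [SECOND_TICKS + 100, INACTIVE_FLAG | 7];
    process_second(&mut now, &mut ts);
    assert_eq!(now, 42);
    assert_eq!(ts[0], 100);               // relative timing preserved
    assert_eq!(ts[1], INACTIVE_FLAG | 7); // disabled event untouched
}
```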
HALT fast-forward: When the CPU enters HALT, instead of spinning cycle-by-cycle, the emulator calls scheduler.cycles_until_next_event() and jumps forward to the next event. This is critical for performance — boot involves many HALT periods.
3.4 Peripherals: 13 Hardware Modules
LCD Controller (lcd.rs, 1302 lines)
The LCD runs a 5-state DMA engine:
FRONT_PORCH → SYNC → LNBU → BACK_PORCH → ACTIVE_VIDEO → (repeat)

- FRONT_PORCH: Idle period before sync
- SYNC: Horizontal/vertical sync pulses. Timing registers are parsed here.
- LNBU: Line Buffer Update. Prefills a 256-byte FIFO before active video begins.
- BACK_PORCH: Idle after sync. DMA is scheduled during this phase.
- ACTIVE_VIDEO: Actual pixel DMA from VRAM at 0xD40000. The UPCURR register increments as pixels are transferred.
Two separate clock domains: LCD state machine events on CLOCK_24M (24 MHz), DMA transfers on CLOCK_48M (48 MHz). The process_dma() method handles pixel-by-pixel DMA advancement, while fast_forward_dma_events() provides O(1) bulk skip for performance.
DMA cycle stealing (~7.7% overhead): The LCD DMA and CPU contend for the memory bus. Rather than explicitly scheduling CPU wait cycles, the emulator tracks dma_last_mem_timestamp and calculates elapsed DMA cycles on each CPU memory access via process_dma_stealing(). This adds ~13M cycles to the 168M boot (~7.7% overhead, matching CEmu exactly).
Palette: 256 entries stored as 1555 ARGB, converted to both BGR565 and RGB565 on write. The 8bpp mode (used by DOOM) indexes into this palette for each pixel.
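One common way to derive the sixth green bit for 565 output is to reuse the 1555 entry's top (intensity/alpha) bit. This sketch assumes that layout rather than reproducing CEmu's exact conversion:

```rust
// Sketch of converting a 1555 palette entry to RGB565. The 5-bit
// channels map straight across; the low green bit is assumed to come
// from bit 15 of the 1555 word.
fn rgb1555_to_rgb565(p: u16) -> u16 {
    let r = (p >> 10) & 0x1F;
    let g = (p >> 5) & 0x1F;
    let b = p & 0x1F;
    let g6 = (g << 1) | ((p >> 15) & 1); // top bit fills low green bit
    (r << 11) | (g6 << 5) | b
}

fn main() {
    assert_eq!(rgb1555_to_rgb565(0xFFFF), 0xFFFF); // white stays white
    assert_eq!(rgb1555_to_rgb565(0x0000), 0x0000); // black stays black
}
```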
Cursor image RAM (0xE30800–0xE30BFF, 1024 bytes): A discovery made during DOOM support — LibLoad (a CE C library loader) uses this LCD hardware register space as scratch storage. Without implementing it, CE C programs crash.
Timer System (timer.rs, 671 lines)
Three General Purpose Timers with a shared 32-bit control register (3 bits per timer: enable, clock source, overflow enable) and a 2-cycle interrupt delay pipeline:
- Cycle 0: Timer match/overflow detected → `delay_status` bit set
- Cycle 1: Status becomes visible → `delay_intrpt` bit set
- Cycle 2: Interrupt actually fires
This pipeline is implemented via the TimerDelay scheduler event and process_delay() state machine. Getting this wrong caused the graphing hang bug — timer interrupts fired too early, and the ISR looped infinitely because the status bits weren't visible yet.
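The three stages map naturally onto a two-stage shift pipeline. This is a sketch with the per-timer bookkeeping collapsed into single booleans, not the project's actual `process_delay()`:

```rust
// Sketch of the 2-cycle interrupt delay pipeline described above.
#[derive(Default)]
struct DelayPipeline {
    delay_status: bool, // after cycle 0: match/overflow detected
    delay_intrpt: bool, // after cycle 1: status visible to software
}

impl DelayPipeline {
    /// Advance one cycle; `matched` reports a match/overflow this cycle.
    /// Returns true on the cycle the interrupt actually fires.
    fn tick(&mut self, matched: bool) -> bool {
        let fire = self.delay_intrpt;
        self.delay_intrpt = self.delay_status;
        self.delay_status = matched;
        fire
    }
}

fn main() {
    let mut t = DelayPipeline::default();
    assert!(!t.tick(true));  // cycle 0: match detected
    assert!(!t.tick(false)); // cycle 1: status visible
    assert!(t.tick(false));  // cycle 2: interrupt fires
}
```

An ISR polling the status register sees the bit one cycle before the interrupt fires, which is exactly the ordering the graphing hang bug violated.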
RTC (rtc.rs, 674 lines)
Real-time clock with a 3-state machine:
- TICK: Normal time counting (sec → min → hour → day with rollovers)
- LATCH: Time latched for reading (prevents tearing)
- LOAD_LATCH: Loading new time from load registers (bit-level transfer with write masks)
The load process is particularly complex: writing control bit 6 triggers a load that takes 51 ticks at 32 KHz to complete. A status register at offset 0x40 returns a bitmask showing which fields (sec/min/hour/day) have been transferred. The ROM polls this register during boot, and getting the timing wrong caused the emulator to boot into "Classic" mode instead of "MathPrint" mode.
Constants: LATCH_TICK_OFFSET = 16429 ticks (~0.5 seconds at 32 KHz). The RTC is scheduled at this offset from the start of each second.
OS Timer (in peripherals/mod.rs)
A 4th timer source not found in standard Z80 documentation, running off a 32.768 KHz crystal. Tick intervals vary by CPU speed: 73 ticks at 6 MHz (~449 Hz), 153 at 12 MHz (~214 Hz), 217 at 24 MHz (~151 Hz), 313 at 48 MHz (~105 Hz). The ROM enables OS Timer interrupt bit 4 and waits — without it, boot stalls indefinitely.
The OS Timer has a subtle ordering requirement matching CEmu: the interrupt state is set to the OLD value before toggling. Getting this wrong caused the OS Timer to fire at the wrong phase.
SPI Controller (spi.rs, 627 lines)
16-deep FIFO (not 4 as initially assumed), with a critical RX-only mode: when CR0 bit 11 (FLASH) is set and the TX FIFO is empty, transfers continue automatically filling the RX FIFO. Without this, the ROM's second SPI polling loop exits early at step ~699,910.
The SPI controller drives a ST7789V LCD panel stub (panel.rs, 223 lines) that absorbs 9-bit SPI frames and parses panel initialization commands (MADCTL, COLMOD, CASET, RASET, RAMWR).
Interrupt Controller (interrupt.rs, 448 lines)
Two banks of 32-bit interrupt registers (status, enabled, latched, inverted). The set_source() method handles edge/level/inverted semantics: if (set XOR (inverted & mask)) then set status bit, else clear it (preserving latched). The pulse() method creates proper edges for inverted+latched signals like WAKE.
Interrupt sources: ON_KEY(0), TIMER1(1), TIMER2(2), TIMER3(3), OSTIMER(4), KEYPAD(10), LCD(11), PWR(15), WAKE(19).
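A sketch of those `set_source()` semantics for a single bank, with the register set trimmed to the fields involved:

```rust
// Sketch of set_source(): `inverted` flips the sense of the incoming
// level, and clearing preserves any latched bits, per the description.
struct IntBank {
    status: u32,
    latched: u32,
    inverted: u32,
}

impl IntBank {
    fn set_source(&mut self, mask: u32, set: bool) {
        let level = if set { mask } else { 0 };
        if level ^ (self.inverted & mask) != 0 {
            self.status |= mask; // effective level high: set status
        } else {
            // effective level low: clear, but keep latched bits
            self.status &= !mask | self.latched;
        }
    }
}

fn main() {
    let mut b = IntBank { status: 0, latched: 0, inverted: 0 };
    b.set_source(1, true);
    assert_eq!(b.status, 1);
    b.set_source(1, false);
    assert_eq!(b.status, 0);

    // Inverted source: a LOW level sets the status bit.
    let mut inv = IntBank { status: 0, latched: 0, inverted: 2 };
    inv.set_source(2, false);
    assert_eq!(inv.status, 2);

    // A latched bit survives a clear.
    let mut lat = IntBank { status: 4, latched: 4, inverted: 0 };
    lat.set_source(4, false);
    assert_eq!(lat.status, 4);
}
```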
3.5 Execution Loop
The main execution loop (emu.rs, run_cycles()) is the heart of the emulator:
for each cycle budget:
1. Record opcode in execution history (64-entry ring buffer for crash diagnostics)
2. Handle any_key_wake (clears HALT if any key pressed)
3. Execute one CPU instruction via cpu.step(bus)
4. Check armed instruction trace (for debugging)
5. Advance scheduler by elapsed CPU cycles
6. Handle CPU speed changes (port 0x01 writes require cycle conversion)
7. Process all pending scheduler events (RTC, SPI, timers, LCD, DMA)
8. Process DMA cycle stealing
9. Schedule SPI transfers if needed
10. Check NMI flag (memory protection violations)
11. Tick peripherals (timers with delay pipeline, keypad, OS Timer)
12. HALT fast-forward (batch up to 10,000 cycles, cap at scheduler second boundary)
13. Periodic diagnostic logging (every 60 frames)

Frame rendering is separate from execution: `render_frame()` dispatches to either `render_frame_8bpp()` (palette lookup using BGR565) or `render_frame_16bpp()` (direct RGB565 from VRAM) based on the LCD's BPP mode.
4. Cross-Platform Frontends
4.1 Shared Design
All three platforms use identical percentage-based layout constants:
- LCD position: left 11.53%, top 6.92%, width 76.74%, height 24.92% of calculator body
- Keypad area: top 34.57%, height 65.43% of calculator body
- D-pad region: left 63.97%, top 13.72%, width 22.01%, height 14.74% of keypad area
- Body aspect ratio: 963/2239 (from the calculator body image)
- 49 button regions with identical percentage coordinates across all platforms, derived from a real TI-84 CE photograph via `scripts/extract_buttons.py`
All platforms use the same emulation timing: 800,000 cycles per frame at 60 FPS (= 48 MHz real-time), with non-linear speed steps from 0.25× to 20×.
State persistence uses SHA-256 of the ROM data, truncated to 16 hex chars, as a key. States are namespaced by backend type ("rust:<hash>" or "cemu:<hash>").
4.2 Android (android/, Kotlin + Jetpack Compose)
- `MainActivity.kt` (~2100 lines, monolithic): Compose UI with an `EmulatorScreen` composable. Emulation loop runs on a `Dispatchers.Default` coroutine. Framebuffer copied to a Bitmap via `setPixels()`.
- `EmulatorBridge.kt`: Loads `libemu_jni.so` via `System.loadLibrary`, which dynamically loads backend `.so` files. 18 JNI method declarations.
- `jni_loader.cpp` (~630 lines): `BackendInterface` struct with function pointer table. `loadBackend()` uses `dlopen()` to load `libemu_<name>.so`. Thread-safe log callback deque (max 200 entries) forwarded to Android logcat.
- `cemu_adapter.c` (~595 lines): Wraps CEmu's global-state API into an instance-based interface. Singleton pattern. State save/load uses temp files because CEmu only supports a `FILE*` API.
- `StateManager.kt`: Thread-safe singleton, SHA-256 ROM hashing, auto-delete of corrupted states.
- Image keypad: `ImageKeyButton` composables with 2dp travel on press, brightness darkening via `ColorMatrix` (0.82).
- D-pad: Canvas-drawn circular D-pad with arc segments, arrow indicators, hit-testing via angle/radius calculation.
4.3 iOS (ios/, Swift + SwiftUI)
- `ContentView.swift`: `EmulatorState` class (ObservableObject) manages the emulation lifecycle. Emulation loop runs on `Task.detached(priority: .userInitiated)`.
- `EmulatorBridge.swift`: `NSLock`-protected access to an opaque C pointer. `makeImage()` creates a `CGImage` from the framebuffer using `CGContext` with `premultipliedFirst | byteOrder32Little`.
- `backend_bridge.c` (~309 lines): Static linking variant. When `HAS_RUST_BACKEND` is defined, declares `rust_emu_*` extern functions. A `current_backend` pointer switches between the `rust_backend` and `cemu_backend` const structs.
- 3 Xcode build configurations: `Backend-Rust.xcconfig` (links `-lemu_rust`), `Backend-CEmu.xcconfig` (links `-lemu_cemu`), `Backend-Both.xcconfig` (both).
- `ImageKeypadView.swift`: Same 49 button regions, `DragGesture(minimumDistance: 0)` for press detection.
- `AppState`: Singleton monitoring `scenePhase` for auto-save on background.
4.4 Web (web/, React + TypeScript + Vite)
- `Calculator.tsx` (~800 lines): Main component with a `requestAnimationFrame` loop, a time accumulator for frame pacing, and a safety cap of 4 frames per rAF tick (30 in turbo mode).
- `RustBackend.ts`: Wraps the wasm-bindgen `WasmEmu` class. State persistence uses WASM memory snapshots — dumps the entire linear memory (~29MB via `memcpy`, ~4ms) rather than field-by-field serialization. On restore, grows memory if needed and copies back. Custom binary format with a "WM01" header.
- `CEmuBackend.ts`: Wraps the Emscripten module. Sets up global stubs (`emul_is_inited`, `emul_is_paused`). ARGB→RGBA pixel conversion on every frame.
- Chess auto-launch: Polls canvas pixel (310, 10) for the green battery icon to detect homescreen readiness, then fires the Prgm→Prgm→2→Enter key sequence.
- PWA: `vite-plugin-pwa` with versioned ROM/WASM caching, an update banner, and offline support. ROM manifest system for tracking content hashes.
- Drag-and-drop: Supports dragging both ROM files and .8xp/.8xv programs onto the calculator.
- Keyboard mapping: 50+ key bindings including F1-F5 for the function row, Shift for 2nd (solo-press detection), V for sqrt (2nd + x² combo), Ctrl+R to resend programs.
5. Technical Tradeoffs & Decisions
5.1 Cycle-Accurate Scheduler via LCM Base Clock
Decision: Use a single 7.68 GHz base clock instead of separate tick counters per hardware clock.
Tradeoff: Large tick values (u64 required) but zero floating-point error. All hardware timers divide evenly into the base clock. The process_second() overflow prevention method keeps values bounded.
Alternative considered: Per-clock tick counters with conversion functions. Rejected because fractional cycles accumulate rounding errors over millions of instructions, causing drift that's impossible to debug.
5.2 Prefetch Pipeline Emulation
Decision: Implement a single-byte prefetch buffer that charges memory access cycles during the current instruction.
Tradeoff: Added complexity to every instruction fetch path (every call to fetch_byte() must handle the prefetch buffer). But without it, cycle counts were ~50% of CEmu's (10 cycles instead of 20 for flash reads). This was non-negotiable for parity.
5.3 Manual Serialization vs. Serde
Decision: Custom to_bytes()/from_bytes() for state snapshots, with a STATE_VERSION byte.
Tradeoff: Precise control over byte layout and smaller snapshots, but maintenance burden — STATE_VERSION was bumped 8 times as peripherals were added, and missed fields caused "RAM Cleared" bugs.
Alternative explored: Serde+bincode in a separate worktree (calc-serde branch). Required custom handling for types like [u8; 4194304] (flash) that don't implement serde traits. Not merged.
Web's approach: Bypassed the problem entirely by snapshotting the entire WASM linear memory (~29MB memcpy in ~4ms). This eliminates all serialization bugs at the cost of larger save files.
5.4 Image-Based Keypad vs. Programmatic Buttons
Decision: Pivot from programmatic gradient buttons to a photograph-based overlay with percentage-based hit regions.
Tradeoff: More realistic appearance and perfect cross-platform consistency (same coordinates file used everywhere), but harder to modify and larger asset size. The extract_buttons.py script crops individual button images from a high-res TI-84 CE photo and generates button_regions.json.
5.5 Visual Polling for Auto-Launch
Decision: Detect homescreen readiness by checking a single canvas pixel (310, 10) for the green battery indicator, rather than using fixed timing delays.
History: The chess auto-launch went through 6 iterations: fixed 3.5s delay → 2s delay → 1.2s delay → center pixel check → status bar check → single battery pixel check.
Tradeoff: Adapts to actual boot speed (different ROMs boot at different speeds), but fragile if the OS skin changes.
5.6 DMA Cycle Stealing via Timestamp Tracking
Decision: Track dma_last_mem_timestamp and calculate stolen DMA cycles lazily on each CPU memory access, rather than explicitly scheduling CPU wait states.
Tradeoff: Simpler code (no explicit wait state scheduling), but harder to debug because the timing effect is implicit. The ~7.7% cycle overhead (13M of 168M boot cycles) emerges from the interaction between DMA and CPU memory access patterns.
5.7 Debug Port Interception via Bus Cold Path
Decision: Intercept writes to 0xFB0000-0xFDFFFF (CE toolchain debug ports) in the bus's unmapped MMIO cold path.
Critical discovery: sprintf writes to SEQUENTIAL addresses (0xFB0000, 0xFB0001, ...), NOT repeated writes to the same address. A null byte at 0xFB0000 exactly means program exit; a null at any other offset means flush the buffer. This took multiple debugging sessions to understand.
6. The CEmu Parity Campaign
6.1 Overview
The parity campaign was the single largest engineering effort in the project — a systematic 7-phase overhaul to make the Rust emulator match CEmu's behavior at the instruction level. The campaign was driven by an extensive toolchain of trace generation, comparison, and analysis tools.
6.2 The Parity Toolchain
Trace generation (Rust side):
- `cargo run --example debug -- trace [steps]` — Generates space-separated trace files with step, cycles, PC, SP, AF, BC, DE, HL, IX, IY, ADL, IFF1, IFF2, IM, HALT, opcode
- `cargo run --example debug -- fulltrace [steps]` — Generates comprehensive JSON traces including all I/O operations per instruction (RAM reads/writes, MMIO port access)
Trace generation (CEmu side):
- `tools/cemu-test/trace_gen.c` — Links against CEmu's `libcemucore.a`, generates traces in the exact same format as the Rust side
- `tools/cemu-test/parity_check.c` — Checks CEmu state at 14 cycle milestones (1M through 60M)
Comparison tools:
- `scripts/compare_traces.py` — PC-synced comparison with prefix lookahead (CEmu counts DD/FD prefixes as separate instructions)
- `scripts/find_first_divergence.py` — JSON fulltrace comparison with I/O operation matching
- `core/examples/dense_compare.rs` — PC-aligned comparison using HashMap-based lookup with 5-step lookahead
Targeted investigation tools (11 specialized Rust examples):
- `find_divergence.rs` — Tracks PC at known CEmu cycle checkpoints
- `scheduler_debug.rs` — Monitors RTC event timing in the scheduler
- `rtc_timing_compare.rs` — Compares RTC load timing between implementations
- `check_0072fa.rs` — Single-steps 70M cycles checking a specific poll loop address
- `mathprint_check.rs` — Monitors the MathPrint flag at cycle checkpoints
- Plus 6 more specialized analysis tools
6.3 The 7 Phases
Phase 1: CPU Instruction Correctness (Effort: L)
- Fixed RETI IFF1 restore
- Fixed register pair mapping (ED x=0 z=7 p=3: IY→IX)
- Added missing eZ80 instructions: LD I,HL (ED C7), LD HL,I (ED D7), LEA IY,IX+d (ED 55)
- Implemented block I/O (all Z80 + eZ80-specific variants)
- Fixed EX DE,HL L-mode masking
- Fixed block BC decrement (preserve BCU in Z80 mode)
- Verification: Boot 132.79M cycles, PC=085B80. 250/436 tests passing.
Phase 2: Bus & Address Decoding (Effort: M)
- Flash routing for 0x400000–0xBFFFFF
- MMIO unmapped holes
- Port range 0xF routing
- SPI in memory-mapped path
- Backlight routing
- Verification: Boot 132.79M, 251/436 tests.
Phase 3: Peripheral Register Layout Rewrites (Effort: XL — the largest phase)
- Timer rewrite: Replaced 3 separate Timer structs with a unified `GeneralTimers`. Shared 32-bit control register (3 bits per timer), status register, mask register.
- Keypad register packing: Single 32-bit control register with mode (2 bits) + rowWait (14 bits) + scanWait (16 bits). 16 data registers. GPIO enable. Reset mask 0xFFFF.
- Watchdog offset fix: Counter and load value offsets were SWAPPED. Revision corrected to 0x00010602.
- Verification: Boot 156.10M, 272/457 tests.
Phase 4: Scheduler & Timing (Effort: L)
- SCHED_SECOND overflow prevention
- CPU speed change event conversion
- Panel clock rate 60Hz → 10,000,000 Hz
- OS Timer interrupt phase fix (set state to OLD value before toggling)
- Timer 32 KHz clock source
- Timer 2-cycle interrupt delay pipeline: Match detected → status visible → interrupt fires. Required a `TimerDelay` scheduler event, a `process_delay()` state machine, and `delay_status: u32` and `delay_intrpt: u16` fields.
- Verification: Boot 156.10M, 272/457 tests.
Phase 5: RTC, SHA256, Control Ports (Effort: M)
- RTC 3-state machine with time counting, latching, and load data transfer
- SHA256 64-round compression function
- Control port masks (port 0x01: `& 0x13`, port 0x29: `& 1`)
- Flash size_config reset (0x07 → 0x00)
- INT_PWR interrupt on reset
- Verification: Boot 108.78M, 259/437 tests.
Phase 6: LCD & SPI Enhancements (Effort: L)
- LCD DMA 5-state event machine with dual clock domains
- 256-entry palette storage
- SPI panel stub (ST7789V)
- LCD ICR, MIS, UPCURR, LPCURR registers
- Verification: Boot 156.10M, 277/455 tests.
Phase 7: CPU Advanced & Bus Protection (Effort: XL, Risk: High)
- Separate SPS/SPL stack pointers (CPU SNAPSHOT_SIZE 64 → 67)
- Mixed-mode CALL/RET/RST with MADL|ADL flag byte
- Memory protection (stack limit NMI, protected range, flash privilege)
- DMA scheduling with cycle stealing (~7.7%)
- HALT fast-forward, interrupt prefetch_discard, R rotation
- Verification: Boot 168.14M cycles, 277/455 tests. ALL PHASES COMPLETE.
6.4 The Comparison Report
PR #56's supporting document (docs/pr_comparison_report.md) was generated by 8 parallel analysis agents, each comparing a different subsystem of the Rust emulator against CEmu. It identified ~150 specific discrepancies organized into three tiers:
- 20 critical issues (CPU missing instructions, register layout mismatches, entirely missing peripherals)
- 19 high issues (timing differences, routing bugs, missing clock sources)
- 25+ medium issues (edge cases, missing features, reset value differences)
This document served as the roadmap for all 7 phases.
7. AI Agent Orchestration for Testing & Debugging
7.1 Overview of AI's Role
The project was ~80% AI co-authored (262/332 commits). Claude Code was used in 37+ main conversation sessions with 472 subagent invocations across those sessions, generating ~73 MB of conversation data (~9 MB in main sessions, ~64 MB in subagent sessions). The heaviest sessions had 89 subagent files (session 13baee3d, Feb 8) and 76 subagent files (session dc8b876a, Jan 30-31).
Hunter directed the architecture, defined the parity methodology, tested on real devices, identified bugs, and corrected the AI when it went wrong. Claude served as implementation workhorse — writing code, running traces, and deploying subagents per Hunter's direction. The division of labor was:
| Hunter (direction & decisions) | AI (execution & research) |
|---|---|
| Designed dual-backend architecture | Deployed up to 89 parallel subagents for research |
| Defined parity methodology & toolchain | Read and compared CEmu C source code against Rust |
| Tested on real Android/iOS devices | Wrote implementation code per Hunter's direction |
| Identified bugs via device testing | Generated traces and analyzed divergences |
| Corrected wrong AI approaches | Created PRs with descriptions |
| Decided what to build/cut/revert | Rebaked ROMs (17 consecutive times in one session) |
| Set quality bar ("I want perfect") | Iterated until Hunter's bar was met |
7.2 Subagent Usage Patterns
Subagents were the primary mechanism for scaling the AI's research and implementation capacity. The project used 472 subagent sessions across the 37 main sessions, with distinct usage patterns:
Research subagents (the most common pattern): These were deployed in parallel to read and analyze source code. The most dramatic example was the 8-agent peripheral audit (PR #56), where 8 subagents simultaneously compared each peripheral module in the Rust emulator against its CEmu C counterpart. Each agent would:
- Read the CEmu C source file (e.g., cemu-ref/core/timers.c)
- Read the corresponding Rust source file (e.g., core/src/peripherals/timer.rs)
- Produce a detailed report of every register offset, default value, timing behavior, and control flow difference
This pattern was repeated at smaller scale throughout the project. When investigating a bug, 2-3 research subagents might be deployed simultaneously — one reading the CEmu source for a specific subsystem, one reading the Rust implementation, and one analyzing trace divergences.
Implementation subagents: For larger feature work (e.g., the 7-phase parity overhaul), subagents were used to implement specific fixes within a phase while the main agent coordinated. A typical pattern:
- Main agent defines the fix needed (e.g., "rewrite timer register layout to match CEmu's packed 32-bit format")
- Subagent reads both implementations, writes the new Rust code
- Main agent integrates, runs tests, verifies parity
Worktree subagents: Claude Code's worktree feature was used for parallel feature development. Subagents worked in isolated git worktrees (e.g., calc-worktrees/rtc-ticking, calc-worktrees/doom, calc-worktrees/image-keypad) to develop features without blocking the main branch. The image keypad worktree had its own memory file with 6 subagent sessions documenting SwiftUI pitfalls and button region coordinate systems.
Session intensity distribution:
- 2 sessions with 75+ subagents (the boot-to-homescreen push on Jan 30-31, and the image keypad + parity work on Feb 8)
- ~5 sessions with 20-40 subagents (major feature work)
- ~10 sessions with 5-15 subagents (focused debugging or feature implementation)
- ~20 sessions with 0-5 subagents (quick fixes, documentation, interview prep)
Cross-session knowledge transfer: Because subagent context is lost when a session ends, the project relied on persistent files for continuity:
- MEMORY.md (68 lines of distilled learnings in .claude/projects/)
- CLAUDE.md (7.6KB of workflow docs, memory map, trace formats in repo root)
- docs/findings.md (15KB of hardware discoveries)
- docs/milestones.md (7.1KB phase tracker)
Each new session's first action was typically reading these files to rebuild context. When a session hit context limits (which happened frequently during the parity campaign), the findings from that session were written to findings.md or milestones.md before ending, ensuring the next session could continue.
7.3 The Parity Testing/Debugging Loop
The CEmu parity campaign was Hunter's core methodology for achieving correctness: systematically compare every subsystem against the reference implementation, prioritize fixes by dependency order, and verify each fix pushes the divergence point further. The AI executed this methodology at scale. Here's how it worked:
Phase 1: Systematic Comparison Research
Hunter directed a comprehensive audit of every peripheral subsystem against CEmu's C source. Claude deployed 8 parallel subagents simultaneously, each assigned a different subsystem:
- Agent comparing CPU execution (cpu.c vs execute.rs)
- Agent comparing bus/memory (bus.c/mem.c vs bus.rs/memory.rs)
- Agent comparing LCD/backlight (lcd.c vs lcd.rs)
- Agent comparing timers (timers.c vs timer.rs)
- Agent comparing interrupt controller (interrupt.c vs interrupt.rs)
- Agent comparing keypad (keypad.c vs keypad.rs)
- Agent comparing RTC/SHA256 (realclock.c/sha256.c vs rtc.rs/sha256.rs)
- Agent comparing scheduler/control (schedule.c/control.c vs scheduler.rs/control.rs)
Each agent read both the Rust and C source files in full and produced a detailed report of every discrepancy — register layout differences, timing behavior differences, missing features, wrong default values. The reports were merged into docs/pr_comparison_report.md (~150 issues).
Phase 2: Prioritized Fix Implementation
Hunter organized the ~150 issues into 7 phases based on dependency ordering (CPU first, then bus, then peripherals, then timing). Each phase had:
- Clear deliverables (specific registers to fix, specific behaviors to match)
- A verification checkpoint (boot cycle count + test count)
- Effort and risk ratings
The AI implemented fixes per the phase plan, then verified by generating a trace and comparing against CEmu:
cargo run --example debug -- trace 100000 # Generate Rust trace
cd tools/cemu-test && ./trace_gen ../../ROM -n 100000 # Generate CEmu trace
python scripts/compare_traces.py cemu.log rust.log # Compare
Phase 3: Divergence Bisection
When traces diverged, Hunter directed the AI to use a binary-search approach to find the exact instruction:
- Generate a long trace (100K+ steps)
- Find the first PC where Rust ≠ CEmu
- Read the surrounding instructions for context
- Look up the divergent instruction in CEmu's source
- Compare the implementation in Rust
- Fix the discrepancy
- Re-run and verify the fix pushed the divergence further out
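Step 2 of this loop, finding the first step where the traces disagree, can be sketched as a simple zip-and-compare. This is a simplification: the real scripts/compare_traces.py does PC-synced comparison with more state, and the trace lines below are illustrative, not the project's actual trace format.

```rust
/// Return the index of the first step where the two traces differ,
/// or None if they agree for their common length.
fn first_divergence<'a>(
    cemu: impl Iterator<Item = &'a str>,
    rust: impl Iterator<Item = &'a str>,
) -> Option<usize> {
    cemu.zip(rust).position(|(a, b)| a != b)
}

fn main() {
    // Illustrative trace lines, not the real format.
    let cemu = ["PC=000100 AF=0045", "PC=000103 AF=2045", "PC=000106 AF=2045"];
    let rust = ["PC=000100 AF=0045", "PC=000103 AF=0045", "PC=000106 AF=0045"];
    // Divergence at step 1: CEmu's A register differs from Rust's.
    assert_eq!(
        first_divergence(cemu.iter().copied(), rust.iter().copied()),
        Some(1)
    );
}
```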
This loop was repeated hundreds of times. Key milestones: 40K steps → 700K steps → 3.2M steps → full boot (3.6M steps pre-DMA, 168M with DMA).
Phase 4: Targeted Investigation Tools
When standard trace comparison was insufficient, Hunter directed the AI to build specialized investigation tools targeting specific subsystems. 11 custom Rust examples were created during the parity campaign, including:
- rtc_timing_compare.rs — Compared RTC load timing between Rust and CEmu at 12 checkpoints from 0 to 50M cycles. Finding: "Our RTC load status returns 0x00 (complete) too early. This timing difference causes the poll loop at 0x0072FA to exit earlier."
- scheduler_debug.rs — Monitored RTC event scheduling. Finding: At 48 MHz, 1 CPU cycle = 160 base ticks; RTC fires every 16,429 ticks at 32 KHz = ~24M CPU cycles delay.
- check_0072fa.rs — Single-stepped 70M cycles checking one specific poll loop address that CEmu visits but Rust didn't. Finding: Different control flow due to RTC timing.
- mathprint_check.rs — Monitored the MathPrint flag (0xD000C4 bit 5) at 8 cycle checkpoints. Finding: The flag was never set to 0x20 (MathPrint mode) because the RTC timing caused a different code path.
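The clock-domain arithmetic behind the scheduler_debug.rs finding checks out directly, using the constants stated elsewhere in this document (7.68 GHz scheduler base clock, 48 MHz CPU, 32,768 Hz RTC crystal):

```rust
fn main() {
    const BASE_HZ: u64 = 7_680_000_000; // scheduler base clock
    const CPU_HZ: u64 = 48_000_000;     // CPU at full speed
    const RTC_HZ: u64 = 32_768;         // RTC crystal

    // At 48 MHz, 1 CPU cycle = 160 base ticks.
    let base_ticks_per_cpu_cycle = BASE_HZ / CPU_HZ;
    assert_eq!(base_ticks_per_cpu_cycle, 160);

    // An RTC event 16,429 crystal ticks away (~0.5 s) is ~24M CPU cycles away.
    let rtc_ticks: u64 = 16_429;
    let cpu_cycles = rtc_ticks * CPU_HZ / RTC_HZ;
    assert_eq!(cpu_cycles / 1_000_000, 24);
}
```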
Phase 5: Multi-Session Continuity
The parity campaign spanned multiple conversation sessions that hit context limits. To maintain continuity:
- The MEMORY.md file in .claude/projects/ stored distilled learnings (68 lines)
- The CLAUDE.md file in the repo documented workflow, key addresses, trace formats
- docs/findings.md (15KB) captured every hardware discovery
- docs/milestones.md (7.1KB) tracked phase completion status
When a new session started, Claude would read these files to rebuild context, then continue from where the last session left off.
7.4 Specific Debugging Stories Driven by AI Agents
The SPI Divergence Hunt (699,900 → 3.2M steps):
- Divergence at step 699,900: After SPI STATUS read, CEmu A=0x20 (2 transfers pending), Rust A=0x00 (all complete)
- AI agents analyzed CEmu's scheduler-driven SPI: sched_set(SCHED_SPI, ticks), transfer duration = bitCount * ((cr1 & 0xFFFF) + 1) ticks at 24MHz
- The initial "complete ALL transfers on first read" approach worked at step 418K (3 transfers) but failed at 699K (6 transfers, only 4 should complete)
- Solution: Implement event-driven scheduler for SPI. This pushed parity to 3.2M steps.
- Next divergence at 3,216,456: RTC load status timing. This took 4 more dedicated investigation tools to diagnose.
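The timing rule above can be sketched as follows. This is a minimal model, not the core's actual scheduler API: types and names here are illustrative, and only the transfer-duration formula comes from the text.

```rust
/// Duration of one SPI transfer, per the CEmu formula described above:
/// bitCount * ((cr1 & 0xFFFF) + 1) ticks of the 24 MHz SPI clock.
fn transfer_ticks(bit_count: u64, cr1: u32) -> u64 {
    bit_count * (((cr1 & 0xFFFF) as u64) + 1)
}

/// How many of `queued` back-to-back transfers have finished `elapsed`
/// ticks after the first one started. A STATUS read sees only these.
fn completed(queued: &[(u64, u32)], elapsed: u64) -> usize {
    let mut t = 0;
    queued
        .iter()
        .take_while(|&&(bits, cr1)| {
            t += transfer_ticks(bits, cr1);
            t <= elapsed
        })
        .count()
}

fn main() {
    // Six 8-bit transfers with (cr1 & 0xFFFF) == 2 → 24 ticks each.
    let q = [(8u64, 2u32); 6];
    // After 100 ticks, 4 transfers are done (96 ticks) and 2 are pending —
    // the "only 4 should complete" situation from the step-699K divergence.
    assert_eq!(completed(&q, 100), 4);
    // The buggy "complete everything on first read" model would report 6.
}
```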
The 8-Agent Peripheral Audit (Feb 5-6):
- 8 agents deployed simultaneously, each comparing a subsystem
- Total: ~150 issues identified across all peripherals
- Produced a 35KB comparison report that became the 7-phase implementation roadmap
- The agent findings revealed that some peripheral implementations were fundamentally wrong (e.g., timer register layout was completely different, keypad control register packing was wrong, watchdog offsets were swapped)
The MathPrint vs Classic Mode Investigation (5 custom tools):
- Problem: Emulator boots into Classic mode, CEmu boots into MathPrint mode
- mathprint_check.rs: Found the flag is never set to 0x20
- check_0072fa.rs: Found the poll loop at 0x0072FA behaves differently
- rtc_timing_compare.rs: Found RTC load completes in ~75K cycles (Rust) vs ~24M cycles (CEmu)
- scheduler_debug.rs: Found the RTC event offset of 16,429 ticks was correct but the load processing timing was wrong
- Resolution: Fix RTC load state machine to process at correct 32 KHz rate
7.5 Human-AI Interaction Patterns
Hunter's direction and quality enforcement:
- Set the parity standard: "i dont want close, i want perfect"
- Demanded correct approaches: "make sure this is the correct way to do things, i dont want to use hacky workarounds"
- Redirected flailing AI: "stop making random guesses and use comprehensive logging"
- Maintained project knowledge when AI lost context: "are you forgetting the things you learned in your findings?"
- Prioritized correctness over test coverage: "genuinely i dont give a shit if tests fail, they weren't catching shit before"
Hunter correcting the AI:
- AI focused on backlight brightness for the power-off bug when the real issue was the LCD enable bit — Hunter identified the correct root cause and redirected
- AI frequently forgot previous session findings across context boundaries — Hunter pointed it back to findings.md and milestones.md
- AI's interview prep contained unverifiable claims (e.g., calling a first-time-correct implementation a "redesign") — Hunter caught the fabrications and demanded accuracy
- AI over-engineered the PWA implementation — Hunter reverted the work and directed a simpler approach
- AI made the emulator boot into Classic mode instead of MathPrint — Hunter identified the symptom on device and directed the multi-tool investigation
What the AI executed well (when properly directed):
- Deploying parallel subagents for research before writing code
- Building specialized investigation tools for specific subsystems
- The trace → compare → fix → verify loop (Hunter's methodology, AI's execution)
- ROM baking automation (17 consecutive rebakes at 4-5 AM)
- Cross-file refactoring (updating all 3 platforms simultaneously)
8. Claude Code Usage Statistics
8.1 Session & Token Data
| Metric | Value |
|---|---|
| Main conversation sessions | 37 session directories (24 indexed in sessions-index.json) |
| Main JSONL conversation files | 4 files, 9.2 MB total |
| Subagent invocations | 472 subagent JSONL files, 64 MB total |
| Total conversation data | ~73 MB |
| PRs created from sessions | 13 |
| Unique branches worked on | 13 |
| Related worktree projects | 2 (calc-web, calc-worktrees-image-keypad) |
8.2 Token Usage (extracted from main JSONL files)
| Category | Tokens |
|---|---|
| Input tokens | 1,017,775 |
| Output tokens | 222,085 |
| Cache read tokens | 462,457,150 |
| Cache creation tokens | 73,134,709 |
| Total | ~537M tokens |
Note: These counts are from the 4 main JSONL files only. The 472 subagent sessions (64 MB of data) would add significantly more tokens — likely bringing the true total well above 1B tokens for the project.
8.3 Estimated Cost (Opus pay-per-token pricing)
| Category | Rate | Est. Cost |
|---|---|---|
| Input | $15/M tokens | $15 |
| Output | $75/M tokens | $17 |
| Cache reads | $1.50/M tokens | $694 |
| Cache creation | $18.75/M tokens | $1,371 |
| Main sessions total | | ~$2,097 |
Note: This estimate uses Claude Opus 4 pricing. The actual cost depends on which model was used per session (Opus 4.5/4.6 vs Sonnet 4.5). Max plan subscribers pay a flat rate, so actual billing may differ. The subagent sessions' token costs are not included in this estimate.
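The table's arithmetic, for the record (token counts and per-million rates as above):

```rust
fn main() {
    // Cost in USD for a token count at a given $/million-token rate.
    let cost = |tokens: u64, usd_per_million: f64| tokens as f64 / 1e6 * usd_per_million;

    let input = cost(1_017_775, 15.0);           // ≈ $15
    let output = cost(222_085, 75.0);            // ≈ $17
    let cache_reads = cost(462_457_150, 1.50);   // ≈ $694
    let cache_creation = cost(73_134_709, 18.75); // ≈ $1,371

    let total = input + output + cache_reads + cache_creation;
    // Rounds to the table's ~$2,097 main-sessions total.
    assert!((total - 2_097.0).abs() < 5.0);
}
```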
8.4 Session Breakdown by JSONL File
| File | Size | Responses | Content |
|---|---|---|---|
| e262789c.jsonl | 4.88 MB | 96 | Interview prep, chat export generation |
| 534ad7db.jsonl | 3.82 MB | 538 | Chess mode, auto-launch, ROM baking, Shift+R |
| f814347a.jsonl | 285 KB | 22 | Test framework investigation |
| e910f4c6.jsonl | 189 KB | 35 | Web frontend test setup (Vitest, 44 keypad tests) |
8.5 AI Model Distribution Across Commits
| Model | Commits | Notes |
|---|---|---|
| Claude Opus 4.5 | 215 | Primary model through mid-February |
| Claude Opus 4.6 | 74 | Adopted mid-February |
| Claude Sonnet 4.5 | 15 | Used sporadically |
| Codex | 1 | Single worktree snapshot |
| No AI attribution | 70 | Merge commits, manual fixes, config tweaks |
9. Major Bugs & Debugging Stories
9.1 The Magnitude Error (10⁹ bug)
Symptom: 6+7 showed 1300000000; 99*99 showed a wrong answer.
Investigation: Spanned many sessions. Traced BCD floating-point operations, checked OP1-OP6 register addresses, examined TI-OS format buffer.
Root cause: Using self.adl instead of self.l for data register wrapping. L mode controls data addressing width (16-bit vs 24-bit), while ADL controls instruction/PC width. LDIR/LDDR were using 24-bit addressing when they should have used 16-bit for data operations.
False leads: Initially suspected display formatting (decimal point not written), then keypad timing, then OS Timer frequency.
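The L-vs-ADL distinction at the heart of this bug can be sketched with a simplified pointer model. This is illustrative only: the real core wraps inside its CPU register logic, and in true Z80 mode the high byte of a data address comes from MBASE rather than being preserved from the register.

```rust
/// Increment a data pointer under the eZ80's two data-addressing widths.
/// `l_mode_24bit == true` models L set (24-bit data addressing);
/// `false` models L clear (16-bit wrap, high byte held fixed here).
fn inc_data_ptr(hl: u32, l_mode_24bit: bool) -> u32 {
    if l_mode_24bit {
        (hl + 1) & 0xFF_FFFF // full 24-bit pointer
    } else {
        (hl & 0xFF_0000) | ((hl + 1) & 0xFFFF) // only the low 16 bits wrap
    }
}

fn main() {
    // 24-bit wrapping walks out of the 64K bank...
    assert_eq!(inc_data_ptr(0xD0_FFFF, true), 0xD1_0000);
    // ...16-bit wrapping stays inside it. Using the ADL flag (instruction
    // width) instead of L (data width) for LDIR/LDDR picked the wrong branch.
    assert_eq!(inc_data_ptr(0xD0_FFFF, false), 0xD0_0000);
}
```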
9.2 The Graphing Hang
Symptom: Screen freezes after graphing, loader stops spinning.
Root cause: Timer raw bits accumulating and causing infinite ISR loops. Timer interrupt clearing in tick_peripherals was not properly clearing the raw status bits before re-evaluating.
Fix: Proper implementation of the 2-cycle timer interrupt delay pipeline.
9.3 The "Done" Bug
Symptom: First calculation after boot shows "Done" instead of numeric result.
Root cause: TI-OS expression parser not initialized. The OS expects an ENTER key to have been processed before the first calculation.
Fix: Auto-inject ENTER key on first user interaction (in Rust core, cross-platform).
9.4 ON Key Wake From Sleep
Symptom: ON button doesn't work after power-off (APO or 2nd+ON).
Investigation: Multiple sessions tracing CEmu's keypad_on_check(), control.off flag, power state.
Root causes (multiple): Battery status port returning hardcoded 0 instead of 0xFE (the OS rejected wakes because it thought the battery was dead). on_key_wake was one-shot instead of persistent. The APD (Automatic Power Down) disable wasn't clearing the right flag at 0xD00088.
9.5 The 50× Performance Regression
Caused by: Phase 4 scheduler changes (timer interrupt delay pipeline).
Root cause: Scheduler base_cycles_offset u64 underflow — subtracting a larger value from a smaller one wrapped around to near-maximum u64.
Fix: Added next_event_ticks cache to avoid scanning all events, fixed the underflow.
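The underflow mechanism is easy to reproduce. This sketch uses illustrative names, not the core's actual fields: when "now" passes the next event's deadline, a plain subtraction wraps to near u64::MAX, so the scheduler thought the next event was ~1.8e19 ticks away.

```rust
/// Ticks until the next event, saturating so a past-due event fires now.
fn ticks_until(next_event: u64, now: u64) -> u64 {
    next_event.saturating_sub(now)
}

fn main() {
    assert_eq!(ticks_until(1_000, 900), 100);
    // The buggy unchecked subtraction wraps to near-maximum u64:
    assert_eq!(1_000u64.wrapping_sub(1_100), u64::MAX - 99);
    // The saturating version treats a past-due event as due immediately:
    assert_eq!(ticks_until(1_000, 1_100), 0);
}
```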
9.6 State Restore "RAM Cleared"
Symptom: Restoring a saved state showed "RAM cleared" message.
Root cause: Multiple peripheral fields not included in state snapshots: SPI controller state, watchdog state, cursor registers, needs_lcd_event, needs_lcd_clear, memory protection registers (stack limit, protected range). The OS detected inconsistency on restore.
Fix: Multiple STATE_VERSION bumps (reached version 8) to add missing fields. Spawned multiple investigation subagents to systematically audit what was and wasn't saved.
9.7 Chess Opening Books Not Loading
Symptom: Chess shows "BK:NONE" (file open fails) vs CEmu showing "BK:32768".
Root cause: fileioc (CE C library) stores curr_slot at LCD register address 0xE30C11 and resize_amount at 0xE30C0C — in LCD cursor image RAM (0x800-0xBFF) which wasn't implemented.
How found: Added breakpoint mechanism to Emu struct, used disasm command, traced through _ChkFindSym bcall. Multiple subagent sessions to understand fileioc's internals.
10. Running Real Software: DOOM & Chess
10.1 DOOM Support (PR #68)
Getting DOOM running required several subsystem additions:
- 8bpp LCD rendering: The calculator normally uses 16bpp RGB565, but DOOM uses 8bpp indexed color with a 256-entry palette. Added render_frame_8bpp() with palette lookup.
- .8xp/.8xv file parser (ti_file.rs): Parse TI file format headers, checksums, variable entries. Supports programs, protected programs, AppVars.
- Flash archive injection: inject_archive_entry() writes flag byte 0xFC, 2-byte size, type, version, self-referential address, name, data into flash. find_archive_free_addr() scans sectors 0x0C0000-0x3B0000.
- SendKey mechanism: Pokes OS RAM directly at 0xD0058C (kbdKey), 0xD0058E (keyExtend), bit 5 of 0xD0009F (keyReady). Uses bus.poke_byte() to bypass memory protection.
- Launch sequence: ENTER → CLEAR → Asm( → prgm → D,O,O,M → ENTER
- LCD cursor image RAM: Extended LCD address space to include 0x800-0xBFF (1024 bytes) used by LibLoad as scratch storage.
- Keypad range extension: Extended KEYPAD_END from 0x150048 to 0x151000 (full 4KB page) because DOOM needed the full range.
10.2 Chess Integration
The chess engine (from the ce-games submodule) is a fully-featured chess program running on the eZ80:
- Alpha-beta negamax with PVS, aspiration windows, null-move pruning (R=2), LMR, futility pruning
- Texel-tuned PeSTO piece-square tables
- 4096-entry transposition table (always-replace)
- Polyglot opening book split across AppVars (TI-OS 64KB limit), up to 131K entries
- ~154K cycles/node, ~2000 Elo at Expert difficulty (15s/move)
- Automated tournament system (emu_tournament.py) running eZ80 engine vs Stockfish
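The 4096-entry always-replace transposition table can be sketched as follows. The entry fields and indexing here are illustrative (the eZ80 engine's actual entry format is not shown in this document); only the size and the always-replace policy come from the text.

```rust
#[derive(Clone, Copy, Default, Debug)]
struct TtEntry { key: u64, depth: u8, score: i16 }

struct Tt { slots: Vec<TtEntry> }

impl Tt {
    fn new() -> Self { Tt { slots: vec![TtEntry::default(); 4096] } }
    // 4096 = 2^12 slots, indexed by the low 12 bits of the position hash.
    fn index(key: u64) -> usize { (key & 0xFFF) as usize }
    // Always-replace: a new entry unconditionally evicts the old one.
    fn store(&mut self, e: TtEntry) { self.slots[Self::index(e.key)] = e; }
    fn probe(&self, key: u64) -> Option<TtEntry> {
        let e = self.slots[Self::index(key)];
        (e.key == key).then_some(e)
    }
}

fn main() {
    let mut tt = Tt::new();
    tt.store(TtEntry { key: 0x1ABC, depth: 5, score: 30 });
    assert!(tt.probe(0x1ABC).is_some());
    // A colliding key (same low 12 bits) evicts the deeper entry too —
    // the price of always-replace, paid for O(1) stores and tiny code.
    tt.store(TtEntry { key: 0x2ABC, depth: 1, score: -10 });
    assert!(tt.probe(0x1ABC).is_none());
    assert_eq!(tt.probe(0x2ABC).unwrap().score, -10);
}
```

Always-replace is a natural fit for the eZ80's tight RAM budget: no age or depth bookkeeping per slot, at the cost of occasionally losing a deep entry to a shallow collision.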
Web chess mode (/chess route): Fetches chess.bin (1.9MB gzipped ROM), auto-launches at 5× speed using visual polling for boot detection.
11. Development Timeline & Velocity
| Date | Day | PRs | Key Achievement |
|---|---|---|---|
| Jan 27 | 1 | #1-#4 | CPU + memory + peripherals. Full eZ80 in one afternoon. |
| Jan 28 | 2 | #6-#9 | 40K step parity. IM2 fix. OS Timer. |
| Jan 29 | 3 | #10-#13 | OS boots to home screen (3.6M steps). Scheduler implemented. |
| Jan 30-31 | 4-5 | #14-#20 | Keypad working. Magnitude error fixed (L vs ADL modes). |
| Feb 1 | 6 | #21-#33 | CEmu backend + iOS app + 13 PRs in one day. |
| Feb 2 | 7 | #34-#44 | Runtime backend switching. State persistence. Web app. |
| Feb 5-6 | 10-11 | #51-#58 | 7-phase CEmu parity overhaul (PR #56). WASM optimization. |
| Feb 7-9 | 12-14 | #59-#68 | DOOM runs. Image keypad. ON key wake. |
| Feb 10-14 | 15-19 | #69-#78 | Live file send. Debug port interception. Sudoku. |
| Feb 16-20 | 21-25 | #81-#86 | Chess mode. PWA offline. Shift+R dev shortcut. |
Key velocity facts:
- Boot-to-homescreen achieved in 3 days from initial commit
- Full eZ80 CPU (3,675 lines, 124 tests) implemented in one afternoon
- Tri-platform support (Android + iOS + Web) in 6 days
- From "first instruction runs" to "DOOM runs" in 13 days
12. PR & Commit History Analysis
12.1 Merge Patterns
- Squash merges: ~43 (64%) — used for most feature branches from PR #16 onward
- Merge commits: ~24 (36%) — used for earlier PRs and larger merges
- No rebases on main — clean linear history via squash
12.2 AI Co-Authorship
- 262/332 commits (79%) have AI co-author attribution
- Claude Opus 4.5: 215 commits (primary model through mid-Feb)
- Claude Opus 4.6: 74 commits (adopted mid-February)
- Claude Sonnet 4.5: 15 commits (sporadic)
- Codex: 1 commit (worktree snapshot)
- 70 commits (21%) have no AI attribution (merge commits, manual fixes, config tweaks)
12.3 Reverts
One revert: commit 645aeb1 reverted PR #2's CPU implementation immediately after squash-merge, then re-merged from a different branch topology. This was a merge strategy correction, not a code quality issue.
12.4 Closed PRs (8 total)
- PRs #46-#50 (5 individual CEmu parity features): All absorbed into monolithic PR #56
- PR #35 (unified state persistence): Superseded by per-platform PRs #39, #41, #43
- PR #63 (state restore perf): Incorporated into later PRs
- PR #85 (chess + Shift+R): Split into PRs #84 and #86
12.5 PR Dependency Chains
- Core emulation: #1→#2→#4→#6→#7→#8→#9→#10→#11→#12→#13 (boot achieved)
- CEmu parity: #22→#23→#30→#45→#51→#56→#59
- Gaming: #61→#68→#70→#71→#74→#77→#78→#82→#84
- State persistence: #39→#41→#42→#43→#53→#57→#81
13. Key Files Reference
Core Emulator
| File | Lines | Purpose |
|---|---|---|
| core/src/emu.rs | 3168 | Main orchestrator, execution loop, frame rendering |
| core/src/cpu/execute.rs | 2646 | All instruction execution (largest file) |
| core/src/bus.rs | 1929 | Memory routing, flash unlock, debug ports |
| core/src/disasm.rs | 1544 | Full eZ80 disassembler |
| core/src/peripherals/lcd.rs | 1302 | LCD controller + 5-state DMA engine |
| core/src/peripherals/keypad.rs | 899 | 8×7 key matrix, scan modes, edge detection |
| core/src/cpu/helpers.rs | 858 | ALU, register access, prefetch |
| core/src/peripherals/control.rs | 838 | CPU speed, battery FSM, memory protection |
| core/src/scheduler.rs | 702 | Event scheduler, 7.68 GHz base clock |
| core/src/peripherals/rtc.rs | 674 | RTC 3-state machine |
| core/src/peripherals/timer.rs | 671 | 3× GPT with delay pipeline |
| core/src/peripherals/spi.rs | 627 | SPI + 16-deep FIFO |
| core/src/peripherals/mod.rs | 599 | Port routing, tick orchestration |
| core/src/memory.rs | 587 | Flash + RAM + NOR commands |
| core/src/peripherals/flash.rs | 576 | Flash controller registers |
| core/src/peripherals/interrupt.rs | 448 | 2-bank interrupt controller |
| core/src/lib.rs | 432 | C ABI exports, SyncEmu |
| core/src/ti_file.rs | 379 | .8xp/.8xv parser |
| core/src/wasm.rs | 325 | WASM bindings |
| core/src/peripherals/sha256.rs | 307 | SHA-256 compression |
| core/include/emu.h | 52 | C ABI contract |
Debug Tools
| File | Purpose |
|---|---|
| core/examples/debug.rs (~2900 lines) | Swiss Army knife CLI: boot, trace, fulltrace, screen, vram, calc, sendfile, bakerom, run, rundoom |
| tools/cemu-test/trace_gen.c | CEmu trace generator in matching format |
| tools/cemu-test/parity_check.c | CEmu state checker at cycle milestones |
| scripts/compare_traces.py | PC-synced trace comparison |
| scripts/find_first_divergence.py | JSON fulltrace comparison with I/O matching |
Documentation
| File | Purpose |
|---|---|
| docs/findings.md (15KB) | All hardware discoveries, bug findings, lessons |
| docs/milestones.md (7.1KB) | 7-phase parity roadmap (all complete) |
| docs/pr_comparison_report.md | 8-agent CEmu comparison (~150 issues) |
| CLAUDE.md (7.6KB) | Claude Code workflow, memory map, trace format |
| README.md (17KB) | Comprehensive project docs |
| outline_v2.md | Blog post draft with narrative framing |
Platform Frontends
| File | Lines | Purpose |
|---|---|---|
| android/.../MainActivity.kt | ~2100 | Monolithic Android UI |
| android/.../jni_loader.cpp | ~630 | JNI + dynamic backend loading |
| android/.../cemu_adapter.c | ~595 | CEmu wrapper for Android |
| web/src/Calculator.tsx | ~800 | Main web component |
| web/src/emulator/RustBackend.ts | — | WASM memory snapshot save/load |
| ios/Calc/Bridge/EmulatorBridge.swift | — | Swift FFI bridge |
| ios/Calc/Bridge/backend_bridge.c | ~309 | iOS static backend switching |