id: LSN-0027
ticket: perf-host-debug-overlay-isolation
title: Host Debug Overlay Isolation
created: 2026-04-10
tags: performance, host, gfx, telemetry

Host Debug Overlay Isolation

The PROMETEU debug overlay (HUD) was decoupled from the emulated machine pipeline and moved to the Host layer to ensure measurement purity and architectural separation.

The Original Problem

The debug overlay used to be rendered by injecting pixels directly into the emulated GFX pipeline during the logical frame execution. This caused several issues:

  • Performance Distortion: Cycle measurements for certification included the overhead of formatting technical strings and performing extra draw calls.
  • Leaky Abstraction: The emulated machine became aware of Host-only inspection needs.
  • GFX Coupling: The HUD was "burned" into the emulated framebuffer, making it impossible to capture raw game frames without the overlay while technical debugging was active.

The Solution: Host-Side Rendering with Atomic Telemetry

The implemented solution follows a strictly non-intrusive approach:

  1. Atomic Telemetry (Push-based): A new AtomicTelemetry structure was added to the HAL. It uses AtomicU64, AtomicU32, and AtomicUsize fields to track metrics (Cycles, Memory, Logs) in real time.
  2. Runtime Decoupling: The VirtualMachineRuntime updates these atomic counters during its tick loop only if inspection_active is enabled. It does not perform any rendering or string formatting.
  3. Host-Side HUD: The HostRunner (in prometeu-host-desktop-winit) now takes a snapshot() of the atomic telemetry and renders the HUD as a native layer after the emulated machine has finished its work for the tick.
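
The three steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: only AtomicTelemetry, snapshot(), and inspection_active come from the description above. The field names, record_tick, TelemetrySnapshot, and the Runtime stand-in are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicUsize, Ordering};

/// Shared counters written by the runtime and read by the Host.
/// Field names are illustrative assumptions, not the real layout.
#[derive(Default)]
pub struct AtomicTelemetry {
    cycles: AtomicU64,
    log_count: AtomicU32,
    memory_bytes: AtomicUsize,
}

/// Plain-value copy handed to the HUD renderer (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TelemetrySnapshot {
    pub cycles: u64,
    pub log_count: u32,
    pub memory_bytes: usize,
}

impl AtomicTelemetry {
    /// Runtime side: integer updates only, no formatting and no drawing.
    pub fn record_tick(&self, cycles: u64, memory_bytes: usize, new_logs: u32) {
        self.cycles.fetch_add(cycles, Ordering::Relaxed);
        self.memory_bytes.store(memory_bytes, Ordering::Relaxed);
        self.log_count.fetch_add(new_logs, Ordering::Relaxed);
    }

    /// Host side: each field is loaded independently, so the fields may
    /// reflect slightly different moments in time.
    pub fn snapshot(&self) -> TelemetrySnapshot {
        TelemetrySnapshot {
            cycles: self.cycles.load(Ordering::Relaxed),
            log_count: self.log_count.load(Ordering::Relaxed),
            memory_bytes: self.memory_bytes.load(Ordering::Relaxed),
        }
    }
}

/// Hypothetical stand-in for the runtime's tick loop.
pub struct Runtime {
    pub inspection_active: bool,
}

impl Runtime {
    pub fn tick(&self, telemetry: &AtomicTelemetry) {
        // ... execute the logical frame here ...
        // Placeholder measurements for the sketch.
        let cycles: u64 = 1_000;
        let memory_bytes: usize = 4_096;
        let new_logs: u32 = 2;

        // Counters are touched only while inspection is enabled.
        if self.inspection_active {
            telemetry.record_tick(cycles, memory_bytes, new_logs);
        }
    }
}
```

In this sketch the Host would call snapshot() once per displayed frame and format the returned values itself, so all string formatting stays outside tick().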

Impact and Benefits

  • Zero Machine Overhead: Rendering the HUD consumes Host CPU/GPU cycles but does not affect the emulated machine's cycle counter or logical behavior.
  • Fidelity: The emulated framebuffer remains pure, containing only game pixels.
  • Responsive Telemetry: By using atomics, the Host can read the most recent metrics at any time without waiting for frame boundaries or acquiring read locks on the runtime state.
  • Platform Agnosticism: Non-desktop hosts (which do not need the overlay) do not pay any implementation cost or performance penalty for the HUD's existence.
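
The lock-free read path behind the "Responsive Telemetry" point can be demonstrated with a small two-thread sketch. The function name poll_while_running and its parameters are hypothetical. A single counter stands in for the full telemetry structure, and the writer thread stands in for the runtime.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Runs a writer ("runtime") thread and polls the counter from the calling
/// ("host") thread without any lock. Returns the final counter value and
/// whether every host reading was >= the previous one.
pub fn poll_while_running(ticks: u64, cycles_per_tick: u64) -> (u64, bool) {
    let cycles = Arc::new(AtomicU64::new(0));
    let done = Arc::new(AtomicBool::new(false));

    let writer = {
        let cycles = Arc::clone(&cycles);
        let done = Arc::clone(&done);
        thread::spawn(move || {
            for _ in 0..ticks {
                // The writer never blocks on the reader.
                cycles.fetch_add(cycles_per_tick, Ordering::Relaxed);
            }
            done.store(true, Ordering::Release);
        })
    };

    let mut last = 0u64;
    let mut monotonic = true;
    while !done.load(Ordering::Acquire) {
        // Relaxed loads of one atomic can never observe it going backwards.
        let now = cycles.load(Ordering::Relaxed);
        monotonic &= now >= last;
        last = now;
        thread::yield_now();
    }

    writer.join().expect("writer thread panicked");
    (cycles.load(Ordering::Relaxed), monotonic)
}
```

The host loop reads mid-run values without stalling the writer. The price is that a reading may already be slightly stale by the time it is displayed, which is acceptable for a HUD.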

Lessons Learned

  • Decouple Data from View: Even for internal debugging tools, keeping the data collection (Runtime) separate from the visualization (Host) is crucial for accurate profiling.
  • Atomic Snapshots are Sufficient: For high-frequency HUD updates, eventual consistency via relaxed atomic loads is enough. Individual fields of a snapshot may come from slightly different instants, which is acceptable for a display-only overlay, and the reads are far cheaper than synchronizing via Mutexes or logical frame boundaries.
  • Late Composition: Composition of technical layers should always happen at the latest possible stage of the display pipeline to avoid polluting the core simulation state.
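
The "Late Composition" lesson can be illustrated with a CPU-side sketch, assuming a 32-bit pixel buffer whose length equals width * height. Framebuffer, HudRect, and compose_for_display are hypothetical names, and the filled rectangle stands in for rendered HUD text. A real host would more likely draw the overlay as a separate GPU layer than copy the buffer.

```rust
/// Emulated framebuffer: owned by the machine, only read by the Host.
/// Assumes pixels.len() == width * height.
pub struct Framebuffer {
    pub width: usize,
    pub height: usize,
    pub pixels: Vec<u32>,
}

/// Hypothetical HUD layer: a filled rectangle standing in for HUD text.
pub struct HudRect {
    pub x: usize,
    pub y: usize,
    pub w: usize,
    pub h: usize,
    pub color: u32,
}

/// Builds the image to present: game pixels are copied first, then the HUD
/// is drawn onto the copy. The emulated framebuffer is never modified.
pub fn compose_for_display(game: &Framebuffer, hud: Option<&HudRect>) -> Vec<u32> {
    let mut out = game.pixels.clone();
    if let Some(r) = hud {
        // Clip the rectangle to the framebuffer bounds.
        let x_end = r.x.saturating_add(r.w).min(game.width);
        let y_end = r.y.saturating_add(r.h).min(game.height);
        for y in r.y..y_end {
            for x in r.x..x_end {
                out[y * game.width + x] = r.color;
            }
        }
    }
    out
}
```

Because the overlay only ever touches the copy, a raw capture of game.pixels stays free of HUD pixels even while debugging is active.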