bQUARKz c65362c186
All checks were successful
Intrepid/Prometeu/Runtime/pipeline/head This commit looks good
dev/perf-runtime-telemetry-hot-path (#13)
Reviewed-on: #13
Co-authored-by: bQUARKz <bquarkz@gmail.com>
Co-committed-by: bQUARKz <bquarkz@gmail.com>
2026-04-10 08:32:00 +00:00

2.1 KiB

id ticket title created tags
LSN-0026 perf-runtime-telemetry-hot-path Push-based Telemetry Model 2026-04-10
performance
telemetry
atomics

Push-based Telemetry Model

The PROMETEU telemetry system evolved from an on-demand scan model (pull) to an incremental counter model (push), aiming to minimize the impact on the runtime's hot path.

The Original Problem

Previously, at every host tick, the runtime requested memory usage information from the asset banks. This resulted in:

  • O(n) scans over resource maps.
  • Multiple read lock acquisitions in every tick.
  • Unnecessary overhead on handheld hardware, where every microsecond counts.

The Solution: Push Model with Atomics

The implemented solution uses AtomicUsize in drivers and the VM to maintain the system state in real-time with O(1) read and write cost:

  1. Drivers (Assets): Atomic counters in each BankPolicy are updated during load, commit, and cancel.
  2. VM (Heap): A used_bytes counter in the Heap struct tracks allocations and deallocations (sweep).
  3. System (Logs): The LogService tracks log pressure emitted in each frame.

Two Levels of Observability

To balance performance and debugging, the collection was divided:

  • Frame Snapshot (Always): Automatic capture at the end of each logical frame. Irrelevant cost (O(1)). Serves the Certifier and historical logs.
  • Host Tick (On-Demand): Detailed collection in every tick only occurs if inspection_active is enabled (e.g., F1 Overlay on).

Lessons Learned

  • Trigger Decoupling: We should not use the Certifier state to enable visual debugging features (like the overlay), as they have different purposes and costs.
  • Eventual Consistency is Sufficient: For telemetry metrics, it is not necessary to lock the system to obtain an exact value every nanosecond. Relaxed atomic reading is sufficient and much more performant.
  • Cost Isolation: Moving the aggregation logic to the driver simplifies the runtime and ensures that the telemetry cost is paid only during state mutations, rather than repeatedly during stable execution.