Reviewed-on: #13 Co-authored-by: bQUARKz <bquarkz@gmail.com> Co-committed-by: bQUARKz <bquarkz@gmail.com>
---
id: LSN-0026
ticket: perf-runtime-telemetry-hot-path
title: Push-based Telemetry Model
created: 2026-04-10
tags: [performance, telemetry, atomics]
---
# Push-based Telemetry Model
The PROMETEU telemetry system evolved from an on-demand scan model (pull) to an incremental counter model (push), aiming to minimize the impact on the runtime's hot path.
## The Original Problem
Previously, on every host tick, the runtime queried the asset banks for their memory usage. This resulted in:
- $O(n)$ scans over resource maps.
- Multiple read-lock acquisitions on every tick.
- Unnecessary overhead on handheld hardware, where every microsecond counts.
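
To make that cost concrete, here is a minimal sketch of what a pull-style query can look like. The `PullBank` type, its `RwLock<HashMap<..>>` layout, and the function names are assumptions made for illustration; they are not taken from the PROMETEU code.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical asset bank that only stores per-asset sizes.
pub struct PullBank {
    assets: RwLock<HashMap<u32, usize>>, // asset id -> size in bytes
}

impl PullBank {
    pub fn new() -> Self {
        Self { assets: RwLock::new(HashMap::new()) }
    }

    pub fn insert(&self, id: u32, bytes: usize) {
        self.assets.write().unwrap().insert(id, bytes);
    }

    /// Pull-style query: takes a read lock and walks every entry,
    /// so the cost grows with the number of resident assets (O(n)).
    pub fn used_bytes(&self) -> usize {
        let assets = self.assets.read().unwrap();
        assets.values().sum()
    }
}

/// Work done per host tick under the pull model:
/// one lock acquisition and one full scan per bank.
pub fn total_used_bytes(banks: &[PullBank]) -> usize {
    banks.iter().map(PullBank::used_bytes).sum()
}
```

Every call to `total_used_bytes` repeats the full scan even when no asset has changed since the previous tick.
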
## The Solution: Push Model with Atomics
The implemented solution uses `AtomicUsize` counters in the drivers and the VM to keep the system state current at all times, with $O(1)$ read and write cost:
1. **Drivers (Assets):** Atomic counters in each `BankPolicy` are updated during `load`, `commit`, and `cancel`.
2. **VM (Heap):** A `used_bytes` counter in the `Heap` struct tracks allocations and deallocations (sweep).
3. **System (Logs):** The `LogService` tracks the log pressure emitted in each frame.
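
The counter updates described above can be sketched as follows. This is an illustration under stated assumptions, not the actual `BankPolicy` or `Heap` code: the `BankCounters` and `HeapCounter` types, the split into resident and in-flight bytes, and the `on_*` method names are all hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical per-bank counters (field names are illustrative).
#[derive(Default)]
pub struct BankCounters {
    used_bytes: AtomicUsize,     // bytes of committed (resident) assets
    inflight_bytes: AtomicUsize, // bytes of loads not yet committed
}

impl BankCounters {
    /// Called when a load of `bytes` starts.
    pub fn on_load(&self, bytes: usize) {
        self.inflight_bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Called when a pending load is committed and becomes resident.
    pub fn on_commit(&self, bytes: usize) {
        self.inflight_bytes.fetch_sub(bytes, Ordering::Relaxed);
        self.used_bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Called when a pending load is cancelled.
    /// Callers must pair this with an earlier `on_load` of the same size;
    /// `fetch_sub` wraps around on underflow.
    pub fn on_cancel(&self, bytes: usize) {
        self.inflight_bytes.fetch_sub(bytes, Ordering::Relaxed);
    }

    /// O(1) read: no lock, no scan.
    pub fn used_bytes(&self) -> usize {
        self.used_bytes.load(Ordering::Relaxed)
    }

    pub fn inflight_bytes(&self) -> usize {
        self.inflight_bytes.load(Ordering::Relaxed)
    }
}

/// Hypothetical stand-in for the heap's `used_bytes` counter.
#[derive(Default)]
pub struct HeapCounter {
    used_bytes: AtomicUsize,
}

impl HeapCounter {
    pub fn on_alloc(&self, bytes: usize) {
        self.used_bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Called by the sweep phase with the number of bytes it reclaimed.
    pub fn on_sweep(&self, freed: usize) {
        self.used_bytes.fetch_sub(freed, Ordering::Relaxed);
    }

    pub fn used_bytes(&self) -> usize {
        self.used_bytes.load(Ordering::Relaxed)
    }
}
```

The bookkeeping happens at the mutation sites, so reading the current usage is a single atomic load regardless of how many assets or objects exist.
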
## Two Levels of Observability
To balance performance and debuggability, collection is split into two levels:
- **Frame Snapshot (Always):** Captured automatically at the end of each logical frame. Negligible cost ($O(1)$). Serves the `Certifier` and historical logs.
- **Host Tick (On-Demand):** Detailed per-tick collection runs only when `inspection_active` is enabled (e.g., F1 Overlay on).
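
One possible shape for the two collection levels is sketched below. Only the `inspection_active` flag comes from the text above; the `Telemetry` and `Snapshot` types, storing the flag as an `AtomicBool`, and resetting the log counter at the end of each frame are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical plain-data copy of the counters at one point in time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Snapshot {
    pub bank_used_bytes: usize,
    pub heap_used_bytes: usize,
    pub logs_emitted: usize,
}

/// Hypothetical aggregate of the push-updated counters.
#[derive(Default)]
pub struct Telemetry {
    pub bank_used_bytes: AtomicUsize,
    pub heap_used_bytes: AtomicUsize,
    pub logs_this_frame: AtomicUsize,
    pub inspection_active: AtomicBool,
}

impl Telemetry {
    /// Reads the counters without modifying them.
    fn read(&self) -> Snapshot {
        Snapshot {
            bank_used_bytes: self.bank_used_bytes.load(Ordering::Relaxed),
            heap_used_bytes: self.heap_used_bytes.load(Ordering::Relaxed),
            logs_emitted: self.logs_this_frame.load(Ordering::Relaxed),
        }
    }

    /// Level 1: always runs at the end of a logical frame.
    /// A fixed number of atomic operations; the per-frame log counter
    /// is read and reset in one step (assumed behavior).
    pub fn end_of_frame(&self) -> Snapshot {
        Snapshot {
            bank_used_bytes: self.bank_used_bytes.load(Ordering::Relaxed),
            heap_used_bytes: self.heap_used_bytes.load(Ordering::Relaxed),
            logs_emitted: self.logs_this_frame.swap(0, Ordering::Relaxed),
        }
    }

    /// Level 2: runs on every host tick, but does work only while
    /// inspection (e.g. the overlay) is active.
    pub fn on_host_tick(&self) -> Option<Snapshot> {
        if !self.inspection_active.load(Ordering::Relaxed) {
            return None;
        }
        Some(self.read())
    }
}
```

With inspection disabled, the host tick pays for one atomic load of the flag and nothing else.
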
## Lessons Learned
- **Trigger Decoupling:** The `Certifier` state should not be used to enable visual debugging features (like the overlay), since the two have different purposes and costs.
- **Eventual Consistency is Sufficient:** Telemetry metrics do not require locking the system to obtain an exact value at every instant. A relaxed atomic read is sufficient and much cheaper.
- **Cost Isolation:** Moving the aggregation logic into the drivers simplifies the runtime and ensures that the telemetry cost is paid only during state mutations, rather than repeatedly during stable execution.
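
The eventual-consistency point can be illustrated with a small, self-contained sketch. The `concurrent_total` function and its parameters are hypothetical and exist only for this example: several writer threads bump a shared counter with relaxed operations while a reader samples it without taking any lock.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `writers` threads that each add 1 to a shared counter
/// `per_writer` times, and returns the counter's final value.
pub fn concurrent_total(writers: usize, per_writer: usize) -> usize {
    let used = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let used = Arc::clone(&used);
            thread::spawn(move || {
                for _ in 0..per_writer {
                    // Relaxed is enough: only the counter value matters,
                    // not its ordering relative to other memory.
                    used.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // A telemetry reader can sample at any time without a lock.
    // The value may lag behind in-progress writes, but it is never torn
    // and never exceeds what has actually been added.
    let sample = used.load(Ordering::Relaxed);
    assert!(sample <= writers * per_writer);

    for handle in handles {
        handle.join().unwrap();
    }

    // After joining, every increment is visible: no update is lost.
    used.load(Ordering::Relaxed)
}
```

The mid-run sample is only approximately current, which is acceptable for a metric that is refreshed every frame; the total after all writers finish is exact.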