---
id: LSN-0026
ticket: perf-runtime-telemetry-hot-path
title: Push-based Telemetry Model
created: 2026-04-10
tags: [performance, telemetry, atomics]
---

# Push-based Telemetry Model

The PROMETEU telemetry system evolved from an on-demand scan model (pull) to an incremental counter model (push), aiming to minimize the impact on the runtime's hot path.

## The Original Problem

Previously, at every host tick, the runtime requested memory usage information from the asset banks. This resulted in:
- $O(n)$ scans over resource maps.
- Multiple read lock acquisitions on every tick.
- Unnecessary overhead on handheld hardware, where every microsecond counts.
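For contrast, here is a minimal Rust sketch of the pull model described above. It assumes each bank keeps its resident assets in a `HashMap` behind an `RwLock`. `PullBank` and `tick_telemetry` are hypothetical names, not the runtime's actual types.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical pull-model bank: asset id -> resident size in bytes.
struct PullBank {
    slots: RwLock<HashMap<u32, usize>>,
}

impl PullBank {
    /// Pull model: every call takes a read lock and walks the whole map, O(n).
    fn used_bytes(&self) -> usize {
        let slots = self.slots.read().unwrap();
        slots.values().sum()
    }
}

/// Invoked on every host tick: one lock acquisition plus one full scan per bank.
fn tick_telemetry(banks: &[PullBank]) -> usize {
    banks.iter().map(PullBank::used_bytes).sum()
}
```

Each call takes one read lock per bank and visits every entry, so the per-tick cost grows with the number of resident assets even when nothing has changed since the previous tick.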

## The Solution: Push Model with Atomics

The implemented solution uses `AtomicUsize` counters in the drivers and the VM to keep system state current in real time, with $O(1)$ read and write cost:
1.  **Drivers (Assets):** Atomic counters in each `BankPolicy` are updated during `load`, `commit`, and `cancel`.
2.  **VM (Heap):** A `used_bytes` counter in the `Heap` struct tracks allocations and the deallocations performed during sweep.
3.  **System (Logs):** The `LogService` tracks log pressure emitted in each frame.
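The three counters above can be sketched in Rust as follows. This is a minimal illustration, not the PROMETEU implementation: the field and method names, the split between in-flight and committed bytes in `BankPolicy`, and counting log pressure as messages per frame are all assumptions made for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical per-bank counters. Assumption: `load` stages bytes,
/// `commit` makes them resident, and `cancel` discards a staged load.
#[derive(Default)]
struct BankPolicy {
    inflight_bytes: AtomicUsize,
    used_bytes: AtomicUsize,
}

impl BankPolicy {
    fn load(&self, size: usize) {
        self.inflight_bytes.fetch_add(size, Ordering::Relaxed);
    }
    fn commit(&self, size: usize) {
        // Two independent atomic updates; readers may briefly see them out of sync.
        self.inflight_bytes.fetch_sub(size, Ordering::Relaxed);
        self.used_bytes.fetch_add(size, Ordering::Relaxed);
    }
    fn cancel(&self, size: usize) {
        self.inflight_bytes.fetch_sub(size, Ordering::Relaxed);
    }
    /// O(1) read: no lock, no scan.
    fn used(&self) -> usize {
        self.used_bytes.load(Ordering::Relaxed)
    }
}

/// Hypothetical VM heap counter, updated on allocation and during sweep.
#[derive(Default)]
struct Heap {
    used_bytes: AtomicUsize,
}

impl Heap {
    fn on_alloc(&self, size: usize) {
        self.used_bytes.fetch_add(size, Ordering::Relaxed);
    }
    /// Called by the sweep phase for each object it frees.
    fn on_free(&self, size: usize) {
        self.used_bytes.fetch_sub(size, Ordering::Relaxed);
    }
}

/// Hypothetical log service counting messages emitted in the current frame.
#[derive(Default)]
struct LogService {
    logs_this_frame: AtomicUsize,
}

impl LogService {
    fn log(&self, _msg: &str) {
        self.logs_this_frame.fetch_add(1, Ordering::Relaxed);
    }
    /// Returns this frame's count and resets it for the next frame.
    fn end_frame(&self) -> usize {
        self.logs_this_frame.swap(0, Ordering::Relaxed)
    }
}
```

The cost moves to the mutation sites: each `load`, `commit`, `cancel`, allocation, free, or log call pays one or two atomic operations, and reads become a single load. Because `commit` touches two counters separately, a concurrent reader may momentarily observe an inconsistent pair of values; for telemetry that only needs to converge, this is acceptable.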

## Two Levels of Observability

To balance performance and debuggability, collection was split into two levels:
- **Frame Snapshot (Always):** Captured automatically at the end of each logical frame at negligible ($O(1)$) cost. Serves the `Certifier` and historical logs.
- **Host Tick (On-Demand):** Detailed per-tick collection occurs only when `inspection_active` is enabled (e.g., when the F1 overlay is on).
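A minimal sketch of the two collection levels, assuming the push-model counters are shared atomics and that `inspection_active` is a plain flag toggled by the overlay. `Counters`, `TelemetrySnapshot`, `Runtime`, `end_frame`, and `host_tick` are hypothetical names chosen for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Values captured at one point in time (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TelemetrySnapshot {
    heap_bytes: usize,
    asset_bytes: usize,
    logs_in_frame: usize,
}

/// Counters kept current by the push model.
#[derive(Default)]
struct Counters {
    heap_bytes: AtomicUsize,
    asset_bytes: AtomicUsize,
    logs_in_frame: AtomicUsize,
}

impl Counters {
    /// O(1): three relaxed loads, no locks, no scans.
    fn snapshot(&self) -> TelemetrySnapshot {
        TelemetrySnapshot {
            heap_bytes: self.heap_bytes.load(Ordering::Relaxed),
            asset_bytes: self.asset_bytes.load(Ordering::Relaxed),
            logs_in_frame: self.logs_in_frame.load(Ordering::Relaxed),
        }
    }
}

#[derive(Default)]
struct Runtime {
    counters: Counters,
    /// Set while an inspection tool such as the F1 overlay is open.
    inspection_active: bool,
    /// Per-frame history (unbounded here for brevity).
    frame_history: Vec<TelemetrySnapshot>,
    /// Per-tick samples, collected only during inspection.
    tick_samples: Vec<TelemetrySnapshot>,
}

impl Runtime {
    /// Level 1: always runs at the end of each logical frame.
    fn end_frame(&mut self) {
        let snap = self.counters.snapshot();
        self.frame_history.push(snap);
    }

    /// Level 2: runs every host tick but exits immediately unless inspecting.
    fn host_tick(&mut self) {
        if !self.inspection_active {
            return;
        }
        let snap = self.counters.snapshot();
        self.tick_samples.push(snap);
    }
}
```

The frame path always runs but costs only a few relaxed loads; with inspection off, the host-tick path reduces to a single branch.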

## Lessons Learned

- **Trigger Decoupling:** We should not use the `Certifier` state to enable visual debugging features (like the overlay), as they have different purposes and costs.
- **Eventual Consistency is Sufficient:** Telemetry metrics do not require locking the system to obtain an exact value at every instant. Relaxed atomic reads are sufficient and much cheaper than lock-based reads.
- **Cost Isolation:** Moving the aggregation logic to the driver simplifies the runtime and ensures that the telemetry cost is paid only during state mutations, rather than repeatedly during stable execution.
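The eventual-consistency lesson can be illustrated with a small Rust sketch: a hypothetical writer thread stands in for driver mutations while the calling thread samples the counter with `Ordering::Relaxed` and no lock. `sample` and `run_demo` are illustrative names, not runtime APIs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Lock-free telemetry read. Relaxed guarantees the load is atomic (never torn)
/// but imposes no ordering on surrounding memory operations.
fn sample(counter: &AtomicUsize) -> usize {
    counter.load(Ordering::Relaxed)
}

/// Runs a writer that performs `mutations` increments while this thread samples.
/// Returns the concurrent samples and the value read after the writer finishes.
fn run_demo(mutations: usize) -> (Vec<usize>, usize) {
    let counter = Arc::new(AtomicUsize::new(0));
    let writer = {
        let counter = Arc::clone(&counter);
        thread::spawn(move || {
            for _ in 0..mutations {
                counter.fetch_add(1, Ordering::Relaxed);
            }
        })
    };
    // Concurrent samples: each is a value the counter actually held at some point.
    let samples: Vec<usize> = (0..8).map(|_| sample(&counter)).collect();
    // join() makes every write happen-before the following load, so it is exact.
    writer.join().unwrap();
    let final_value = sample(&counter);
    (samples, final_value)
}
```

Reads of a single atomic by one thread are coherent, so the samples never go backwards; they may simply lag behind the writer. That is exactly the guarantee telemetry needs, without the cost of a lock.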