---
id: LSN-0016
ticket: legacy-runtime-learn-import
title: Status-First and Fault Thinking
created: 2026-03-27
tags: [migration, tech-debt]
---
# Status-First and Fault Thinking
Status: pedagogical
Companion specs:
- [`../specs/16a-syscall-policies.md`](../../specs/16a-syscall-policies.md)
- [`../specs/04-gfx-peripheral.md`](../../specs/04-gfx-peripheral.md)
- [`../specs/05-audio-peripheral.md`](../../specs/05-audio-peripheral.md)
- [`../specs/08-save-memory-and-memcard.md`](../../specs/08-save-memory-and-memcard.md)
- [`../specs/15-asset-management.md`](../../specs/15-asset-management.md)
PROMETEU uses a status-first model so the host/runtime boundary does not disguise operational errors as silence, an improper `Trap`, or an accidental `Panic`.
## Core Split
The right mental model is:
- `Trap` for structural contract violations;
- `status` for observable operational results;
- `Panic` for internal invariant breaks.
A short way to think about it:
- the guest called it wrong: `Trap`;
- the guest called it correctly, but the domain could not complete it: `status`;
- the runtime broke internally: `Panic`.
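
The three-way split can be sketched as a single syscall handler. This is a minimal illustration, not the runtime's actual code: `sample_play`, the `Status` members, and the exception classes are hypothetical names chosen for the example, and the real status codes live in the companion specs.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical operational status codes (assumed for this sketch)."""
    OK = 0
    SAMPLE_NOT_FOUND = 1


class Trap(Exception):
    """Structural contract violation: the guest called the surface wrong."""


class Panic(Exception):
    """Internal invariant break: the runtime itself is inconsistent."""


def sample_play(bank, args):
    # Structural check: wrong arity or wrong type is a Trap, never a status.
    if len(args) != 1 or not isinstance(args[0], int):
        raise Trap("sample_play expects exactly one integer argument")
    # Internal invariant: the runtime must always own a sample bank.
    if bank is None:
        raise Panic("sample bank was never initialized")
    # Operational result: a well-formed call the domain cannot satisfy.
    if args[0] not in bank:
        return Status.SAMPLE_NOT_FOUND
    return Status.OK
```

The point of the sketch is that each failure class leaves through a different exit: only the operational one is a return value the guest is expected to inspect.
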
## Why This Exists
Without that separation, host-backed systems tend to degrade into a bad mix of:
- `void` returns on operations that can genuinely fail;
- implicit fallback;
- silent no-op;
- escalation of app errors into `Panic`.
Status-first exists to make behavior:
- observable;
- deterministic;
- testable;
- documentable per domain.
## Return Shape Rule
The most important practical rule is simple:
- if the operation can fail operationally, it should return `status`;
- if the operation has no real operational failure path, it may be `void`.
That prevents contracts in which the guest cannot distinguish:
- success;
- rejection;
- absence of effect;
- backend unavailable;
- missing asset or resource.
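
As a rough sketch of the rule, the function below can fail operationally and therefore returns a status that keeps the five cases above distinct, while the second function has no failure path and stays `void`. All names here (`set_sprite`, `frame_counter_reset`, the `Status` members) are assumptions made for illustration, not the PROMETEU syscall surface.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical codes, one per case the guest must be able to tell apart."""
    OK = 0
    REJECTED = 1
    NO_EFFECT = 2
    BACKEND_UNAVAILABLE = 3
    NOT_FOUND = 4


def set_sprite(bank, slots, backend_up, slot, sprite_id):
    """Can fail operationally, so it returns a status."""
    if not backend_up:
        return Status.BACKEND_UNAVAILABLE
    if not 0 <= slot < len(slots):
        return Status.REJECTED      # index outside the operational range
    if sprite_id not in bank:
        return Status.NOT_FOUND     # missing sprite, no fallback sprite
    if slots[slot] == sprite_id:
        return Status.NO_EFFECT     # nothing changed, and the guest is told so
    slots[slot] = sprite_id
    return Status.OK


def frame_counter_reset(counters) -> None:
    """No real operational failure path, so `void` is acceptable."""
    counters["frame"] = 0
```
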
## Silent Failure Is Not Allowed
In PROMETEU, an operational error cannot be disguised as:
- implicit success;
- ignoring the call;
- automatic fallback to another resource;
- `Trap`, when the problem is not structural;
- `Panic`, when the problem is not internal.
If the guest can perceive the difference, that difference should appear as `status`.
## Reading The Boundary
When reading or designing a syscall, use this sequence:
1. Is the call structurally correct?
2. Does the guest have permission/capability to use the surface?
3. Can the domain execute the operation with the current resources?
4. Is there additional payload that only makes sense when `status` indicates success?
That reading tends to produce cleaner contracts, for example:
- `asset.load(...) -> (status, handle)`
- `mem.slot_read(...) -> (status, payload, bytes_read)`
- operations without a real failure path remain `void`
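
The four questions map naturally onto the order of checks inside a handler. The sketch below follows that order for an `asset.load`-shaped call. It is only an illustration: the parameters, the `Status` members, and in particular the choice to report a missing capability as a `Trap` are assumptions of this example; the syscall policies spec is the authority on that point.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    """Hypothetical status codes for this sketch."""
    OK = 0
    ASSET_NOT_FOUND = 1
    NO_FREE_SLOT = 2


class Trap(Exception):
    """Structural contract violation."""


def asset_load(caps, catalog, free_slots, args) -> Tuple[Status, Optional[int]]:
    # 1. Is the call structurally correct?
    if len(args) != 1 or not isinstance(args[0], str):
        raise Trap("asset.load expects one string argument")
    # 2. Does the guest hold the capability? (Trap here is an assumption.)
    if "asset" not in caps:
        raise Trap("missing capability: asset")
    name = args[0]
    # 3. Can the domain execute with the current resources?
    if name not in catalog:
        return Status.ASSET_NOT_FOUND, None
    if not free_slots:
        return Status.NO_FREE_SLOT, None
    # 4. The handle is payload that only makes sense alongside OK.
    handle = free_slots.pop()
    return Status.OK, handle
```

On the guest side, the handle would be read only after checking that the first element is `Status.OK`; in every other case it is `None` by construction.
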
## Domain Intuition
### GFX
In `gfx`, the typical problem is not "the syscall exploded". The typical problem is:
- missing sprite;
- index outside the operational range;
- invalid argument for the operation;
- a call with no real effect.
Those cases call for `status`, not silence.
### Audio
In `audio`, the model prevents things like:
- an invalid voice being ignored;
- a missing sample turning into silent fallback;
- an out-of-range parameter looking like success.
Audio should behave like a finite, explicit peripheral, not like "automatic sound".
### Asset
In `asset`, status-first better separates:
- request errors;
- asynchronous lifecycle;
- invalid commit/cancel;
- unknown handle.
That fits the request + poll + commit model instead of pretending loading is instantaneous.
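
A toy version of the request + poll + commit lifecycle might look like the following. The `AssetLoader` class, its `tick` method (a stand-in for the host finishing background work), and the `Status` members are all hypothetical; the sketch also assumes a committed handle is retired, which the asset management spec may define differently.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical status codes for this sketch."""
    OK = 0
    PENDING = 1
    READY = 2
    ASSET_NOT_FOUND = 3
    UNKNOWN_HANDLE = 4
    INVALID_STATE = 5


class AssetLoader:
    def __init__(self, catalog):
        self._catalog = set(catalog)
        self._loads = {}        # handle -> "pending" | "ready"
        self._next_handle = 1

    def request(self, name):
        # Request errors are reported immediately, with no handle.
        if name not in self._catalog:
            return Status.ASSET_NOT_FOUND, None
        handle = self._next_handle
        self._next_handle += 1
        self._loads[handle] = "pending"
        return Status.OK, handle

    def tick(self):
        # Stand-in for asynchronous host work completing.
        for handle in list(self._loads):
            self._loads[handle] = "ready"

    def poll(self, handle):
        state = self._loads.get(handle)
        if state is None:
            return Status.UNKNOWN_HANDLE
        return Status.PENDING if state == "pending" else Status.READY

    def commit(self, handle):
        state = self._loads.get(handle)
        if state is None:
            return Status.UNKNOWN_HANDLE
        if state != "ready":
            return Status.INVALID_STATE   # committing too early is visible
        del self._loads[handle]           # assumption: commit retires the handle
        return Status.OK
```
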
### FS / MEMCARD
In persistence, status-first prevents the worst category of error: "it seemed to save".
Cases such as:
- empty slot;
- full storage;
- conflict;
- corruption;
- backend unavailable;
must appear as observable state because the game and the Hub/OS need to react to them.
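
To make "it seemed to save" impossible, every persistence call can report what actually happened. The sketch below is a simplified in-memory model, assuming a hypothetical `MemCard` class, a byte-capacity limit, and a CRC32 check as the corruption detector; none of these are taken from the memcard spec, and conflict handling is left out for brevity.

```python
import zlib
from enum import Enum


class Status(Enum):
    """Hypothetical status codes for this sketch."""
    OK = 0
    EMPTY = 1
    FULL = 2
    CORRUPT = 3
    BACKEND_UNAVAILABLE = 4


class MemCard:
    def __init__(self, capacity, available=True):
        self.capacity = capacity      # total bytes across all slots
        self.available = available    # False models a missing backend
        self.slots = {}               # slot -> (crc32, data)

    def _used(self):
        return sum(len(data) for _, data in self.slots.values())

    def slot_write(self, slot, data):
        if not self.available:
            return Status.BACKEND_UNAVAILABLE
        old = len(self.slots[slot][1]) if slot in self.slots else 0
        if self._used() - old + len(data) > self.capacity:
            return Status.FULL        # nothing is written, and the guest knows
        self.slots[slot] = (zlib.crc32(data), bytes(data))
        return Status.OK

    def slot_read(self, slot):
        # Returns (status, payload, bytes_read); payload is valid only with OK.
        if not self.available:
            return Status.BACKEND_UNAVAILABLE, b"", 0
        if slot not in self.slots:
            return Status.EMPTY, b"", 0
        crc, data = self.slots[slot]
        if zlib.crc32(data) != crc:
            return Status.CORRUPT, b"", 0
        return Status.OK, data, len(data)
```

With this shape, a game can show a "storage full" prompt on `FULL` and the Hub/OS can offer recovery on `CORRUPT`, instead of both silently proceeding as if the save had succeeded.
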
## Design Smells
Signs that the contract is still weak:
- the operation can fail, but returns `void`;
- there is implicit fallback to a default resource;
- the documentation depends on "in practice this should not happen";
- `Panic` appears for app input errors;
- the same domain mixes silent no-op and explicit status;
- return payload does not make clear when it is valid.
## Historical Anchors
This guide consolidates the intuition that first appeared in these historical snapshots:
- [`historical-gfx-status-first-fault-and-return-contract.md`](LSN-0006-historical-gfx-status-first-fault-and-return-contract.md)
- [`historical-audio-status-first-fault-and-return-contract.md`](LSN-0003-historical-audio-status-first-fault-and-return-contract.md)
- [`historical-asset-status-first-fault-and-return-contract.md`](LSN-0002-historical-asset-status-first-fault-and-return-contract.md)
- [`historical-game-memcard-slots-surface-and-semantics.md`](LSN-0005-historical-game-memcard-slots-surface-and-semantics.md)
- [`historical-retired-fault-and-input-decisions.md`](LSN-0007-historical-retired-fault-and-input-decisions.md)
Use the snapshots for historical context. Use the specs for the current normative contract.