diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
new file mode 100644
index 00000000..205ee010
--- /dev/null
+++ b/ARCHITECTURE.md
@@ -0,0 +1,164 @@
+Prometeu Runtime — Architecture (Baseline)
+
+This document is the concise, authoritative description of the current Prometeu VM baseline after the architectural reset. It reflects the implementation as it exists today — no legacy, no transitional wording.
+
+
+1. Overview
+-----------
+
+- Stack‑based virtual machine
+  - Operand stack + call frames; bytecode is fetched from a ROM/program image with a separate constant pool.
+- GC‑managed heap
+  - Non‑compacting mark–sweep collector; stable object handles (`HeapRef`) while live. Sweep invalidates unreachable handles; objects are never moved.
+- Closures (Model B)
+  - First‑class closures with a heap‑allocated environment. The closure object is passed to the callee as a hidden `arg0` when invoking a closure.
+- Cooperative coroutines
+  - Deterministic, cooperative scheduling. Switching and GC occur only at explicit safepoints (`FRAME_SYNC`).
+- Unified syscall ABI
+  - Numeric ID dispatch with metadata (`SyscallMeta`). Verifier enforces arity/return‑slot counts; capability gating at runtime. Syscalls are not first‑class values.
+
+
+2. Memory Model
+----------------
+
+2.1 Stack vs Heap
+
+- Stack
+  - Each running context has an operand stack plus call frames (locals, return bookkeeping). Primitive values (integers, floats, booleans) reside on the stack. Heap objects are referenced by opaque `HeapRef` values on the stack.
+  - The VM’s current operand stack and frames are GC roots.
+
+- Heap
+  - The heap stores runtime objects that require identity and reachability tracking. Handles are `HeapRef` indices into an internal object store.
+  - The collector is mark–sweep, non‑moving: it marks from roots, then reclaims unreachable objects without relocating survivors. Indices for live objects remain stable across collections.
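Reviewer's illustration (not part of the patch): the non-moving mark-sweep behavior described in 2.1 can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the Prometeu implementation: the `Heap`, `Object`, and `Value` layouts, the free list, and the `collect` signature are all hypothetical, and only arrays are modeled. It shows why handles of survivors stay stable: sweep clears slots in place and never relocates anything.

```rust
/// Opaque handle: an index into the heap's slot table (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct HeapRef(pub usize);

/// Stack values: primitives live inline, heap objects are referenced.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Bool(bool),
    Ref(HeapRef),
}

/// Only arrays are modeled in this sketch.
#[derive(Debug)]
pub enum Object {
    Array(Vec<Value>),
}

#[derive(Default)]
pub struct Heap {
    slots: Vec<Option<Object>>, // None marks a free slot
    free: Vec<usize>,           // indices available for reuse
}

/// Extracts the slot index if the value is a heap reference.
fn as_ref_index(v: &Value) -> Option<usize> {
    match v {
        Value::Ref(HeapRef(i)) => Some(*i),
        _ => None,
    }
}

impl Heap {
    /// Stores an object, reusing a freed slot when one exists.
    /// (Assumption: a real VM may guard reused slots against stale handles.)
    pub fn alloc(&mut self, obj: Object) -> HeapRef {
        if let Some(i) = self.free.pop() {
            self.slots[i] = Some(obj);
            HeapRef(i)
        } else {
            self.slots.push(Some(obj));
            HeapRef(self.slots.len() - 1)
        }
    }

    pub fn is_live(&self, r: HeapRef) -> bool {
        matches!(self.slots.get(r.0), Some(Some(_)))
    }

    /// Marks everything reachable from `roots`, then sweeps the rest.
    /// Survivors are never moved, so their indices stay valid.
    /// Returns the number of objects reclaimed.
    pub fn collect(&mut self, roots: &[Value]) -> usize {
        let mut marked = vec![false; self.slots.len()];
        let mut work: Vec<usize> = roots.iter().filter_map(as_ref_index).collect();
        while let Some(i) = work.pop() {
            if i >= marked.len() || marked[i] {
                continue;
            }
            let obj = match self.slots[i].as_ref() {
                Some(o) => o,
                None => continue, // dangling reference: nothing to mark
            };
            marked[i] = true;
            match obj {
                Object::Array(elems) => work.extend(elems.iter().filter_map(as_ref_index)),
            }
        }
        let mut freed = 0;
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if slot.is_some() && !marked[i] {
                *slot = None;
                self.free.push(i);
                freed += 1;
            }
        }
        freed
    }
}
```

In this sketch, an array reachable only through another rooted array survives a collection at the same index, while an unrooted array is cleared and its slot returned to the free list.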
+
+2.2 Heap Object Kinds (as used today)
+
+- Arrays of `Value`
+  - Variable‑length arrays whose elements may contain further `HeapRef`s.
+- Closures
+  - Carry a function identifier and a captured environment (a slice/vector of `Value`s stored with the closure). Captured `HeapRef`s are traversed by the GC.
+- Coroutines
+  - Heap‑resident coroutine records (state + wake time + suspended operand stack and call frames). These act as GC roots when suspended.
+
+Notes:
+- Literals like strings and numbers are sourced from the constant pool in the program image; heap allocation is only used for runtime objects (closures, arrays, coroutine records, and any future heap kinds). The constant pool never embeds raw `HeapRef`s.
+
+2.3 GC Roots
+
+- VM roots
+  - Current operand stack and call frames of the running coroutine (or main context).
+- Suspended coroutines
+  - All heap‑resident, suspended coroutine objects are treated as roots. Their saved stacks/frames are scanned during marking.
+- Root traversal
+  - The VM exposes a root‑visitor that walks the operand stack, frames, and coroutine records to feed the collector. The collector then follows children from each object kind (e.g., array elements, closure environments, coroutine stacks).
+
+
+3. Execution Model
+-------------------
+
+3.1 Interpreter Loop
+
+- The VM runs a classic fetch–decode–execute loop over the ROM’s bytecode. The current program counter (PC), operand stack, and call frames define execution state.
+- Function calls establish new frames; returns restore the caller’s frame and adjust the operand stack to the callee’s declared return slot count (the verifier enforces this shape statically).
+- Errors
+  - Traps (well‑defined fault conditions) surface as trap reasons; panics indicate internal consistency failures. The VM can report logical frame endings such as `FrameSync`, `BudgetExhausted`, `Halted`, end‑of‑ROM, `Breakpoint`, `Trap(code, …)`, and `Panic(msg)`.
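Reviewer's illustration (not part of the patch): the fetch-decode-execute loop and its logical frame endings from 3.1 might look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the one-byte opcode values, the trap codes, the `Vm` struct, the `run(budget)` signature, and the simplified `Trap(u32)` payload. Call frames, `Breakpoint`, and `Panic` are omitted to keep it short.

```rust
/// Simplified subset of the frame endings named in 3.1.
#[derive(Debug, PartialEq)]
pub enum FrameEnd {
    FrameSync,
    BudgetExhausted,
    Halted,
    EndOfRom,
    Trap(u32),
}

// Hypothetical opcode encoding; the real bytecode format is not shown in this document.
pub const OP_PUSH_I8: u8 = 0x01; // followed by one signed immediate byte
pub const OP_ADD: u8 = 0x02;
pub const OP_FRAME_SYNC: u8 = 0x03;
pub const OP_HALT: u8 = 0x04;

// Hypothetical trap codes.
pub const TRAP_STACK_UNDERFLOW: u32 = 1;
pub const TRAP_BAD_OPCODE: u32 = 2;
pub const TRAP_TRUNCATED: u32 = 3;

pub struct Vm<'rom> {
    rom: &'rom [u8],
    pub pc: usize,
    pub stack: Vec<i64>,
}

impl<'rom> Vm<'rom> {
    pub fn new(rom: &'rom [u8]) -> Self {
        Vm { rom, pc: 0, stack: Vec::new() }
    }

    /// Executes up to `budget` instructions and reports why the run stopped.
    pub fn run(&mut self, mut budget: u32) -> FrameEnd {
        loop {
            if budget == 0 {
                return FrameEnd::BudgetExhausted;
            }
            budget -= 1;

            // Fetch: running off the end of the image is a normal ending.
            let op = match self.rom.get(self.pc) {
                Some(&b) => b,
                None => return FrameEnd::EndOfRom,
            };
            self.pc += 1;

            // Decode and execute.
            match op {
                OP_PUSH_I8 => match self.rom.get(self.pc) {
                    Some(&imm) => {
                        self.pc += 1;
                        self.stack.push(imm as i8 as i64);
                    }
                    None => return FrameEnd::Trap(TRAP_TRUNCATED),
                },
                OP_ADD => match (self.stack.pop(), self.stack.pop()) {
                    (Some(b), Some(a)) => self.stack.push(a.wrapping_add(b)),
                    _ => return FrameEnd::Trap(TRAP_STACK_UNDERFLOW),
                },
                // The safepoint hands control back to the host/scheduler.
                OP_FRAME_SYNC => return FrameEnd::FrameSync,
                OP_HALT => return FrameEnd::Halted,
                _ => return FrameEnd::Trap(TRAP_BAD_OPCODE),
            }
        }
    }
}
```

Calling `run` again after a `FrameSync` ending resumes at the saved PC, which is the property a host loop relies on to interleave GC and scheduling between runs.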
+
+3.2 Safepoints
+
+- `FRAME_SYNC` is the only safepoint.
+  - At `FRAME_SYNC`, the VM performs two actions in a well‑defined order:
+    1) Garbage‑collection opportunity: root enumeration + mark–sweep.
+    2) Scheduler handoff: the currently running coroutine may yield/sleep, and a next ready coroutine is selected deterministically.
+- No other opcode constitutes a GC or scheduling safepoint. Syscalls do not implicitly trigger GC or rescheduling.
+
+3.3 Scheduler Behavior (Cooperative Coroutines)
+
+- Coroutines are cooperative and scheduled deterministically (FIFO among ready coroutines).
+- `YIELD` and `SLEEP` take effect at `FRAME_SYNC`:
+  - `YIELD` places the current coroutine at the end of the ready queue.
+  - `SLEEP` parks the current coroutine until its exact `wake_tick`, after which it re‑enters the ready queue at the correct point.
+- `SPAWN` creates a new coroutine with its own stack/frames recorded in the heap and enqueues it deterministically.
+- No preemption: the VM never interrupts a coroutine between safepoints.
+
+
+4. Verification Model
+----------------------
+
+4.1 Verifier Responsibilities
+
+The verifier statically checks bytecode for structural safety and stack‑shape correctness. Representative checks include:
+
+- Instruction well‑formedness
+  - Unknown opcode, truncated immediates/opcodes, malformed function boundaries, trailing bytes.
+- Control‑flow integrity
+  - Jump targets within bounds and to instruction boundaries; functions must have proper terminators; path coverage ensures a valid exit.
+- Stack discipline
+  - No underflow/overflow relative to declared max stack; consistent stack height at control‑flow joins; `RET` occurs at the expected height.
+- Call/return shape
+  - Direct calls and returns must match the declared argument counts and return slot counts. Mismatches are rejected.
+- Syscalls
+  - Syscall IDs must exist per `SyscallMeta`. Arity and declared return slot counts must match metadata. Capability checks are enforced at runtime (not by the verifier).
+- Closures
+  - `CALL_CLOSURE` is only allowed on closure values; the callee function must be known; argument counts for closure calls must match.
+- Coroutines
+  - `YIELD` context must be valid; `SPAWN` argument counts are validated.
+
+4.2 Runtime vs Verifier Guarantees
+
+- The verifier guarantees structural correctness and stack‑shape invariants. It does not perform full type checking of value contents; dynamic checks (e.g., numeric domain checks, polymorphic comparisons, concrete syscall argument validation) occur at runtime and may trap.
+- Capability gating for syscalls is enforced at runtime by the VM/native interface.
+
+
+5. Closures (Model B) — Calling Convention
+-------------------------------------------
+
+- Creation
+  - `MAKE_CLOSURE` captures N values from the operand stack into a heap‑allocated environment alongside a function identifier. The opcode pushes a `HeapRef` to the new closure.
+- Call
+  - `CALL_CLOSURE` invokes a closure. The closure object itself is supplied to the callee as a hidden `arg0`. User‑visible arguments follow the function’s declared arity.
+- Access to captures
+  - The callee can access captured values via the closure’s environment. Captured `HeapRef`s are traced by the GC.
+
+
+6. Unified Syscall ABI
+-----------------------
+
+- Identification
+  - Syscalls are addressed by a numeric ID. They are not first‑class values.
+- Metadata‑driven
+  - `SyscallMeta` defines expected arity and return slot counts. The verifier checks IDs/arity/return‑slot counts against this metadata.
+- Arguments and returns
+  - Arguments are taken from the operand stack in the order defined by the ABI. Returns use bounded multi‑slot results via a host‑side return buffer (`HostReturn`) which the VM copies back onto the stack, or zero slots for “void”. A mismatch in result counts is a fault/panic per current hardening logic.
+- Capabilities
+  - Each VM instance has capability flags. Invoking a syscall without the required capability traps.
+
+
+7. Garbage Collection
+----------------------
+
+- Collector
+  - Non‑moving mark–sweep.
+- Triggers
+  - GC runs only at `FRAME_SYNC` safepoints.
+- Liveness
+  - Roots comprise: the live VM stack/frames and all suspended coroutines. The collector traverses object‑specific children (array elements, closure environments, coroutine stacks).
+- Determinism
+  - GC opportunities and scheduling order are tied to `FRAME_SYNC`, ensuring repeatable execution traces across runs with the same inputs.
+
+
+8. Non‑Goals
+-------------
+
+- No RC
+- No HIP
+- No preemption
+- No mailbox
+
+
+9. Notes for Contributors
+--------------------------
+
+- Keep the public surface minimal and metadata‑driven (e.g., syscalls via `SyscallMeta`).
+- Do not assume implicit safepoints; schedule and GC only at `FRAME_SYNC`.
+- When adding new opcodes or object kinds, extend the verifier and GC traversal accordingly (children enumeration, environment scanning, root sets).
+- This document is the canonical reference; update it alongside any architectural change.
diff --git a/files/TODOs.md b/files/TODOs.md
index 97ef1d55..8d9d09fa 100644
--- a/files/TODOs.md
+++ b/files/TODOs.md
@@ -1,92 +1,3 @@
-# PR-9 — Final Hardening & Baseline Documentation
-
-This phase finalizes the new Prometeu VM baseline after the architectural reset (GC, closures, coroutines, unified syscall ABI, deterministic scheduler).
-
-Goals:
-
-* Consolidated architecture documentation (short, precise, authoritative)
-* Minimal and controlled public surface area
-* Removal of temporary feature flags
-* Final cleanup: dead code, warnings, outdated docs/examples
-
-All PRs below are self-contained and must compile independently.
-
----
-
-# PR-9.1 — Consolidated Architecture Documentation
-
-## Briefing
-
-The project must include a short, authoritative architecture document in English describing the new baseline.
-
-It must reflect the current state only (no legacy, no transitional wording).
-
-This document becomes the canonical reference for contributors.
-
-## Target
-
-Create `ARCHITECTURE.md` at repository root with sections:
-
-1. Overview
-
-   * Stack-based VM
-   * GC-managed heap (mark-sweep, non-compacting)
-   * Closures (Model B, hidden arg0)
-   * Cooperative coroutines (FRAME_SYNC safepoints)
-   * Unified syscall ABI
-
-2. Memory Model
-
-   * Stack vs heap
-   * Heap object kinds
-   * GC roots (VM + suspended coroutines)
-
-3. Execution Model
-
-   * Interpreter loop
-   * Safepoints
-   * Scheduler behavior
-
-4. Verification Model
-
-   * Verifier responsibilities
-   * Runtime vs verifier guarantees
-
-5. Non-Goals
-
-   * No RC
-   * No HIP
-   * No preemption
-   * No mailbox
-
-The document must be concise (no more than ~4–6 pages equivalent).
-
-## Acceptance Checklist
-
-* [ ] ARCHITECTURE.md exists.
-* [ ] No mention of RC/HIP.
-* [ ] Reflects actual implementation.
-* [ ] Reviewed for consistency with code.
-
-## Tests
-
-Manual review required.
-
-## Junie Instructions
-
-You MAY:
-
-* Create new documentation files.
-
-You MUST NOT:
-
-* Invent features not implemented.
-* Reference legacy behavior.
-
-If architecture details are unclear, STOP and request clarification.
-
----
-
 # PR-9.2 — Public Surface Area Minimization
 
 ## Briefing