# PR-2 - PBX SYSC and HOSTCALL Loader Patching ## Goal Teach the runtime to load canonical host bindings from PBX, resolve them at load time, validate ABI and capabilities, and rewrite pre-load host calls into final numeric syscalls. This PR assumes PR-1 is already available, so cartridge capabilities are exposed to the loader as internal `CapFlags`. ## Why Prometeu's hardware contract is: - source- and SDK-level host APIs map to canonical identities `(module, name, version)` - load-time resolution maps those identities to numeric syscall ids - runtime execution is numeric-only via `SYSCALL ` The runtime already has the important building blocks: - canonical syscall registry - load-time `resolve_program_syscalls` - verifier support for numeric `SYSCALL ` - VM dispatch by numeric id What is missing is the PBX and loader wiring. ## Scope In scope: - add a mandatory PBX `SYSC` section - extend each `SYSC` entry with ABI shape - add pre-load opcode `HOSTCALL ` - parse `SYSC` at load time - resolve `SYSC` against the host syscall registry - validate declared ABI against authoritative host metadata - validate capabilities using cartridge-granted `CapFlags` - reject unused `SYSC` entries - reject out-of-bounds `HOSTCALL` indices - patch all `HOSTCALL ` to `SYSCALL ` - ensure verifier runs only on the patched image Out of scope: - compiler emission - stdlib SDK pack format - final external tooling for PBX generation - platform policy beyond current manifest-derived granted capabilities ## Architectural Contract ### PBX `SYSC` `SYSC` is a unique, deduplicated, program-wide table of declared host bindings. Each entry carries: - `module` - `name` - `version` - `arg_slots` - `ret_slots` `SYSC` is mandatory for every valid PBX. If the program requires no host bindings, `SYSC` is present with `count = 0`. ### Pre-load callsites The compiler-side artifact form is: ```text HOSTCALL ``` Rules: - `sysc_index` is zero-based into the `SYSC` table - code must not contain final `SYSCALL ` for unresolved host-backed SDK calls - the final executable image given to the VM must contain no `HOSTCALL` ### Load-time responsibilities The loader must: 1. parse `SYSC` 2. resolve each entry to `syscall_id` 3. validate `arg_slots` and `ret_slots` 4. validate capabilities 5. scan code for `HOSTCALL` 6. mark used `SYSC` indices 7. reject out-of-bounds indices 8. reject unused `SYSC` entries 9. rewrite `HOSTCALL ` into `SYSCALL ` 10. ensure no `HOSTCALL` remains 11. only then hand the image to the verifier ## PBX Format Changes ### `prometeu-bytecode` Add a new section kind: - `SYSC` Recommended temporary binary layout: ```text u32 count repeat count times: u16 module_len [module_len bytes UTF-8] u16 name_len [name_len bytes UTF-8] u16 version u16 arg_slots u16 ret_slots ``` Rules: - strings are UTF-8 - duplicate canonical identities are invalid - malformed lengths are invalid - missing `SYSC` is invalid ### Bytecode model changes Extend `BytecodeModule` with a syscall declaration vector. Suggested shape: ```rust pub struct SyscallDecl { pub module: String, pub name: String, pub version: u16, pub arg_slots: u16, pub ret_slots: u16, } ``` Then: ```rust pub syscalls: Vec ``` Serialization/deserialization must include section kind for `SYSC`. ## Opcode Changes ### Add `HOSTCALL` Add a new opcode: - `HOSTCALL` Immediate: - `u32 sysc_index` Recommended behavior: - valid only in pre-load artifact form - not allowed to reach final verifier/VM execution path Required code updates: - opcode enum - decoder - opcode spec - assembler - disassembler Verifier recommendation: - the verifier does not need to support `HOSTCALL` - it should continue to run after loader patching only ## Loader Integration ### Resolution Use the existing canonical resolver in `prometeu-hal::syscalls`. For each `SYSC` entry: 1. resolve `(module, name, version)` 2. obtain `SyscallResolved` 3. compare `arg_slots` 4. compare `ret_slots` 5. compare required capability against granted cartridge flags If any check fails, load fails deterministically. ### Capability source Capabilities come from the already-loaded cartridge manifest via PR-1. This PR must not invent authority from PBX contents. ### Code patching Recommended algorithm: 1. parse code buffer 2. whenever `HOSTCALL ` is found: 3. validate `index < syscalls.len()` 4. mark that `SYSC[index]` is used 5. rewrite opcode/immediate in place or rebuild the code buffer 6. emit final `SYSCALL ` After scan: 1. every `SYSC` entry must be marked used 2. no `HOSTCALL` may remain Either patching strategy is acceptable: - mutate byte buffer in place - rebuild a fresh patched buffer The final `ProgramImage` must contain only numeric `SYSCALL `. ## Verifier Contract No verifier redesign is required. The intended contract is: - loader validates interface compatibility - verifier validates final numeric program structure This means: - loader checks `SYSC`-declared ABI against host metadata - verifier checks stack effects of final `SYSCALL ` using existing runtime metadata This is intentional and not considered harmful duplication. ## Proposed Code Areas ### `prometeu-bytecode` Likely files: - `src/model.rs` - `src/opcode.rs` - `src/opcode_spec.rs` - `src/decoder.rs` - `src/assembler.rs` - `src/disassembler.rs` - `src/lib.rs` ### `prometeu-hal` Likely files: - `src/cartridge_loader.rs` - possibly helper types or resolver glue near syscall loading logic ### `prometeu-vm` Likely files: - load path in `src/virtual_machine.rs` Required behavior: - patch before `Verifier::verify(...)` ## Implementation Plan Implementation should be staged so each phase leaves the workspace in a coherent state and keeps the verifier/VM contract intact. ### Commit Checklist 1. `prometeu-bytecode: add SyscallDecl model and mandatory SYSC section` 2. `prometeu-bytecode: add SYSC parser validation and load errors` 3. `prometeu-bytecode: add phase-1 coverage for empty/valid/invalid SYSC` 4. `prometeu-bytecode: wire downstream constructors to new BytecodeModule.syscalls field` 5. `prometeu-bytecode: clean up naming/docs after phase-1 passes` ### Phase 1 - Extend PBX module format Target crates: - `crates/console/prometeu-bytecode` Steps: 1. add `SyscallDecl` to `src/model.rs` and extend `BytecodeModule` with `syscalls: Vec` 2. reserve a new section kind for `SYSC` in module serialization/deserialization 3. enforce the mandatory-section rule: valid PBS images always carry `SYSC`, including `count = 0` 4. reject malformed payloads, invalid UTF-8, and duplicate canonical identities during load 5. update `src/lib.rs` exports if needed so the VM and HAL can consume the new declarations Checkpoint: - `BytecodeLoader::load(...)` returns `BytecodeModule` with canonical syscall declarations preserved from PBX ### Phase 2 - Add pre-load opcode support Target crates: - `crates/console/prometeu-bytecode` Steps: 1. add `HOSTCALL` to `src/opcode.rs` with a `u32` immediate 2. extend `src/opcode_spec.rs` and `src/decoder.rs` so the loader can scan and decode `HOSTCALL` 3. update `src/assembler.rs` and `src/disassembler.rs` so tests and fixtures can produce/read pre-load artifacts 4. keep the runtime contract explicit: `HOSTCALL` is representable in bytecode artifacts but must not survive loader patching Checkpoint: - bytecode tooling round-trips `HOSTCALL ` correctly, while runtime execution still depends on patched `SYSCALL ` ### Phase 3 - Bridge PBX declarations to host metadata Target crates: - `crates/console/prometeu-hal` Steps: 1. add a resolver path that accepts program-owned syscall declarations instead of only `&'static str` identities 2. resolve each `(module, name, version)` against `src/syscalls.rs` 3. validate `arg_slots` and `ret_slots` against authoritative `SyscallMeta` 4. validate required capabilities against cartridge-derived `CapFlags` 5. return deterministic, load-facing errors for unknown syscalls, ABI mismatches, and missing capabilities Design note: - this phase likely needs a small owned-string adapter or a new helper alongside `resolve_program_syscalls(...)`, because PBX strings are runtime data, not `&'static str` Checkpoint: - given only `BytecodeModule.syscalls` plus granted `CapFlags`, the loader can produce a resolved table `sysc_index -> syscall_id` ### Phase 4 - Patch before verification Target crates: - `crates/console/prometeu-vm` Steps: 1. add a load-time patching helper near `src/virtual_machine.rs` or a small dedicated module 2. run that helper immediately after `BytecodeLoader::load(...)` and before `Verifier::verify(...)` 3. scan `module.code`, decode every instruction, and rewrite `HOSTCALL ` into `SYSCALL ` 4. reject out-of-bounds `HOSTCALL` indices during the scan 5. track `SYSC` usage and reject declarations that are never referenced 6. assert that no `HOSTCALL` remains before handing code to the verifier 7. only then call `Verifier::verify(...)`, compute `max_stack_slots`, and build `ProgramImage::from(module)` Why the sequencing matters: - the current load path in `crates/console/prometeu-vm/src/virtual_machine.rs` verifies the raw module immediately after `BytecodeLoader::load(...)` - `ProgramImage` currently stores only the final ROM/functions/constants and does not preserve a syscall declaration table, so patching must happen while the code is still a `BytecodeModule` Checkpoint: - the verifier sees only numeric `SYSCALL ` instructions, preserving the existing verifier and VM execution model ### Phase 5 - Tests and failure matrix Target crates: - `crates/console/prometeu-bytecode` - `crates/console/prometeu-hal` - `crates/console/prometeu-vm` Steps: 1. add serialization/deserialization tests for missing, empty, valid, malformed, and duplicate `SYSC` 2. add opcode tests for `HOSTCALL` decoding and assembler/disassembler coverage 3. add resolver tests for unknown identity, ABI mismatch, and capability mismatch 4. add VM load-path tests proving patch-before-verify behavior 5. add an assertion that final executable images contain only numeric `SYSCALL` Suggested PR slicing: 1. bytecode format + tests 2. `HOSTCALL` opcode plumbing 3. HAL resolution/ABI validation 4. VM loader patching 5. integration and regression tests ## Deterministic Load Errors Load must fail for at least: 1. missing `SYSC` 2. malformed `SYSC` 3. invalid UTF-8 4. duplicate syscall identities 5. unknown syscall identity 6. ABI mismatch between `SYSC` and host metadata 7. missing capability 8. `HOSTCALL` with out-of-bounds `sysc_index` 9. declared `SYSC` entry unused by all `HOSTCALL`s 10. `HOSTCALL` still present after patch ## Acceptance Criteria - PBX parser supports mandatory `SYSC` - `BytecodeModule` carries syscall declarations - runtime understands `HOSTCALL` - loader resolves `SYSC` entries before verification - loader validates `arg_slots` and `ret_slots` - loader validates capabilities against cartridge flags - loader rewrites `HOSTCALL` to `SYSCALL` - verifier runs only on patched code - final VM execution path remains numeric-only ## Tests Add tests covering: 1. valid PBX with empty `SYSC` and no `HOSTCALL` 2. valid PBX with one syscall and one `HOSTCALL` 3. unknown syscall identity 4. capability mismatch 5. ABI mismatch 6. missing `SYSC` 7. duplicate `SYSC` entries 8. malformed `SYSC` payload 9. `HOSTCALL` index out of bounds 10. unused `SYSC` entry 11. patched image contains only `SYSCALL` Prefer synthetic in-memory PBX images in tests. ## Definition of Done After this PR: - PBX declares canonical host bindings in `SYSC` - pre-load code references those bindings with `HOSTCALL` - the loader resolves and validates them during load - the loader patches executable code to `SYSCALL ` - the verifier and VM continue to operate on numeric syscalls only