443 lines
12 KiB
Markdown
443 lines
12 KiB
Markdown
# PR-2 - PBX SYSC and HOSTCALL Loader Patching
|
|
|
|
## Goal
|
|
|
|
Teach the runtime to load canonical host bindings from PBX, resolve them at load time, validate ABI and capabilities, and rewrite pre-load host calls into final numeric syscalls.
|
|
|
|
This PR assumes PR-1 is already available, so cartridge capabilities are exposed to the loader as internal `CapFlags`.
|
|
|
|
## Why
|
|
|
|
Prometeu's hardware contract is:
|
|
|
|
- source- and SDK-level host APIs map to canonical identities `(module, name, version)`
|
|
- load-time resolution maps those identities to numeric syscall ids
|
|
- runtime execution is numeric-only via `SYSCALL <id>`
|
|
|
|
The runtime already has the important building blocks:
|
|
|
|
- canonical syscall registry
|
|
- load-time `resolve_program_syscalls`
|
|
- verifier support for numeric `SYSCALL <id>`
|
|
- VM dispatch by numeric id
|
|
|
|
What is missing is the PBX and loader wiring.
|
|
|
|
## Scope
|
|
|
|
In scope:
|
|
|
|
- add a mandatory PBX `SYSC` section
|
|
- extend each `SYSC` entry with ABI shape
|
|
- add pre-load opcode `HOSTCALL <sysc_index>`
|
|
- parse `SYSC` at load time
|
|
- resolve `SYSC` against the host syscall registry
|
|
- validate declared ABI against authoritative host metadata
|
|
- validate capabilities using cartridge-granted `CapFlags`
|
|
- reject unused `SYSC` entries
|
|
- reject out-of-bounds `HOSTCALL` indices
|
|
- patch all `HOSTCALL <sysc_index>` to `SYSCALL <id>`
|
|
- ensure verifier runs only on the patched image
|
|
|
|
Out of scope:
|
|
|
|
- compiler emission
|
|
- stdlib SDK pack format
|
|
- final external tooling for PBX generation
|
|
- platform policy beyond current manifest-derived granted capabilities
|
|
|
|
## Architectural Contract
|
|
|
|
### PBX `SYSC`
|
|
|
|
`SYSC` is a unique, deduplicated, program-wide table of declared host bindings.
|
|
|
|
Each entry carries:
|
|
|
|
- `module`
|
|
- `name`
|
|
- `version`
|
|
- `arg_slots`
|
|
- `ret_slots`
|
|
|
|
`SYSC` is mandatory for every valid PBX.
|
|
If the program requires no host bindings, `SYSC` is present with `count = 0`.
|
|
|
|
### Pre-load callsites
|
|
|
|
The compiler-side artifact form is:
|
|
|
|
```text
|
|
HOSTCALL <u32 sysc_index>
|
|
```
|
|
|
|
Rules:
|
|
|
|
- `sysc_index` is zero-based into the `SYSC` table
|
|
- code must not contain final `SYSCALL <id>` for unresolved host-backed SDK calls
|
|
- the final executable image given to the VM must contain no `HOSTCALL`
|
|
|
|
### Load-time responsibilities
|
|
|
|
The loader must:
|
|
|
|
1. parse `SYSC`
|
|
2. resolve each entry to `syscall_id`
|
|
3. validate `arg_slots` and `ret_slots`
|
|
4. validate capabilities
|
|
5. scan code for `HOSTCALL`
|
|
6. mark used `SYSC` indices
|
|
7. reject out-of-bounds indices
|
|
8. reject unused `SYSC` entries
|
|
9. rewrite `HOSTCALL <index>` into `SYSCALL <id>`
|
|
10. ensure no `HOSTCALL` remains
|
|
11. only then hand the image to the verifier
|
|
|
|
## PBX Format Changes
|
|
|
|
### `prometeu-bytecode`
|
|
|
|
Add a new section kind:
|
|
|
|
- `SYSC`
|
|
|
|
Recommended temporary binary layout:
|
|
|
|
```text
|
|
u32 count
|
|
repeat count times:
|
|
u16 module_len
|
|
[module_len bytes UTF-8]
|
|
u16 name_len
|
|
[name_len bytes UTF-8]
|
|
u16 version
|
|
u16 arg_slots
|
|
u16 ret_slots
|
|
```
|
|
|
|
Rules:
|
|
|
|
- strings are UTF-8
|
|
- duplicate canonical identities are invalid
|
|
- malformed lengths are invalid
|
|
- missing `SYSC` is invalid
|
|
|
|
### Bytecode model changes
|
|
|
|
Extend `BytecodeModule` with a syscall declaration vector.
|
|
|
|
Suggested shape:
|
|
|
|
```rust
|
|
pub struct SyscallDecl {
|
|
pub module: String,
|
|
pub name: String,
|
|
pub version: u16,
|
|
pub arg_slots: u16,
|
|
pub ret_slots: u16,
|
|
}
|
|
```
|
|
|
|
Then:
|
|
|
|
```rust
|
|
pub syscalls: Vec<SyscallDecl>
|
|
```
|
|
|
|
Serialization/deserialization must include section kind for `SYSC`.
|
|
|
|
## Opcode Changes
|
|
|
|
### Add `HOSTCALL`
|
|
|
|
Add a new opcode:
|
|
|
|
- `HOSTCALL`
|
|
|
|
Immediate:
|
|
|
|
- `u32 sysc_index`
|
|
|
|
Recommended behavior:
|
|
|
|
- valid only in pre-load artifact form
|
|
- not allowed to reach final verifier/VM execution path
|
|
|
|
Required code updates:
|
|
|
|
- opcode enum
|
|
- decoder
|
|
- opcode spec
|
|
- assembler
|
|
- disassembler
|
|
|
|
Verifier recommendation:
|
|
|
|
- the verifier does not need to support `HOSTCALL`
|
|
- it should continue to run after loader patching only
|
|
|
|
## Loader Integration
|
|
|
|
### Resolution
|
|
|
|
Use the existing canonical resolver in `prometeu-hal::syscalls`.
|
|
|
|
For each `SYSC` entry:
|
|
|
|
1. resolve `(module, name, version)`
|
|
2. obtain `SyscallResolved`
|
|
3. compare `arg_slots`
|
|
4. compare `ret_slots`
|
|
5. compare required capability against granted cartridge flags
|
|
|
|
If any check fails, load fails deterministically.
|
|
|
|
### Capability source
|
|
|
|
Capabilities come from the already-loaded cartridge manifest via PR-1.
|
|
|
|
This PR must not invent authority from PBX contents.
|
|
|
|
### Code patching
|
|
|
|
Recommended algorithm:
|
|
|
|
1. parse code buffer
|
|
2. whenever `HOSTCALL <index>` is found:
|
|
3. validate `index < syscalls.len()`
|
|
4. mark that `SYSC[index]` is used
|
|
5. rewrite opcode/immediate in place or rebuild the code buffer
|
|
6. emit final `SYSCALL <resolved_id>`
|
|
|
|
After scan:
|
|
|
|
1. every `SYSC` entry must be marked used
|
|
2. no `HOSTCALL` may remain
|
|
|
|
Either patching strategy is acceptable:
|
|
|
|
- mutate byte buffer in place
|
|
- rebuild a fresh patched buffer
|
|
|
|
The final `ProgramImage` must contain only numeric `SYSCALL <id>`.
|
|
|
|
## Verifier Contract
|
|
|
|
No verifier redesign is required.
|
|
|
|
The intended contract is:
|
|
|
|
- loader validates interface compatibility
|
|
- verifier validates final numeric program structure
|
|
|
|
This means:
|
|
|
|
- loader checks `SYSC`-declared ABI against host metadata
|
|
- verifier checks stack effects of final `SYSCALL <id>` using existing runtime metadata
|
|
|
|
This is intentional and not considered harmful duplication.
|
|
|
|
## Proposed Code Areas
|
|
|
|
### `prometeu-bytecode`
|
|
|
|
Likely files:
|
|
|
|
- `src/model.rs`
|
|
- `src/opcode.rs`
|
|
- `src/opcode_spec.rs`
|
|
- `src/decoder.rs`
|
|
- `src/assembler.rs`
|
|
- `src/disassembler.rs`
|
|
- `src/lib.rs`
|
|
|
|
### `prometeu-hal`
|
|
|
|
Likely files:
|
|
|
|
- `src/cartridge_loader.rs`
|
|
- possibly helper types or resolver glue near syscall loading logic
|
|
|
|
### `prometeu-vm`
|
|
|
|
Likely files:
|
|
|
|
- load path in `src/virtual_machine.rs`
|
|
|
|
Required behavior:
|
|
|
|
- patch before `Verifier::verify(...)`
|
|
|
|
## Implementation Plan
|
|
|
|
Implementation should be staged so each phase leaves the workspace in a coherent state and keeps the verifier/VM contract intact.
|
|
|
|
### Commit Checklist
|
|
|
|
1. `prometeu-bytecode: add SyscallDecl model and mandatory SYSC section`
|
|
2. `prometeu-bytecode: add SYSC parser validation and load errors`
|
|
3. `prometeu-bytecode: add phase-1 coverage for empty/valid/invalid SYSC`
|
|
4. `prometeu-bytecode: wire downstream constructors to new BytecodeModule.syscalls field`
|
|
5. `prometeu-bytecode: clean up naming/docs after phase-1 passes`
|
|
|
|
### Phase 1 - Extend PBX module format
|
|
|
|
Target crates:
|
|
|
|
- `crates/console/prometeu-bytecode`
|
|
|
|
Steps:
|
|
|
|
1. add `SyscallDecl` to `src/model.rs` and extend `BytecodeModule` with `syscalls: Vec<SyscallDecl>`
|
|
2. reserve a new section kind for `SYSC` in module serialization/deserialization
|
|
3. enforce the mandatory-section rule: valid PBS images always carry `SYSC`, including `count = 0`
|
|
4. reject malformed payloads, invalid UTF-8, and duplicate canonical identities during load
|
|
5. update `src/lib.rs` exports if needed so the VM and HAL can consume the new declarations
|
|
|
|
Checkpoint:
|
|
|
|
- `BytecodeLoader::load(...)` returns `BytecodeModule` with canonical syscall declarations preserved from PBX
|
|
|
|
### Phase 2 - Add pre-load opcode support
|
|
|
|
Target crates:
|
|
|
|
- `crates/console/prometeu-bytecode`
|
|
|
|
Steps:
|
|
|
|
1. add `HOSTCALL` to `src/opcode.rs` with a `u32` immediate
|
|
2. extend `src/opcode_spec.rs` and `src/decoder.rs` so the loader can scan and decode `HOSTCALL`
|
|
3. update `src/assembler.rs` and `src/disassembler.rs` so tests and fixtures can produce/read pre-load artifacts
|
|
4. keep the runtime contract explicit: `HOSTCALL` is representable in bytecode artifacts but must not survive loader patching
|
|
|
|
Checkpoint:
|
|
|
|
- bytecode tooling round-trips `HOSTCALL <index>` correctly, while runtime execution still depends on patched `SYSCALL <id>`
|
|
|
|
### Phase 3 - Bridge PBX declarations to host metadata
|
|
|
|
Target crates:
|
|
|
|
- `crates/console/prometeu-hal`
|
|
|
|
Steps:
|
|
|
|
1. add a resolver path that accepts program-owned syscall declarations instead of only `&'static str` identities
|
|
2. resolve each `(module, name, version)` against `src/syscalls.rs`
|
|
3. validate `arg_slots` and `ret_slots` against authoritative `SyscallMeta`
|
|
4. validate required capabilities against cartridge-derived `CapFlags`
|
|
5. return deterministic, load-facing errors for unknown syscalls, ABI mismatches, and missing capabilities
|
|
|
|
Design note:
|
|
|
|
- this phase likely needs a small owned-string adapter or a new helper alongside `resolve_program_syscalls(...)`, because PBX strings are runtime data, not `&'static str`
|
|
|
|
Checkpoint:
|
|
|
|
- given only `BytecodeModule.syscalls` plus granted `CapFlags`, the loader can produce a resolved table `sysc_index -> syscall_id`
|
|
|
|
### Phase 4 - Patch before verification
|
|
|
|
Target crates:
|
|
|
|
- `crates/console/prometeu-vm`
|
|
|
|
Steps:
|
|
|
|
1. add a load-time patching helper near `src/virtual_machine.rs` or a small dedicated module
|
|
2. run that helper immediately after `BytecodeLoader::load(...)` and before `Verifier::verify(...)`
|
|
3. scan `module.code`, decode every instruction, and rewrite `HOSTCALL <index>` into `SYSCALL <resolved_id>`
|
|
4. reject out-of-bounds `HOSTCALL` indices during the scan
|
|
5. track `SYSC` usage and reject declarations that are never referenced
|
|
6. assert that no `HOSTCALL` remains before handing code to the verifier
|
|
7. only then call `Verifier::verify(...)`, compute `max_stack_slots`, and build `ProgramImage::from(module)`
|
|
|
|
Why the sequencing matters:
|
|
|
|
- the current load path in `crates/console/prometeu-vm/src/virtual_machine.rs` verifies the raw module immediately after `BytecodeLoader::load(...)`
|
|
- `ProgramImage` currently stores only the final ROM/functions/constants and does not preserve a syscall declaration table, so patching must happen while the code is still a `BytecodeModule`
|
|
|
|
Checkpoint:
|
|
|
|
- the verifier sees only numeric `SYSCALL <id>` instructions, preserving the existing verifier and VM execution model
|
|
|
|
### Phase 5 - Tests and failure matrix
|
|
|
|
Target crates:
|
|
|
|
- `crates/console/prometeu-bytecode`
|
|
- `crates/console/prometeu-hal`
|
|
- `crates/console/prometeu-vm`
|
|
|
|
Steps:
|
|
|
|
1. add serialization/deserialization tests for missing, empty, valid, malformed, and duplicate `SYSC`
|
|
2. add opcode tests for `HOSTCALL` decoding and assembler/disassembler coverage
|
|
3. add resolver tests for unknown identity, ABI mismatch, and capability mismatch
|
|
4. add VM load-path tests proving patch-before-verify behavior
|
|
5. add an assertion that final executable images contain only numeric `SYSCALL`
|
|
|
|
Suggested PR slicing:
|
|
|
|
1. bytecode format + tests
|
|
2. `HOSTCALL` opcode plumbing
|
|
3. HAL resolution/ABI validation
|
|
4. VM loader patching
|
|
5. integration and regression tests
|
|
|
|
## Deterministic Load Errors
|
|
|
|
Load must fail for at least:
|
|
|
|
1. missing `SYSC`
|
|
2. malformed `SYSC`
|
|
3. invalid UTF-8
|
|
4. duplicate syscall identities
|
|
5. unknown syscall identity
|
|
6. ABI mismatch between `SYSC` and host metadata
|
|
7. missing capability
|
|
8. `HOSTCALL` with out-of-bounds `sysc_index`
|
|
9. declared `SYSC` entry unused by all `HOSTCALL`s
|
|
10. `HOSTCALL` still present after patch
|
|
|
|
## Acceptance Criteria
|
|
|
|
- PBX parser supports mandatory `SYSC`
|
|
- `BytecodeModule` carries syscall declarations
|
|
- runtime understands `HOSTCALL`
|
|
- loader resolves `SYSC` entries before verification
|
|
- loader validates `arg_slots` and `ret_slots`
|
|
- loader validates capabilities against cartridge flags
|
|
- loader rewrites `HOSTCALL` to `SYSCALL`
|
|
- verifier runs only on patched code
|
|
- final VM execution path remains numeric-only
|
|
|
|
## Tests
|
|
|
|
Add tests covering:
|
|
|
|
1. valid PBX with empty `SYSC` and no `HOSTCALL`
|
|
2. valid PBX with one syscall and one `HOSTCALL`
|
|
3. unknown syscall identity
|
|
4. capability mismatch
|
|
5. ABI mismatch
|
|
6. missing `SYSC`
|
|
7. duplicate `SYSC` entries
|
|
8. malformed `SYSC` payload
|
|
9. `HOSTCALL` index out of bounds
|
|
10. unused `SYSC` entry
|
|
11. patched image contains only `SYSCALL`
|
|
|
|
Prefer synthetic in-memory PBX images in tests.
|
|
|
|
## Definition of Done
|
|
|
|
After this PR:
|
|
|
|
- PBX declares canonical host bindings in `SYSC`
|
|
- pre-load code references those bindings with `HOSTCALL`
|
|
- the loader resolves and validates them during load
|
|
- the loader patches executable code to `SYSCALL <id>`
|
|
- the verifier and VM continue to operate on numeric syscalls only
|