prometeu-studio/docs/specs/compiler-languages/pbs/6.2. Host ABI Binding and Loader Resolution Specification.md
2026-04-18 17:01:31 +01:00

492 lines
17 KiB
Markdown

# Host ABI Binding and Loader Resolution Specification
Status: Draft v1 (Temporary)
Applies to: PBX host-binding metadata, cartridge load, canonical syscall resolution, and loader-side capability gating
## 1. Purpose
This document defines the current contract between:
- the PBS compiler/toolchain,
- the PBX program artifact,
- the cartridge loader,
- and the host syscall registry.
Its purpose is to let compiler and runtime work in parallel without diverging on the most critical boundary:
how host-backed SDK calls become load-time-resolved numeric syscalls.
This document is intentionally narrower than a full PBX or VM specification.
It focuses on loader behavior and the minimum artifact contract the loader must consume.
## 2. Scope
This document defines:
- the canonical identity of a host binding,
- the relationship between `declare host` and canonical host bindings,
- the minimum PBX metadata required for load-time host resolution,
- the pre-load opcode form used for host-backed call sites,
- the required loader algorithm,
- the required loader patching step from pre-load host callsites to numeric syscalls,
- the division of responsibility between loader-side ABI checks and verifier-side stack checks,
- capability checks required during load,
- deterministic loader failures for malformed or unauthorized host usage,
- the requirement that VM execution remains numeric-only.
This document does not define:
- the full PBX module format,
- the exact in-memory patching strategy used by the loader,
- the publication format of stdlib SDK packs,
- the semantics of individual subsystems such as GFX or AUDIO.
## 3. Normative Inputs
This document follows two prior decisions:
1. Canonical host identity is defined by the hardware ABI as `(module, name, version)`.
2. User code does not declare host bindings directly; SDK/toolchain-controlled modules expose reserved `declare host` surfaces and the compiler lowers those usages into PBX host-binding metadata.
This document therefore treats:
- `declare host Gfx { fn draw_pixel(...); }`
as compile-time surface only, and treats:
- `("gfx", "draw_pixel", 1)`
as the normative runtime-facing identity.
## 4. Core Model
The host-binding pipeline is:
1. User code imports SDK surfaces such as `Gfx` from `@sdk:gfx` and `Composer` from `@sdk:composer`.
2. The compiler resolves those surfaces against the selected stdlib line.
3. The compiler maps each host-backed SDK member to a canonical identity `(module, name, version)`.
4. The compiler emits those required canonical identities into the PBX `SYSC` table.
5. The compiler emits pre-load host call sites as `HOSTCALL <sysc_index>`.
6. At cartridge load time, the loader resolves `SYSC` entries against the host syscall registry.
7. The loader verifies ABI compatibility and capabilities before execution starts.
8. The loader rewrites every `HOSTCALL <sysc_index>` into `SYSCALL <id>`.
9. The VM executes only numeric syscall identifiers.
The loader never resolves host bindings from source-level names such as `Gfx.draw_pixel`.
It resolves only canonical identities declared in the PBX.
## 5. Canonical Host Binding Identity
Every host binding is identified by the triple:
```text
(module, name, version)
```
Rules:
- `module` is the canonical host subsystem namespace, such as `gfx` or `audio`.
- `name` is the canonical host operation name within that module.
- `version` is the canonical ABI version of that host operation.
- This identity is stable across source languages and SDK naming styles.
- This identity is the only identity used by the loader for host resolution.
Source-level owners such as `Gfx` are SDK-facing aliases.
They are not loader-facing identities.
For the current low-level asset surface:
- the canonical PBS owner name is `LowAssets`,
- the canonical runtime module remains `asset`,
- the canonical capability remains `asset`,
- the canonical load identity is `("asset", "load", 1)`.
## 6. Relationship to `declare host`
`declare host` is a reserved SDK/toolchain surface used to expose host-backed APIs to source programs.
Rules:
- User-authored project modules do not declare canonical host bindings directly.
- `declare host` exists to define compile-time interface shape, not executable PBS bodies.
- The compiler is responsible for mapping each host-backed member to its canonical host identity.
- The PBX must not depend on source spellings such as `Gfx` for runtime resolution.
Example:
```pbs
import { Gfx, color } from @sdk:gfx;
let c: color = color.red;
Gfx.draw_pixel(1, 1, c);
```
The source-level call may originate from an SDK declaration such as:
```pbs
declare host Gfx {
[Host(module = "gfx", name = "draw_pixel", version = 1)]
[Capability(name = "gfx")]
fn draw_pixel(x: int, y: int, c: color);
}
```
but the PBX-facing declaration is canonical, for example:
```text
("gfx", "draw_pixel", 1)
```
The same rule applies to sprite composition through the composer domain.
For example, the PBS-facing declaration:
```pbs
declare host LowComposer {
[Host(module = "composer", name = "emit_sprite", version = 1)]
[Capability(name = "gfx")]
fn emit_sprite(
glyph_id: int,
palette_id: int,
x: int,
y: int,
layer: int,
bank_id: int,
flip_x: bool,
flip_y: bool,
priority: int
) -> int;
}
```
maps to the canonical runtime identity:
```text
("composer", "emit_sprite", 1)
```
The same rule applies to the low-level asset surface.
For example, the PBS-facing declaration:
```pbs
declare host LowAssets {
[Host(module = "asset", name = "load", version = 1)]
[Capability(name = "asset")]
[AssetLowering(param = 0)]
fn load(addressable: Addressable, slot: int) -> (status: int, loading_handle: int);
}
```
maps to the canonical runtime identity:
```text
("asset", "load", 1)
```
The loader MUST NOT derive runtime identity from the source owner spelling `LowAssets`.
At the PBS-facing declaration layer, `AssetLowering(param = 0)` marks the author-facing `Addressable` parameter as the backend-owned symbolic operand that must be rewritten to operational `asset_id` before the final low-level host path is emitted.
## 7. PBX Host Binding Section
### 7.1 Temporary section contract
Until the full PBX host-binding format is stabilized elsewhere, the loader contract uses a dedicated PBX section for declared host bindings.
Temporary section identifier:
- `SYSC`
Meaning:
- the list of canonical host bindings required by the program.
This section is metadata only.
It does not define the executable opcode stream itself.
### 7.2 Temporary binary layout
The current temporary layout is:
```text
u32 count
repeat count times:
u16 module_len
[module_len bytes UTF-8]
u16 name_len
[name_len bytes UTF-8]
u16 version
u16 arg_slots
u16 ret_slots
```
Rules:
- strings are UTF-8,
- `module_len` and `name_len` are byte lengths,
- `arg_slots` is the slot count the compiler expects the host binding to consume,
- `ret_slots` is the slot count the compiler expects the host binding to produce,
- duplicate `(module, name, version)` entries are invalid,
- malformed lengths or invalid UTF-8 are load errors.
The `SYSC` table is:
- unique per program,
- deduplicated by canonical identity,
- ordered by first occurrence in the compiled program.
The loader MUST preserve the table order as emitted.
### 7.3 Presence requirements
The `SYSC` section is mandatory for every valid PBX artifact.
Rules:
- every valid `program.pbx` MUST contain a `SYSC` section,
- the loader MUST inspect `SYSC` before execution begins,
- an empty `SYSC` section is valid and means the program requires no host bindings,
- absence of `SYSC` is a deterministic load error,
- malformed or duplicate `SYSC` declarations are deterministic load errors.
Rationale:
- the Prometeu runtime is always host-aware,
- PBX structural uniformity is preferred over optional host-binding metadata,
- loader simplicity and deterministic testability take priority over omitting an empty section.
## 8. Loader Resolution Algorithm
The loader must perform host-binding resolution before VM execution starts.
Required algorithm:
1. Read the PBX header and section table.
2. Locate and parse the `SYSC` section before launching the VM.
3. Decode the declared canonical identities in source order.
4. Reject duplicate identities.
5. Resolve each identity against the host syscall registry.
6. For each resolved entry, verify that declared `arg_slots` and `ret_slots` match the authoritative host metadata.
7. Verify that the granted capability set covers every resolved binding.
8. Scan the executable bytecode for every `HOSTCALL <sysc_index>` instruction.
9. Reject any `HOSTCALL` whose `sysc_index` is out of bounds.
10. Reject any `SYSC` entry that is never referenced by a `HOSTCALL`.
11. Rewrite every `HOSTCALL <sysc_index>` into `SYSCALL <syscall_id>`.
12. Reject the loaded image if any `HOSTCALL` remains after the rewrite pass.
13. Produce the final executable program image.
14. Abort load deterministically if any step fails.
The loader must not defer canonical-name resolution until the first call site executes.
## 9. Host Registry Resolution
The host maintains a canonical syscall registry:
```text
(module, name, version) -> syscall_id + metadata
```
That metadata includes, at minimum:
- numeric syscall identifier,
- canonical identity,
- argument slot count,
- return slot count,
- required capabilities.
Rules:
- the loader resolves only against this canonical registry,
- unknown identities are load errors,
- successful resolution yields numeric syscall identifiers and associated metadata,
- the resolved metadata is authoritative for capability gating and later runtime verification.
For the current asset runtime surface, the canonical registry entries include:
- `("asset", "load", 1)` with `arg_slots = 2` and `ret_slots = 2`,
- `("asset", "status", 1)` with `arg_slots = 1` and `ret_slots = 1`,
- `("asset", "commit", 1)` with `arg_slots = 1` and `ret_slots = 1`,
- `("asset", "cancel", 1)` with `arg_slots = 1` and `ret_slots = 1`.
## 10. Pre-Load Callsite Form
The compiler emits host-backed call sites in pre-load form as:
```text
HOSTCALL <sysc_index>
```
Rules:
- `sysc_index` is a zero-based index into the program's `SYSC` table,
- every host-backed call site in PBX code must use `HOSTCALL`, not `SYSCALL`,
- `HOSTCALL` exists only in the pre-load artifact form,
- `HOSTCALL` must not remain in the final executable image passed to the VM.
Rationale:
- the compiler does not know final numeric syscall identifiers,
- the loader resolves by canonical identity and materializes the numeric form,
- the verifier and VM can stay numeric-only after the patch step.
## 11. Loader-Side ABI Validation
Load-time ABI validation is mandatory.
Rules:
- every `SYSC` entry declares `arg_slots` and `ret_slots`,
- the loader resolves the entry to authoritative host metadata,
- the loader MUST reject the image if declared `arg_slots` differs from host metadata,
- the loader MUST reject the image if declared `ret_slots` differs from host metadata.
This validation is not a replacement for bytecode verification.
It exists to detect mismatches between:
- the compiler's emitted host-binding contract,
- and the host registry contract actually present in the runtime.
## 12. Capability Gating
Capability gating is mandatory during load.
Rules:
- every resolved host binding declares required capabilities,
- frontend/build host-admission computes deterministic `requiredCapabilities` from host binding metadata for packer assistance,
- the cartridge manifest declares requested capabilities using a nominal capability list,
- the loader derives or receives the granted capability set from the cartridge/platform policy environment,
- if a required capability is not granted, load fails,
- the loader must fail before the program begins execution.
The compiler may infer which capabilities are required by the emitted `SYSC` table, but it does not grant authority.
Authority belongs to the cartridge/platform loading environment.
Runtime capability checks may still exist defensively in the VM, but they do not replace mandatory load-time validation.
## 13. Verifier Interaction
The loader and the verifier validate different layers of the contract.
Loader responsibilities:
- validate canonical host identity resolution,
- validate `SYSC`-declared ABI shape against host metadata,
- validate capabilities,
- patch `HOSTCALL <sysc_index>` into `SYSCALL <id>`.
Verifier responsibilities:
- run on the already-patched executable image,
- validate stack safety and control-flow safety,
- treat `SYSCALL <id>` exactly like any other final numeric instruction,
- derive stack effects from authoritative runtime syscall metadata.
This is intentional and not considered harmful duplication:
- the loader checks interface compatibility,
- the verifier checks structural correctness of the final program.
## 14. Execution Model After Load
After successful load:
- the VM executes numeric syscall identifiers only,
- runtime dispatch must not depend on canonical strings,
- canonical names exist only in PBX metadata and loader-side resolution,
- the runtime-facing execution form remains `SYSCALL <id>`.
This preserves the hardware contract:
- canonical identity is used for declaration and linking,
- numeric identity is used for execution.
## 15. Deterministic Loader Errors
The loader must fail deterministically for at least the following cases:
1. missing `SYSC` section,
2. malformed `SYSC` payload,
3. invalid UTF-8 in canonical names,
4. duplicate `(module, name, version)` declarations,
5. unknown canonical identity,
6. ABI mismatch between `SYSC` declaration and host registry metadata,
7. capability mismatch,
8. `HOSTCALL` references an out-of-bounds `sysc_index`,
9. a declared `SYSC` entry is unused by all `HOSTCALL` instructions,
10. a `HOSTCALL` remains in the executable image after loader patching,
11. any internal inconsistency between parsed host-binding metadata and the loader's host registry contract.
Diagnostics should identify the failing canonical identity whenever available.
## 16. Relationship to `stdlib`
`stdlib` selection is a compile-time concern that determines which SDK surfaces and host-backed declarations the compiler may use.
Rules:
- the loader does not resolve source imports such as `@sdk:gfx` or `@sdk:composer`,
- the loader only consumes canonical host-binding metadata emitted into the PBX,
- `stdlib` affects the loader indirectly through what the compiler emitted,
- the loader must not attempt to reconstruct SDK/module semantics from the PBX.
In other words:
- `stdlib` selects the compile-time SDK contract,
- `SYSC` carries the runtime-facing host-binding contract.
## 17. Relationship to Cartridge Manifest Capabilities
The cartridge manifest is the current declarative source for requested runtime capabilities.
Rules:
- cartridge capability declarations are external to `program.pbx`,
- capability declarations are serializable nominal identifiers such as `gfx` or `audio`,
- the loader converts that nominal declaration into the internal capability flag set used by the runtime,
- the loader MUST NOT infer granted authority solely from the presence of `SYSC` entries.
This keeps authority separate from code generation:
- PBX declares what the program needs,
- cartridge packaging declares what the cartridge requests,
- the loader/runtime environment decides what is granted.
## 18. Current Implementation Direction
This document aligns with the current Prometeu direction:
- SDK/toolchain-controlled `declare host` surfaces define host-backed APIs,
- the compiler emits canonical host-binding declarations into PBX `SYSC`,
- the compiler emits `HOSTCALL <sysc_index>` in pre-load code,
- the loader resolves those declarations at cartridge load,
- the loader patch-rewrites host calls to numeric syscalls,
- the VM continues to execute numeric syscalls only.
## 19. Open Items Deferred
The following remain deferred:
1. final PBX section numbering and chunk registry policy,
2. exact binary opcode allocation for `HOSTCALL`,
3. whether the loader patches bytecode in place or rebuilds a fresh executable image buffer,
4. final cartridge policy model when platform-level grants differ from manifest requests,
5. final chunk numbering and binary compatibility policy for PBX versions that predate mandatory `SYSC`,
6. final integration shape with `ProgramImage` or another loaded-program container.
## 20. Current Decision Summary
The current temporary contract is:
1. Host bindings are identified canonically by `(module, name, version)`.
2. `declare host` is reserved to SDK/toolchain modules and is compile-time surface only.
3. The compiler emits a unique, deduplicated `SYSC` table ordered by first occurrence.
4. Each `SYSC` entry carries `module`, `name`, `version`, `arg_slots`, and `ret_slots`.
5. The compiler emits host-backed call sites as `HOSTCALL <sysc_index>`.
6. The loader parses `SYSC` before execution.
7. The loader resolves canonical identities against the host registry during load.
8. The loader validates `arg_slots` and `ret_slots` against authoritative host metadata.
9. The loader enforces capability gating during load.
10. The loader rejects out-of-bounds and unused `SYSC` references.
11. The loader rewrites every `HOSTCALL <sysc_index>` into `SYSCALL <id>`.
12. The verifier runs after patching and validates the final numeric program.
13. The VM executes numeric syscall identifiers only.