427 lines
15 KiB
Markdown
427 lines
15 KiB
Markdown
# Host ABI Binding and Loader Resolution Specification
|
|
|
|
Status: Draft v1 (Temporary)
|
|
Applies to: PBX host-binding metadata, cartridge load, canonical syscall resolution, and loader-side capability gating
|
|
|
|
## 1. Purpose
|
|
|
|
This document defines the current contract between:
|
|
|
|
- the PBS compiler/toolchain,
|
|
- the PBX program artifact,
|
|
- the cartridge loader,
|
|
- and the host syscall registry.
|
|
|
|
Its purpose is to let compiler and runtime work in parallel without diverging on the most critical boundary:
|
|
how host-backed SDK calls become load-time-resolved numeric syscalls.
|
|
|
|
This document is intentionally narrower than a full PBX or VM specification.
|
|
It focuses on loader behavior and the minimum artifact contract the loader must consume.
|
|
|
|
## 2. Scope
|
|
|
|
This document defines:
|
|
|
|
- the canonical identity of a host binding,
|
|
- the relationship between `declare host` and canonical host bindings,
|
|
- the minimum PBX metadata required for load-time host resolution,
|
|
- the pre-load opcode form used for host-backed call sites,
|
|
- the required loader algorithm,
|
|
- the required loader patching step from pre-load host callsites to numeric syscalls,
|
|
- the division of responsibility between loader-side ABI checks and verifier-side stack checks,
|
|
- capability checks required during load,
|
|
- deterministic loader failures for malformed or unauthorized host usage,
|
|
- the requirement that VM execution remains numeric-only.
|
|
|
|
This document does not define:
|
|
|
|
- the full PBX module format,
|
|
- the exact in-memory patching strategy used by the loader,
|
|
- the publication format of stdlib SDK packs,
|
|
- the semantics of individual subsystems such as GFX or AUDIO.
|
|
|
|
## 3. Normative Inputs
|
|
|
|
This document follows two prior decisions:
|
|
|
|
1. Canonical host identity is defined by the hardware ABI as `(module, name, version)`.
|
|
2. User code does not declare host bindings directly; SDK/toolchain-controlled modules expose reserved `declare host` surfaces and the compiler lowers those usages into PBX host-binding metadata.
|
|
|
|
This document therefore treats:
|
|
|
|
- `declare host Gfx { fn draw_pixel(...); }`
|
|
|
|
as compile-time surface only, and treats:
|
|
|
|
- `("gfx", "draw_pixel", 1)`
|
|
|
|
as the normative runtime-facing identity.
|
|
|
|
## 4. Core Model
|
|
|
|
The host-binding pipeline is:
|
|
|
|
1. User code imports SDK surfaces such as `Gfx` from `@sdk:gfx`.
|
|
2. The compiler resolves those surfaces against the selected stdlib line.
|
|
3. The compiler maps each host-backed SDK member to a canonical identity `(module, name, version)`.
|
|
4. The compiler emits those required canonical identities into the PBX `SYSC` table.
|
|
5. The compiler emits pre-load host call sites as `HOSTCALL <sysc_index>`.
|
|
6. At cartridge load time, the loader resolves `SYSC` entries against the host syscall registry.
|
|
7. The loader verifies ABI compatibility and capabilities before execution starts.
|
|
8. The loader rewrites every `HOSTCALL <sysc_index>` into `SYSCALL <id>`.
|
|
9. The VM executes only numeric syscall identifiers.
|
|
|
|
The loader never resolves host bindings from source-level names such as `Gfx.draw_pixel`.
|
|
It resolves only canonical identities declared in the PBX.
|
|
|
|
## 5. Canonical Host Binding Identity
|
|
|
|
Every host binding is identified by the triple:
|
|
|
|
```text
|
|
(module, name, version)
|
|
```
|
|
|
|
Rules:
|
|
|
|
- `module` is the canonical host subsystem namespace, such as `gfx` or `audio`.
|
|
- `name` is the canonical host operation name within that module.
|
|
- `version` is the canonical ABI version of that host operation.
|
|
- This identity is stable across source languages and SDK naming styles.
|
|
- This identity is the only identity used by the loader for host resolution.
|
|
|
|
Source-level owners such as `Gfx` are SDK-facing aliases.
|
|
They are not loader-facing identities.
|
|
|
|
## 6. Relationship to `declare host`
|
|
|
|
`declare host` is a reserved SDK/toolchain surface used to expose host-backed APIs to source programs.
|
|
|
|
Rules:
|
|
|
|
- User-authored project modules do not declare canonical host bindings directly.
|
|
- `declare host` exists to define compile-time interface shape, not executable PBS bodies.
|
|
- The compiler is responsible for mapping each host-backed member to its canonical host identity.
|
|
- The PBX must not depend on source spellings such as `Gfx` for runtime resolution.
|
|
|
|
Example:
|
|
|
|
```pbs
|
|
import { Gfx, color } from @sdk:gfx;
|
|
|
|
let c: color = color.red;
|
|
Gfx.draw_pixel(1, 1, c);
|
|
```
|
|
|
|
The source-level call may originate from an SDK declaration such as:
|
|
|
|
```pbs
|
|
declare host Gfx {
|
|
[Host(module = "gfx", name = "draw_pixel", version = 1)]
|
|
fn draw_pixel(x: int, y: int, c: color);
|
|
}
|
|
```
|
|
|
|
but the PBX-facing declaration is canonical, for example:
|
|
|
|
```text
|
|
("gfx", "draw_pixel", 1)
|
|
```
|
|
|
|
## 7. PBX Host Binding Section
|
|
|
|
### 7.1 Temporary section contract
|
|
|
|
Until the full PBX host-binding format is stabilized elsewhere, the loader contract uses a dedicated PBX section for declared host bindings.
|
|
|
|
Temporary section identifier:
|
|
|
|
- `SYSC`
|
|
|
|
Meaning:
|
|
|
|
- the list of canonical host bindings required by the program.
|
|
|
|
This section is metadata only.
|
|
It does not define the executable opcode stream itself.
|
|
|
|
### 7.2 Temporary binary layout
|
|
|
|
The current temporary layout is:
|
|
|
|
```text
|
|
u32 count
|
|
repeat count times:
|
|
u16 module_len
|
|
[module_len bytes UTF-8]
|
|
u16 name_len
|
|
[name_len bytes UTF-8]
|
|
u16 version
|
|
u16 arg_slots
|
|
u16 ret_slots
|
|
```
|
|
|
|
Rules:
|
|
|
|
- strings are UTF-8,
|
|
- `module_len` and `name_len` are byte lengths,
|
|
- `arg_slots` is the slot count the compiler expects the host binding to consume,
|
|
- `ret_slots` is the slot count the compiler expects the host binding to produce,
|
|
- duplicate `(module, name, version)` entries are invalid,
|
|
- malformed lengths or invalid UTF-8 are load errors.
|
|
|
|
The `SYSC` table is:
|
|
|
|
- unique per program,
|
|
- deduplicated by canonical identity,
|
|
- ordered by first occurrence in the compiled program.
|
|
|
|
The loader MUST preserve the table order as emitted.
|
|
|
|
### 7.3 Presence requirements
|
|
|
|
The `SYSC` section is mandatory for every valid PBX artifact.
|
|
|
|
Rules:
|
|
|
|
- every valid `program.pbx` MUST contain a `SYSC` section,
|
|
- the loader MUST inspect `SYSC` before execution begins,
|
|
- an empty `SYSC` section is valid and means the program requires no host bindings,
|
|
- absence of `SYSC` is a deterministic load error,
|
|
- malformed or duplicate `SYSC` declarations are deterministic load errors.
|
|
|
|
Rationale:
|
|
|
|
- the Prometeu runtime is always host-aware,
|
|
- PBX structural uniformity is preferred over optional host-binding metadata,
|
|
- loader simplicity and deterministic testability take priority over omitting an empty section.
|
|
|
|
## 8. Loader Resolution Algorithm
|
|
|
|
The loader must perform host-binding resolution before VM execution starts.
|
|
|
|
Required algorithm:
|
|
|
|
1. Read the PBX header and section table.
|
|
2. Locate and parse the `SYSC` section before launching the VM.
|
|
3. Decode the declared canonical identities in source order.
|
|
4. Reject duplicate identities.
|
|
5. Resolve each identity against the host syscall registry.
|
|
6. For each resolved entry, verify that declared `arg_slots` and `ret_slots` match the authoritative host metadata.
|
|
7. Verify that the granted capability set covers every resolved binding.
|
|
8. Scan the executable bytecode for every `HOSTCALL <sysc_index>` instruction.
|
|
9. Reject any `HOSTCALL` whose `sysc_index` is out of bounds.
|
|
10. Reject any `SYSC` entry that is never referenced by a `HOSTCALL`.
|
|
11. Rewrite every `HOSTCALL <sysc_index>` into `SYSCALL <syscall_id>`.
|
|
12. Reject the loaded image if any `HOSTCALL` remains after the rewrite pass.
|
|
13. Produce the final executable program image.
|
|
14. Abort load deterministically if any step fails.
|
|
|
|
The loader must not defer canonical-name resolution until the first call site executes.
|
|
|
|
## 9. Host Registry Resolution
|
|
|
|
The host maintains a canonical syscall registry:
|
|
|
|
```text
|
|
(module, name, version) -> syscall_id + metadata
|
|
```
|
|
|
|
That metadata includes, at minimum:
|
|
|
|
- numeric syscall identifier,
|
|
- canonical identity,
|
|
- argument slot count,
|
|
- return slot count,
|
|
- required capabilities.
|
|
|
|
Rules:
|
|
|
|
- the loader resolves only against this canonical registry,
|
|
- unknown identities are load errors,
|
|
- successful resolution yields numeric syscall identifiers and associated metadata,
|
|
- the resolved metadata is authoritative for capability gating and later runtime verification.
|
|
|
|
## 10. Pre-Load Callsite Form
|
|
|
|
The compiler emits host-backed call sites in pre-load form as:
|
|
|
|
```text
|
|
HOSTCALL <sysc_index>
|
|
```
|
|
|
|
Rules:
|
|
|
|
- `sysc_index` is a zero-based index into the program's `SYSC` table,
|
|
- every host-backed call site in PBX code must use `HOSTCALL`, not `SYSCALL`,
|
|
- `HOSTCALL` exists only in the pre-load artifact form,
|
|
- `HOSTCALL` must not remain in the final executable image passed to the VM.
|
|
|
|
Rationale:
|
|
|
|
- the compiler does not know final numeric syscall identifiers,
|
|
- the loader resolves by canonical identity and materializes the numeric form,
|
|
- the verifier and VM can stay numeric-only after the patch step.
|
|
|
|
## 11. Loader-Side ABI Validation
|
|
|
|
Load-time ABI validation is mandatory.
|
|
|
|
Rules:
|
|
|
|
- every `SYSC` entry declares `arg_slots` and `ret_slots`,
|
|
- the loader resolves the entry to authoritative host metadata,
|
|
- the loader MUST reject the image if declared `arg_slots` differs from host metadata,
|
|
- the loader MUST reject the image if declared `ret_slots` differs from host metadata.
|
|
|
|
This validation is not a replacement for bytecode verification.
|
|
It exists to detect mismatches between:
|
|
|
|
- the compiler's emitted host-binding contract,
|
|
- and the host registry contract actually present in the runtime.
|
|
|
|
## 12. Capability Gating
|
|
|
|
Capability gating is mandatory during load.
|
|
|
|
Rules:
|
|
|
|
- every resolved host binding declares required capabilities,
|
|
- the cartridge manifest declares requested capabilities using a nominal capability list,
|
|
- the loader derives or receives the granted capability set from the cartridge/platform policy environment,
|
|
- if a required capability is not granted, load fails,
|
|
- the loader must fail before the program begins execution.
|
|
|
|
The compiler may infer which capabilities are required by the emitted `SYSC` table, but it does not grant authority.
|
|
Authority belongs to the cartridge/platform loading environment.
|
|
|
|
Runtime capability checks may still exist defensively in the VM, but they do not replace mandatory load-time validation.
|
|
|
|
## 13. Verifier Interaction
|
|
|
|
The loader and the verifier validate different layers of the contract.
|
|
|
|
Loader responsibilities:
|
|
|
|
- validate canonical host identity resolution,
|
|
- validate `SYSC`-declared ABI shape against host metadata,
|
|
- validate capabilities,
|
|
- patch `HOSTCALL <sysc_index>` into `SYSCALL <id>`.
|
|
|
|
Verifier responsibilities:
|
|
|
|
- run on the already-patched executable image,
|
|
- validate stack safety and control-flow safety,
|
|
- treat `SYSCALL <id>` exactly like any other final numeric instruction,
|
|
- derive stack effects from authoritative runtime syscall metadata.
|
|
|
|
This is intentional and not considered harmful duplication:
|
|
|
|
- the loader checks interface compatibility,
|
|
- the verifier checks structural correctness of the final program.
|
|
|
|
## 14. Execution Model After Load
|
|
|
|
After successful load:
|
|
|
|
- the VM executes numeric syscall identifiers only,
|
|
- runtime dispatch must not depend on canonical strings,
|
|
- canonical names exist only in PBX metadata and loader-side resolution,
|
|
- the runtime-facing execution form remains `SYSCALL <id>`.
|
|
|
|
This preserves the hardware contract:
|
|
|
|
- canonical identity is used for declaration and linking,
|
|
- numeric identity is used for execution.
|
|
|
|
## 15. Deterministic Loader Errors
|
|
|
|
The loader must fail deterministically for at least the following cases:
|
|
|
|
1. missing `SYSC` section,
|
|
2. malformed `SYSC` payload,
|
|
3. invalid UTF-8 in canonical names,
|
|
4. duplicate `(module, name, version)` declarations,
|
|
5. unknown canonical identity,
|
|
6. ABI mismatch between `SYSC` declaration and host registry metadata,
|
|
7. capability mismatch,
|
|
8. `HOSTCALL` references an out-of-bounds `sysc_index`,
|
|
9. a declared `SYSC` entry is unused by all `HOSTCALL` instructions,
|
|
10. a `HOSTCALL` remains in the executable image after loader patching,
|
|
11. any internal inconsistency between parsed host-binding metadata and the loader's host registry contract.
|
|
|
|
Diagnostics should identify the failing canonical identity whenever available.
|
|
|
|
## 16. Relationship to `stdlib`
|
|
|
|
`stdlib` selection is a compile-time concern that determines which SDK surfaces and host-backed declarations the compiler may use.
|
|
|
|
Rules:
|
|
|
|
- the loader does not resolve source imports such as `@sdk:gfx`,
|
|
- the loader only consumes canonical host-binding metadata emitted into the PBX,
|
|
- `stdlib` affects the loader indirectly through what the compiler emitted,
|
|
- the loader must not attempt to reconstruct SDK/module semantics from the PBX.
|
|
|
|
In other words:
|
|
|
|
- `stdlib` selects the compile-time SDK contract,
|
|
- `SYSC` carries the runtime-facing host-binding contract.
|
|
|
|
## 17. Relationship to Cartridge Manifest Capabilities
|
|
|
|
The cartridge manifest is the current declarative source for requested runtime capabilities.
|
|
|
|
Rules:
|
|
|
|
- cartridge capability declarations are external to `program.pbx`,
|
|
- capability declarations are serializable nominal identifiers such as `gfx` or `audio`,
|
|
- the loader converts that nominal declaration into the internal capability flag set used by the runtime,
|
|
- the loader MUST NOT infer granted authority solely from the presence of `SYSC` entries.
|
|
|
|
This keeps authority separate from code generation:
|
|
|
|
- PBX declares what the program needs,
|
|
- cartridge packaging declares what the cartridge requests,
|
|
- the loader/runtime environment decides what is granted.
|
|
|
|
## 18. Current Implementation Direction
|
|
|
|
This document aligns with the current Prometeu direction:
|
|
|
|
- SDK/toolchain-controlled `declare host` surfaces define host-backed APIs,
|
|
- the compiler emits canonical host-binding declarations into PBX `SYSC`,
|
|
- the compiler emits `HOSTCALL <sysc_index>` in pre-load code,
|
|
- the loader resolves those declarations at cartridge load,
|
|
- the loader patch-rewrites host calls to numeric syscalls,
|
|
- the VM continues to execute numeric syscalls only.
|
|
|
|
## 19. Open Items Deferred
|
|
|
|
The following remain deferred:
|
|
|
|
1. final PBX section numbering and chunk registry policy,
|
|
2. exact binary opcode allocation for `HOSTCALL`,
|
|
3. whether the loader patches bytecode in place or rebuilds a fresh executable image buffer,
|
|
4. final cartridge policy model when platform-level grants differ from manifest requests,
|
|
5. final chunk numbering and binary compatibility policy for PBX versions that predate mandatory `SYSC`,
|
|
6. final integration shape with `ProgramImage` or another loaded-program container.
|
|
|
|
## 20. Current Decision Summary
|
|
|
|
The current temporary contract is:
|
|
|
|
1. Host bindings are identified canonically by `(module, name, version)`.
|
|
2. `declare host` is reserved to SDK/toolchain modules and is compile-time surface only.
|
|
3. The compiler emits a unique, deduplicated `SYSC` table ordered by first occurrence.
|
|
4. Each `SYSC` entry carries `module`, `name`, `version`, `arg_slots`, and `ret_slots`.
|
|
5. The compiler emits host-backed call sites as `HOSTCALL <sysc_index>`.
|
|
6. The loader parses `SYSC` before execution.
|
|
7. The loader resolves canonical identities against the host registry during load.
|
|
8. The loader validates `arg_slots` and `ret_slots` against authoritative host metadata.
|
|
9. The loader enforces capability gating during load.
|
|
10. The loader rejects out-of-bounds and unused `SYSC` references.
|
|
11. The loader rewrites every `HOSTCALL <sysc_index>` into `SYSCALL <id>`.
|
|
12. The verifier runs after patching and validates the final numeric program.
|
|
13. The VM executes numeric syscall identifiers only.
|