# PBS Bytecode and PBX Mapping Specification
Status: Draft v1 (Backend Baseline)
Applies to: mapping from lowered/optimized backend programs into PBX sections, bytecode-facing artifacts, and source-to-artifact invariants required by loader/verifier/runtime
## 1. Purpose
This document defines the normative mapping between backend-lowered semantics and emitted PBX/bytecode-facing artifacts.
Its purpose is to keep artifact emission deterministic and compatible with runtime loader/verifier contracts.
## 2. Scope
This document defines:
- artifact-level obligations at the `IRVM -> BytecodeModule` emission boundary,
- mapping invariants for function layout, code layout, callsite forms, and host-binding declarations,
- minimum debug/source-attribution hooks for v1 backend/runtime diagnostics and conformance,
- and deterministic artifact rejection expectations for emitter-side failures.
This document does not define:
- full ISA semantics,
- runtime loader patching internals,
- or one mandatory emitter implementation architecture.
## 3. Authority and Precedence
Normative precedence:
1. Runtime authority (`docs/specs/hardware/topics/chapter-2.md`, `chapter-3.md`, `chapter-9.md`, `chapter-12.md`, `chapter-16.md`)
2. Bytecode authority (`docs/specs/bytecode/ISA_CORE.md`)
3. `docs/specs/compiler-languages/pbs/6.1. Intrinsics and Builtin Types Specification.md`
4. `docs/specs/compiler-languages/pbs/6.2. Host ABI Binding and Loader Resolution Specification.md`
5. `20. IRBackend to IRVM Lowering Specification.md`
6. `21. IRVM Optimization Pipeline Specification.md`
7. This document
If a rule here conflicts with higher-precedence authorities, it is invalid.
## 4. Normative Inputs
This document depends on, at minimum:
- `docs/specs/compiler-languages/pbs/6.1. Intrinsics and Builtin Types Specification.md`
- `docs/specs/compiler-languages/pbs/6.2. Host ABI Binding and Loader Resolution Specification.md`
- `20. IRBackend to IRVM Lowering Specification.md`
- `21. IRVM Optimization Pipeline Specification.md`
## 5. Already-Settled Inputs
The following are fixed and must not be contradicted:
- The compiler emits host-binding declarations in PBX `SYSC`.
- Host-backed callsites are emitted in pre-load form as `HOSTCALL <sysc_index>`.
- `SYSC` entries are deduplicated by canonical identity and ordered by first occurrence.
- The loader resolves host bindings and rewrites `HOSTCALL` to `SYSCALL` before execution.
- Raw `SYSCALL` in pre-load artifacts is rejected.
- VM-owned intrinsic artifacts are distinct from `SYSC`, `HOSTCALL`, and `SYSCALL`.
- The `SYSC` section is mandatory in valid PBX artifacts (an empty section is valid).
## 6. Artifact Mapping Contract (v1)
### 6.1 Required module surfaces
Emitter output must map to a `BytecodeModule` shape containing:
1. `const_pool`,
2. `functions`,
3. `code`,
4. `exports`,
5. `syscalls`,
6. optional `debug_info` that still satisfies v1 minimum debug obligations.
### 6.2 Function ordering and IDs
Function ordering must be deterministic:
1. the published wrapper function index is `0`,
2. function index `0` is owned by the compiler-selected physical wrapper rather than by manifest metadata or nominal export lookup,
3. remaining functions are ordered by the tuple `(module key, callable_name, source_start)`, where the module key is the `modulePool` canonical key that `moduleId` maps to,
4. an identical admitted input graph yields identical function ordering and function IDs.
For PBS executable publication:
- the userland callable marked with `[Frame]` is not itself the physical entrypoint unless it is wrapped by the published synthetic wrapper,
- final `FRAME_RET` belongs to the wrapper path.
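The ordering rule above can be sketched as a plain sort with the wrapper pinned to index `0`. This is a minimal illustration, not the emitter's actual implementation: the `FunctionDecl` record, its field names, and the `is_wrapper` flag are hypothetical, and the module key is assumed to be a comparable string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionDecl:
    # Hypothetical record; field names are illustrative only.
    module_key: str       # assumed: modulePool canonical key for the module
    callable_name: str
    source_start: int
    is_wrapper: bool = False


def order_functions(funcs):
    """Return functions in emission order: wrapper first, then sorted rest."""
    wrappers = [f for f in funcs if f.is_wrapper]
    if len(wrappers) != 1:
        raise ValueError("exactly one published wrapper is expected")
    rest = sorted(
        (f for f in funcs if not f.is_wrapper),
        key=lambda f: (f.module_key, f.callable_name, f.source_start),
    )
    return wrappers + rest
```

Because the sort key depends only on declared identity and source position, any permutation of the same input set produces the same function IDs.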
### 6.3 Function code layout
Emitter must satisfy:
1. `code_offset` values are unique and monotonically increasing in function order,
2. `code_len` exactly matches emitted bytes for each function body,
3. `code_offset + code_len` stays within `code.len`,
4. and final code concatenation is deterministic.
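The layout invariants above can be checked mechanically. The following sketch assumes function bodies are already encoded as byte strings and that the function table is a list of `(code_offset, code_len)` pairs; both shapes and both function names are illustrative, not the `BytecodeModule` layout itself.

```python
def layout_functions(bodies):
    """Concatenate bodies in function order; return (code, [(offset, len)])."""
    code = bytearray()
    table = []
    for body in bodies:
        table.append((len(code), len(body)))
        code += body
    return bytes(code), table


def check_layout(code_len, table):
    """Raise ValueError if offsets are not strictly increasing or exceed bounds."""
    previous_offset = -1
    for index, (offset, length) in enumerate(table):
        if offset <= previous_offset:
            raise ValueError(f"function {index}: code_offset not strictly increasing")
        if offset + length > code_len:
            raise ValueError(f"function {index}: body exceeds code length")
        previous_offset = offset
```

Note that under the uniqueness requirement, two functions cannot share a `code_offset`, so this check rejects a table containing an empty body followed by another function at the same offset.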
### 6.4 Instruction encoding
Emitter must satisfy:
1. little-endian encoding,
2. instruction layout `[opcode: u16][immediate]`,
3. jump immediates as `u32` offsets relative to function start,
4. immediate sizes match the Core ISA specification for the selected opcode.
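As an illustration of the encoding rules, the sketch below packs a `u16` opcode followed by its immediate in little-endian order. The opcode values used in the usage note are placeholders, not values from the Core ISA; real values and immediate widths come from `ISA_CORE.md`.

```python
import struct


def encode_instruction(opcode, immediate=b""):
    """Encode [opcode: u16][immediate], little-endian, immediate pre-encoded."""
    return struct.pack("<H", opcode) + immediate


def encode_jump(opcode, target_offset):
    """Encode a jump whose u32 immediate is relative to the function start."""
    return struct.pack("<HI", opcode, target_offset)
```

For example, with a placeholder opcode `0x0002`, `encode_jump(0x0002, 0x10)` yields the six bytes `02 00 10 00 00 00`.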
### 6.5 Host-backed mapping obligations
For host-backed operations:
1. emit canonical declarations in `SYSC` (`module`, `name`, `version`, `arg_slots`, `ret_slots`),
2. deduplicate by canonical identity,
3. order by first occurrence,
4. emit callsites as `HOSTCALL <sysc_index>` only,
5. do not emit raw `SYSCALL` in pre-load artifact form.
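A minimal sketch of the deduplicate-and-order-by-first-occurrence rule follows. It assumes the canonical identity is `(module, name, version)`; the authoritative definition lives in the Host ABI Binding specification, so treat that key choice, the `SyscTable` class, and the mismatch handling as assumptions.

```python
class SyscTable:
    """Interns host-binding declarations and hands out SYSC indices."""

    def __init__(self):
        self.entries = []   # declarations in first-occurrence order
        self._index = {}    # assumed canonical identity -> SYSC index

    def intern(self, module, name, version, arg_slots, ret_slots):
        key = (module, name, version)  # assumption: canonical identity
        entry = (module, name, version, arg_slots, ret_slots)
        if key in self._index:
            index = self._index[key]
            if self.entries[index] != entry:
                raise ValueError(f"ABI shape mismatch for {key}")
            return index
        self._index[key] = len(self.entries)
        self.entries.append(entry)
        return self._index[key]

    def hostcall(self, module, name, version, arg_slots, ret_slots):
        """Return the pre-load callsite form for a host-backed operation."""
        return ("HOSTCALL", self.intern(module, name, version, arg_slots, ret_slots))
```

Repeated use of the same binding returns the index assigned at its first occurrence, so the emitted `SYSC` order depends only on the order in which callsites are visited.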
### 6.6 VM-owned intrinsic mapping obligations
For VM-owned intrinsic operations:
1. emit VM-owned intrinsic call form (`INTRINSIC <id>`),
2. resolve `<id>` from the canonical ISA-scoped intrinsic registry artifact,
3. keep the intrinsic path distinct from host-binding metadata and host call opcodes,
4. and do not emit VM-owned builtin/intrinsic semantics through `SYSC`.
### 6.7 Internal symbolic-to-index mapping
Compilers may use internal symbolic references before final index materialization.
If used, symbolic references must be resolved deterministically to final numeric indices before serialization.
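One way to picture this resolution step is a single pass that replaces symbolic operands with final indices and fails if any symbol is unknown. The instruction shape `(mnemonic, operand)`, the `CALL` mnemonic, and the use of strings for symbolic operands are all assumptions made for this sketch.

```python
def resolve_symbols(instructions, function_ids):
    """Replace string operands with numeric function indices.

    instructions: list of (mnemonic, operand); a str operand is symbolic.
    function_ids: mapping from symbolic name to final function index.
    """
    resolved = []
    for mnemonic, operand in instructions:
        if isinstance(operand, str):
            if operand not in function_ids:
                raise KeyError(f"unresolved symbolic reference: {operand}")
            operand = function_ids[operand]
        resolved.append((mnemonic, operand))
    return resolved
```

Since the mapping is a pure lookup, the result is deterministic as long as `function_ids` was itself assigned deterministically.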
## 7. Minimum Debug Attribution Contract (v1)
For v1 backend/runtime diagnostics and conformance support, emitted artifacts must preserve at minimum:
1. `function_names` entries for all emitted function indices,
2. `pc_to_span` entries for each emitted instruction start PC.
This minimum does not require one universal source-map format.
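The two minimum obligations can be expressed as a coverage check. The sketch assumes `function_names` and `pc_to_span` are available as dictionaries keyed by function index and instruction start PC; the concrete `debug_info` layout is not defined here, so these shapes are illustrative.

```python
def missing_debug_entries(function_count, instruction_pcs, function_names, pc_to_span):
    """Return (function indices without names, instruction PCs without spans)."""
    missing_names = [i for i in range(function_count) if i not in function_names]
    missing_spans = [pc for pc in instruction_pcs if pc not in pc_to_span]
    return missing_names, missing_spans
```

An artifact meets the minimum when both returned lists are empty.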
## 8. Deterministic Emitter Rejection
Emitter-side rejection must be deterministic for malformed or inconsistent artifact candidates, including at minimum:
1. inconsistent function layout bounds,
2. unresolved symbolic references at serialization boundary,
3. illegal pre-load host call form (`SYSCALL` in pre-load image),
4. duplicate `SYSC` canonical identities,
5. and declared host ABI shape mismatch detectable at the compile-target metadata level.
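A sketch of how some of these checks might be run in a fixed order, so that the same malformed candidate always produces the same reasons in the same sequence. It covers only unresolved symbols, raw `SYSCALL`, and duplicate `SYSC` identities, and it works on a hypothetical decoded form `(mnemonic, operand)` rather than on encoded bytes.

```python
def rejection_reasons(instructions, sysc_identities):
    """Return rejection reasons in a fixed, deterministic check order."""
    reasons = []
    # A str operand is assumed to mark a symbol left unresolved.
    if any(isinstance(operand, str) for _, operand in instructions):
        reasons.append("unresolved symbolic reference")
    if any(mnemonic == "SYSCALL" for mnemonic, _ in instructions):
        reasons.append("raw SYSCALL in pre-load image")
    if len(set(sysc_identities)) != len(sysc_identities):
        reasons.append("duplicate SYSC canonical identity")
    return reasons
```

An empty list means none of the covered checks fired; it does not imply the candidate passes the remaining checks in the list above.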
## 9. Conformance-Facing Baseline
At minimum, artifact conformance checks should assert:
1. canonical `SYSC` declarations for admitted host-backed operations,
2. deterministic `SYSC` dedup/order,
3. pre-load `HOSTCALL` callsites for host-backed paths,
4. no host-binding leakage for VM-owned intrinsic/builtin operations,
5. minimum debug attribution hooks required by v1,
6. deterministic function ordering and code layout invariants.
## 10. Explicit Deferrals
The following remain deferred:
- richer optional debug/source-map formats,
- additional PBX section-level contracts beyond current baseline,
- and profile-specific binary compatibility policy details beyond current v1 baseline.
## 11. Non-Goals
- Repeating full ISA/runtime documentation.
- Mandating one byte-for-byte whole-image golden as sole conformance oracle.
- Defining loader patching internals already owned elsewhere.
## 12. Exit Criteria
This document is healthy when:
1. artifact mapping obligations are explicit and testable,
2. host-backed and VM-owned emission boundaries are explicit,
3. deterministic ordering/layout rules are explicit,
4. and v1 minimum debug/source attribution contract is explicit.