prometeu-studio/docs/specs/compiler/15. Bytecode and PBX Mapping Specification.md

PBS Bytecode and PBX Mapping Specification

Status: Draft v1 (Backend Baseline)
Applies to: mapping from lowered/optimized backend programs into PBX sections, bytecode-facing artifacts, and source-to-artifact invariants required by loader/verifier/runtime

1. Purpose

This document defines the normative mapping between backend-lowered semantics and emitted PBX/bytecode-facing artifacts.

Its purpose is to keep artifact emission deterministic and compatible with runtime loader/verifier contracts.

2. Scope

This document defines:

  • artifact-level obligations at the IRVM -> BytecodeModule emission boundary,
  • mapping invariants for function layout, code layout, callsite forms, and host-binding declarations,
  • minimum debug/source-attribution hooks for v1 backend/runtime diagnostics and conformance,
  • and deterministic artifact rejection expectations for emitter-side failures.

This document does not define:

  • full ISA semantics,
  • runtime loader patching internals,
  • or one mandatory emitter implementation architecture.

3. Authority and Precedence

Normative precedence:

  1. Runtime authority (docs/specs/hardware/topics/chapter-2.md, chapter-3.md, chapter-9.md, chapter-12.md, chapter-16.md)
  2. Bytecode authority (docs/specs/bytecode/ISA_CORE.md)
  3. docs/specs/compiler-languages/pbs/6.1. Intrinsics and Builtin Types Specification.md
  4. docs/specs/compiler-languages/pbs/6.2. Host ABI Binding and Loader Resolution Specification.md
  5. 20. IRBackend to IRVM Lowering Specification.md
  6. 21. IRVM Optimization Pipeline Specification.md
  7. This document

If a rule in this document conflicts with a higher-precedence authority, the rule in this document is invalid.

4. Normative Inputs

This document depends on, at minimum:

  • docs/specs/compiler-languages/pbs/6.1. Intrinsics and Builtin Types Specification.md
  • docs/specs/compiler-languages/pbs/6.2. Host ABI Binding and Loader Resolution Specification.md
  • 20. IRBackend to IRVM Lowering Specification.md
  • 21. IRVM Optimization Pipeline Specification.md

5. Already-Settled Inputs

The following are fixed and must not be contradicted:

  • The compiler emits host-binding declarations in PBX SYSC.
  • Host-backed callsites are emitted in pre-load form as HOSTCALL <sysc_index>.
  • SYSC entries are deduplicated by canonical identity and ordered by first occurrence.
  • The loader resolves host bindings and rewrites HOSTCALL to SYSCALL before execution.
  • Raw SYSCALL in pre-load artifacts is rejected.
  • VM-owned intrinsic artifacts are distinct from SYSC, HOSTCALL, and SYSCALL.
  • The SYSC section is mandatory in valid PBX artifacts (an empty section is valid).

6. Artifact Mapping Contract (v1)

6.1 Required module surfaces

Emitter output must map to a BytecodeModule shape containing:

  1. const_pool,
  2. functions,
  3. code,
  4. exports,
  5. syscalls,
  6. optional debug_info that still satisfies v1 minimum debug obligations.
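
As a rough illustration of the surfaces listed above, the module shape could be modeled as a plain record. This is a minimal sketch only: the field types, the FunctionMeta and DebugInfo helper records, and the tuple layout used for syscalls are assumptions made for illustration, not the actual BytecodeModule definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FunctionMeta:
    # Hypothetical per-function record; only the layout fields named in 6.3.
    code_offset: int
    code_len: int


@dataclass
class DebugInfo:
    # Minimum v1 attribution hooks (section 7); the value types are assumed.
    function_names: Dict[int, str] = field(default_factory=dict)
    pc_to_span: Dict[int, Tuple[int, int]] = field(default_factory=dict)


@dataclass
class BytecodeModule:
    const_pool: List[object] = field(default_factory=list)
    functions: List[FunctionMeta] = field(default_factory=list)
    code: bytes = b""
    exports: Dict[str, int] = field(default_factory=dict)
    # Assumed layout: (module, name, version, arg_slots, ret_slots).
    syscalls: List[Tuple[str, str, int, int, int]] = field(default_factory=list)
    debug_info: Optional[DebugInfo] = None


# A module with no host bindings still carries an (empty) syscalls surface.
empty = BytecodeModule()
```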

6.2 Function ordering and IDs

Function ordering must be deterministic:

  1. the published wrapper function index is 0,
  2. function index 0 is owned by the compiler-selected physical wrapper rather than by manifest metadata or nominal export lookup,
  3. remaining functions are ordered by (moduleId -> modulePool canonical key, callable_name, source_start),
  4. identical admitted input graph yields identical function ordering and function ids.
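
The ordering rules above can be sketched as a sort with the wrapper pinned to index 0. The CallableInfo record, its is_wrapper flag, and the use of a string as the canonical modulePool key are hypothetical and chosen only to make the example self-contained.

```python
from typing import NamedTuple


class CallableInfo(NamedTuple):
    module_key: str        # canonical modulePool key (assumed to be a string)
    callable_name: str
    source_start: int
    is_wrapper: bool = False


def order_functions(callables):
    """Return callables in emission order; the list position is the function id."""
    wrappers = [c for c in callables if c.is_wrapper]
    if len(wrappers) != 1:
        raise ValueError("expected exactly one published wrapper")
    rest = sorted(
        (c for c in callables if not c.is_wrapper),
        key=lambda c: (c.module_key, c.callable_name, c.source_start),
    )
    return wrappers + rest
```

Because the sort key depends only on the callables themselves, the result does not depend on the order in which the emitter happened to discover them.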

For PBS executable publication:

  • the userland callable marked with [Frame] is not itself the physical entrypoint; it is reached only through the published synthetic wrapper,
  • final FRAME_RET belongs to the wrapper path.

6.3 Function code layout

The emitter must satisfy:

  1. code_offset values are unique and strictly increasing in function order,
  2. code_len exactly matches emitted bytes for each function body,
  3. code_offset + code_len stays within code.len,
  4. and final code concatenation is deterministic.
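
One way to satisfy these invariants is to derive the layout by concatenating function bodies in function order, then validate the result. The sketch below assumes non-empty function bodies and represents each function as a hypothetical (code_offset, code_len) pair.

```python
def lay_out(bodies):
    """Concatenate bodies in function order and record (code_offset, code_len)."""
    functions, code = [], bytearray()
    for body in bodies:
        functions.append((len(code), len(body)))
        code += body
    return functions, bytes(code)


def check_layout(functions, total_code_len):
    """Raise ValueError if offsets are not strictly increasing or a body overruns."""
    previous_offset = -1
    for index, (offset, length) in enumerate(functions):
        if offset <= previous_offset:
            raise ValueError(f"function {index}: code_offset is not strictly increasing")
        if offset + length > total_code_len:
            raise ValueError(f"function {index}: body exceeds code.len")
        previous_offset = offset
```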

6.4 Instruction encoding

The emitter must satisfy:

  1. little-endian encoding,
  2. instruction layout [opcode: u16][immediate],
  3. jump immediates as u32 offsets relative to function start,
  4. immediate sizes matching selected Core ISA opcode spec.
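
The encoding rules above can be sketched with Python's struct module. The opcode number used in the usage lines is a placeholder, not a real Core ISA value, and the helper names are hypothetical.

```python
import struct


def encode(opcode, immediate=b""):
    """Encode one instruction as [opcode: u16 little-endian][immediate bytes]."""
    return struct.pack("<H", opcode) + immediate


def encode_jump(opcode, target_pc, function_start_pc):
    """Encode a jump whose immediate is a u32 offset relative to function start."""
    relative = target_pc - function_start_pc
    # struct.pack raises struct.error if the offset does not fit in a u32.
    return encode(opcode, struct.pack("<I", relative))


# Placeholder opcode 0x0002; a target 0x10 bytes past the function start.
jump_bytes = encode_jump(0x0002, target_pc=0x110, function_start_pc=0x100)
```

Here jump_bytes is six bytes long: two for the opcode and four for the function-relative offset.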

6.5 Host-backed mapping obligations

For host-backed operations:

  1. emit canonical declarations in SYSC (module, name, version, arg_slots, ret_slots),
  2. deduplicate by canonical identity,
  3. order by first occurrence,
  4. emit callsites as HOSTCALL <sysc_index> only,
  5. do not emit raw SYSCALL in pre-load artifact form.
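
A minimal sketch of these obligations is an interning table that assigns indices on first occurrence. It assumes the canonical identity is the (module, name, version) triple; this document does not define the identity, so treat that choice, along with the class and function names, as hypothetical.

```python
class SyscTable:
    def __init__(self):
        self.entries = []   # SYSC declarations in first-occurrence order
        self._index = {}    # assumed canonical identity -> sysc_index

    def intern(self, module, name, version, arg_slots, ret_slots):
        identity = (module, name, version)  # assumption: identity excludes slot counts
        declaration = (module, name, version, arg_slots, ret_slots)
        if identity in self._index:
            index = self._index[identity]
            if self.entries[index] != declaration:
                raise ValueError(f"conflicting ABI shape for {identity}")
            return index
        index = len(self.entries)
        self._index[identity] = index
        self.entries.append(declaration)
        return index


def emit_host_call(table, module, name, version, arg_slots, ret_slots):
    """Emit the pre-load callsite form; raw SYSCALL is never produced here."""
    return ("HOSTCALL", table.intern(module, name, version, arg_slots, ret_slots))
```

Repeated calls to the same host binding reuse the index assigned at its first occurrence, so the SYSC order depends only on the order of callsites.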

6.6 VM-owned intrinsic mapping obligations

For VM-owned intrinsic operations:

  1. emit VM-owned intrinsic call form (INTRINSIC <id>),
  2. resolve <id> from the canonical ISA-scoped intrinsic registry artifact,
  3. keep the intrinsic path distinct from host-binding metadata and host call opcodes,
  4. and do not emit VM-owned builtin/intrinsic semantics through SYSC.
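
For contrast with the host-backed path, an intrinsic callsite could be emitted by a lookup that never touches SYSC. The registry contents and intrinsic names below are invented placeholders standing in for the canonical ISA-scoped registry artifact.

```python
# Hypothetical stand-in for the canonical ISA-scoped intrinsic registry artifact.
INTRINSIC_REGISTRY = {"vec2.dot": 0x0001, "vec2.length": 0x0002}


def emit_intrinsic_call(name, registry=INTRINSIC_REGISTRY):
    """Emit the VM-owned call form; no SYSC declaration is created on this path."""
    if name not in registry:
        raise ValueError(f"unknown VM-owned intrinsic: {name}")
    return ("INTRINSIC", registry[name])
```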

6.7 Internal symbolic-to-index mapping

Compilers may use internal symbolic references before final index materialization.

If used, symbolic references must be resolved deterministically to final numeric indices before serialization.
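
A sketch of such a resolution pass follows, assuming instructions are (opcode, operand) pairs in which a string operand stands for a symbolic function reference. Both the representation and the names are hypothetical.

```python
def resolve_symbols(instructions, function_ids):
    """Replace symbolic operands with final numeric indices before serialization."""
    resolved = []
    for opcode, operand in instructions:
        if isinstance(operand, str):
            if operand not in function_ids:
                raise ValueError(f"unresolved symbolic reference: {operand}")
            operand = function_ids[operand]
        resolved.append((opcode, operand))
    return resolved
```

An unresolved name raises instead of being serialized, in line with the rejection expectations in section 8.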

7. Minimum Debug Attribution Contract (v1)

For v1 backend/runtime diagnostics and conformance support, emitted artifacts must preserve at minimum:

  1. function_names entries for all emitted function indices,
  2. pc_to_span entries for each emitted instruction start PC.

This minimum does not require one universal source-map format.
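
A conformance-style check for this minimum might look like the sketch below. It assumes function_names and pc_to_span are exposed as mappings keyed by function index and instruction start PC respectively; the function name and return shape are hypothetical.

```python
def missing_debug_attribution(function_count, instruction_pcs, function_names, pc_to_span):
    """Return (function indices without a name, instruction PCs without a span)."""
    missing_names = [i for i in range(function_count) if i not in function_names]
    missing_spans = [pc for pc in instruction_pcs if pc not in pc_to_span]
    return missing_names, missing_spans
```

An artifact meets the v1 minimum when both returned lists are empty.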

8. Deterministic Emitter Rejection

Emitter-side rejection must be deterministic for malformed or inconsistent artifact candidates, including at minimum:

  1. inconsistent function layout bounds,
  2. unresolved symbolic references at serialization boundary,
  3. illegal pre-load host call form (SYSCALL in pre-load image),
  4. duplicate SYSC canonical identities,
  5. and declared host ABI shape mismatch detectable at compile target metadata line.
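
Two of the rejection cases above can be sketched as a pure function that reports reasons in a fixed order, so the same candidate always yields the same diagnostics. The instruction and SYSC entry representations, and the use of (module, name, version) as canonical identity, are assumptions made for illustration.

```python
def rejection_reasons(instructions, sysc_entries):
    """Collect rejection reasons in a stable order: callsites first, then SYSC."""
    reasons = []
    for position, (opcode, _operand) in enumerate(instructions):
        if opcode == "SYSCALL":
            reasons.append(f"raw SYSCALL in pre-load image at instruction {position}")
    seen = set()
    for index, (module, name, version, _args, _rets) in enumerate(sysc_entries):
        identity = (module, name, version)  # assumed canonical identity
        if identity in seen:
            reasons.append(f"duplicate SYSC identity at entry {index}")
        seen.add(identity)
    return reasons
```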

9. Conformance-Facing Baseline

At minimum, artifact conformance checks should assert:

  1. canonical SYSC declarations for admitted host-backed operations,
  2. deterministic SYSC dedup/order,
  3. pre-load HOSTCALL callsites for host-backed paths,
  4. no host-binding leakage for VM-owned intrinsic/builtin operations,
  5. minimum debug attribution hooks required by v1,
  6. deterministic function ordering and code layout invariants.

10. Explicit Deferrals

The following remain deferred:

  • richer optional debug/source-map formats,
  • additional PBX section-level contracts beyond current baseline,
  • and profile-specific binary compatibility policy details beyond current v1 baseline.

11. Non-Goals

  • Repeating full ISA/runtime documentation.
  • Mandating one byte-for-byte whole-image golden as sole conformance oracle.
  • Defining loader patching internals already owned elsewhere.

12. Exit Criteria

This document is healthy when:

  1. artifact mapping obligations are explicit and testable,
  2. host-backed and VM-owned emission boundaries are explicit,
  3. deterministic ordering/layout rules are explicit,
  4. and v1 minimum debug/source attribution contract is explicit.