2026-04-17 17:49:18 +01:00

### Prometeu Bytecode — Core ISA
Status: bytecode-level normative
This document defines the stable Core ISA surface for the Prometeu Virtual Machine at the bytecode level. It specifies instruction encoding, the stack evaluation model, and the instruction set currently available in the canonical opcode surface used by encoder, decoder, disassembler, assembler, verifier, and VM execution.
Machine boundary:
- PROMETEU is not "just the VM". It is the broader fantasy console/handheld machine.
- This document covers only the bytecode ISA of the VM subsystem embedded in that machine.
Authority rule:
- This document is normative for bytecode-level encoding and opcode surface.
- Runtime-wide invariants still live in [`../../ARCHITECTURE.md`](../../ARCHITECTURE.md).
- If a bytecode-level rule here conflicts with runtime architecture, the conflict must be resolved explicitly in both documents; neither should drift silently.
#### Encoding Rules
- Endianness: Little-endian.
- Instruction layout: `[opcode: u16][immediate: spec.imm_bytes]`.
- Opcodes are defined in `prometeu_bytecode::isa::core::CoreOpCode`.
- Immediate sizes and stack effects are defined by `CoreOpCode::spec()` returning `CoreOpcodeSpec`.
- All jump immediates are absolute u32 byte offsets from the start of the current function.
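
The layout above can be sketched as follows. This is a minimal illustration, not the crate's encoder: `encode`, `encode_jmp`, and the opcode number `HYPOTHETICAL_JMP` are assumptions made for the example, and real opcode values come from `CoreOpCode`.

```rust
/// Encode one instruction as `[opcode: u16][immediate bytes]`, little-endian.
/// The caller supplies an immediate of the size the opcode's spec requires.
fn encode(opcode: u16, immediate: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + immediate.len());
    out.extend_from_slice(&opcode.to_le_bytes());
    out.extend_from_slice(immediate);
    out
}

/// Hypothetical opcode number, for illustration only.
const HYPOTHETICAL_JMP: u16 = 0x0002;

/// Encode a jump. The immediate is a u32 byte offset measured from the
/// start of the current function, stored little-endian.
fn encode_jmp(target_offset: u32) -> Vec<u8> {
    encode(HYPOTHETICAL_JMP, &target_offset.to_le_bytes())
}
```

With these placeholder values, `encode_jmp(0x110)` yields the six bytes `02 00 10 01 00 00`: two opcode bytes followed by four immediate bytes.
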
#### Stack Machine Model
- The VM is stack-based. Unless noted, operands are taken from the top of the operand stack and results are pushed back.
- Types at the bytecode level are represented by the `Value` enum; the VM may perform numeric promotion where appropriate (e.g., `Int32 + Float -> Float`).
- Stack underflow is a trap (`TRAP_STACK_UNDERFLOW`).
- Some operations may trap for other reasons (e.g., division by zero, invalid indices, type mismatches).
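
A minimal sketch of these rules for a single `ADD`, under stated assumptions: the `Value` variants shown, the `Trap` enum, the wrapping integer addition, and the choice to leave the stack untouched when a trap occurs are all illustrative and are not taken from the VM source.

```rust
/// Illustrative subset of a bytecode-level value type (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int32(i32),
    Float(f64),
    Bool(bool),
}

/// Illustrative trap kinds (names assumed).
#[derive(Debug, PartialEq)]
enum Trap {
    StackUnderflow,
    TypeMismatch,
}

/// Execute `ADD`: read two operands from the top of the stack and push one result.
/// Operands are inspected before being removed, so a trap leaves the stack intact.
fn exec_add(stack: &mut Vec<Value>) -> Result<(), Trap> {
    let len = stack.len();
    if len < 2 {
        return Err(Trap::StackUnderflow);
    }
    let (lhs, rhs) = (stack[len - 2], stack[len - 1]);
    let result = match (lhs, rhs) {
        // Overflow behavior is an assumption; wrapping is used here.
        (Value::Int32(a), Value::Int32(b)) => Value::Int32(a.wrapping_add(b)),
        // Mixed operands are promoted to Float.
        (Value::Int32(a), Value::Float(b)) => Value::Float(f64::from(a) + b),
        (Value::Float(a), Value::Int32(b)) => Value::Float(a + f64::from(b)),
        (Value::Float(a), Value::Float(b)) => Value::Float(a + b),
        _ => return Err(Trap::TypeMismatch),
    };
    stack.truncate(len - 2);
    stack.push(result);
    Ok(())
}
```
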
#### Instruction Set (Core)
- Execution control:
  - `NOP` — no effect.
  - `HALT` — terminates execution (block terminator).
  - `JMP u32` — unconditional absolute jump (block terminator).
  - `JMP_IF_FALSE u32` — pops `[bool]`, jumps if false.
  - `JMP_IF_TRUE u32` — pops `[bool]`, jumps if true.
  - `TRAP` — software trap/breakpoint (block terminator).
- Stack manipulation:
  - `PUSH_CONST u32` — load constant by index → pushes `[value]`.
  - `PUSH_I64 i64`, `PUSH_F64 f64`, `PUSH_BOOL u8`, `PUSH_I32 i32` — push literals.
  - `POP` — pops 1.
  - `POP_N u32` — pops N.
  - `DUP` — `[x] -> [x, x]`.
  - `SWAP` — `[a, b] -> [b, a]`.
- Arithmetic:
  - `ADD`, `SUB`, `MUL`, `DIV`, `MOD` — binary numeric ops.
  - `NEG` — unary numeric negation.
- Comparison and logic:
  - `EQ`, `NEQ`, `LT`, `LTE`, `GT`, `GTE` — comparisons → `[bool]`.
  - `AND`, `OR`, `NOT` — boolean logic.
  - `BIT_AND`, `BIT_OR`, `BIT_XOR`, `SHL`, `SHR` — integer bit operations.
- Variables:
  - `GET_GLOBAL u32`, `SET_GLOBAL u32` — access global slots.
  - `GET_LOCAL u32`, `SET_LOCAL u32` — access local slots (current frame).
- Functions and scopes:
  - `CALL u32` — call by function index; argument/result arity per function metadata.
  - `RET` — return from current function (block terminator).
  - `MAKE_CLOSURE u32,u32` — create closure from `(fn_id, capture_count)`.
  - `CALL_CLOSURE u32` — invoke closure with `arg_count` user arguments.
- Concurrency:
  - `SPAWN u32,u32` — create coroutine for `(fn_id, arg_count)`.
  - `YIELD` — request cooperative yield at the next safepoint.
  - `SLEEP u32` — request suspension for a logical tick duration.
- System/Timing:
  - `HOSTCALL u32` — PBX pre-load host binding call by `SYSC` table index; the loader must resolve and rewrite it before verification or execution.
  - `SYSCALL u32` — final numeric platform call in the executable image; raw `SYSCALL` in PBX pre-load artifacts is rejected by the loader.
  - `INTRINSIC u32` — final numeric VM-owned intrinsic call.
  - `FRAME_SYNC` — yield until the next frame boundary (e.g., vblank); explicit safepoint.
Host service arity is not encoded in the opcode itself. It is defined by resolved syscall metadata.
Example:
- `asset.load` currently resolves with `arg_slots = 2` and `ret_slots = 2`.
  - The canonical stack contract is `asset_id, slot -> status, handle`.
  - Callers do not provide an explicit asset kind; the runtime derives it from `asset_table`.
- `composer.bind_scene` resolves with `arg_slots = 1` and `ret_slots = 1`.
  - The canonical stack contract is `bank_id -> status`.
- `composer.emit_sprite` resolves with `arg_slots = 9` and `ret_slots = 1`.
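
Because arity lives in metadata rather than in the opcode, a verifier-style consumer has to apply `arg_slots` and `ret_slots` to its tracked stack depth. The sketch below shows one way to do that; `SyscallMeta` and `depth_after_call` are hypothetical names, and only the two field names come from the text above.

```rust
/// Stack contract of a resolved host service.
struct SyscallMeta {
    arg_slots: usize,
    ret_slots: usize,
}

/// Return the operand-stack depth after the call,
/// or `None` if the call would pop more slots than are present.
fn depth_after_call(depth: usize, meta: &SyscallMeta) -> Option<usize> {
    depth
        .checked_sub(meta.arg_slots)
        .map(|remaining| remaining + meta.ret_slots)
}
```

For example, with the `composer.emit_sprite` contract (9 in, 1 out), a depth of 9 becomes 1, while a depth of 8 is reported as an underflow.
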
#### Canonical Intrinsic Registry Artifact
- Final intrinsic IDs and intrinsic stack metadata are published in [`INTRINSICS.csv`](INTRINSICS.csv).
- This CSV is the ISA-scoped artifact intended to be consumed by compiler/tooling consumers such as `../studio`.
- Each row defines one canonical intrinsic identity and its final numeric ID.
- `canonical_name` is the fully qualified intrinsic identity seen by compiler-side intrinsic pools.
- `arg_slots` and `ret_slots` are the real stack effect contract for verifier/lowering consumers.
- `arg_layout` and `ret_layout` use `|`-separated ABI atoms:
  - `int`
  - `float`
  - `bool`
  - `builtin:<builtin_name>`
- Rows must remain unique by both `(canonical_name, canonical_version)` and `final_id_hex` / `final_id_dec`.
- Rows must remain deterministically ordered by final ID.
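
A consumer of the registry might parse the layout columns and check row ordering roughly as follows. This is a sketch under assumptions: `AbiAtom`, `parse_layout`, and `ids_strictly_increasing` are hypothetical names, an empty layout is treated as zero atoms, and "ordered by final ID" is read as ascending order.

```rust
/// One ABI atom from an `arg_layout` or `ret_layout` column.
#[derive(Debug, PartialEq)]
enum AbiAtom {
    Int,
    Float,
    Bool,
    Builtin(String),
}

/// Parse a `|`-separated layout string into ABI atoms.
/// Assumption: an empty string means "no slots".
fn parse_layout(layout: &str) -> Result<Vec<AbiAtom>, String> {
    if layout.is_empty() {
        return Ok(Vec::new());
    }
    layout
        .split('|')
        .map(|atom| match atom {
            "int" => Ok(AbiAtom::Int),
            "float" => Ok(AbiAtom::Float),
            "bool" => Ok(AbiAtom::Bool),
            other => match other.strip_prefix("builtin:") {
                Some(name) if !name.is_empty() => Ok(AbiAtom::Builtin(name.to_string())),
                _ => Err(format!("unknown ABI atom: {other}")),
            },
        })
        .collect()
}

/// Check that final IDs are unique and sorted (assumed: ascending).
fn ids_strictly_increasing(final_ids: &[u32]) -> bool {
    final_ids.windows(2).all(|pair| pair[0] < pair[1])
}
```

A strictly increasing check covers both row rules at once for the final ID column; uniqueness of `(canonical_name, canonical_version)` would need a separate set-based check.
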
For exact immediates and stack effects, see `CoreOpCode::spec()`, which is the single source of truth used by the decoder, disassembler, and verifier.
#### Canonical Decoder Contract
- The canonical decoder is `prometeu_bytecode::decode_next(pc, bytes)`.
- It uses the Core ISA spec to determine immediate size and the canonical `next_pc`.
- Unknown or legacy opcodes must produce a deterministic `UnknownOpcode` error.
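
The contract can be illustrated with a small stand-alone decoder. It is not the crate's `decode_next`: the function `decode_next_sketch`, the `Decoded` and `DecodeError` types, the `Truncated` error, and the two placeholder opcodes in `imm_bytes` are all assumptions made for this example.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    /// The opcode has no entry in the spec table.
    UnknownOpcode { pc: usize, opcode: u16 },
    /// The byte slice ends before the instruction does (assumed error kind).
    Truncated { pc: usize },
}

#[derive(Debug, PartialEq)]
struct Decoded<'a> {
    opcode: u16,
    immediate: &'a [u8],
    next_pc: usize,
}

/// Hypothetical stand-in for the spec lookup: immediate size in bytes.
fn imm_bytes(opcode: u16) -> Option<usize> {
    match opcode {
        0x0000 => Some(0), // placeholder: instruction without an immediate
        0x0002 => Some(4), // placeholder: instruction with a u32 immediate
        _ => None,
    }
}

/// Decode the instruction at `pc` and report where the next one starts.
fn decode_next_sketch(pc: usize, bytes: &[u8]) -> Result<Decoded<'_>, DecodeError> {
    let imm_start = pc.checked_add(2).ok_or(DecodeError::Truncated { pc })?;
    let head = bytes.get(pc..imm_start).ok_or(DecodeError::Truncated { pc })?;
    let opcode = u16::from_le_bytes([head[0], head[1]]);
    let size = imm_bytes(opcode).ok_or(DecodeError::UnknownOpcode { pc, opcode })?;
    let next_pc = imm_start + size;
    let immediate = bytes
        .get(imm_start..next_pc)
        .ok_or(DecodeError::Truncated { pc })?;
    Ok(Decoded { opcode, immediate, next_pc })
}
```

Walking a function body is then a loop that feeds each `next_pc` back in as the next `pc`; the set of visited `pc` values is the set of valid instruction boundaries.
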
#### Module Boundary
- Core ISA lives under `prometeu_bytecode::isa::core` and re-exports:
  - `CoreOpCode` — the opcode enum of the core profile.
  - `CoreOpcodeSpec` and `CoreOpCodeSpecExt` — spec with `imm_bytes`, stack effects, and flags.
- Consumers (encoder/decoder/disasm/verifier) should import from this module to avoid depending on internal layout.
#### Scope Notes
- "Core ISA" in the current repository means the canonical opcode surface implemented by the runtime today.
- It includes closures, coroutines, `HOSTCALL` patching semantics, `INTRINSIC`, and `FRAME_SYNC`.
- It does not, by itself, define higher-level runtime policy such as crash taxonomy, firmware behavior, cartridge lifecycle, or host service organization. Those belong to the canonical runtime architecture and related specs.
#### FRAME_SYNC — Semantics and Placement (Bytecode Level)
- Semantics:
  - `FRAME_SYNC` is a zero-operand instruction and does not modify the operand stack.
  - It marks a VM safepoint for GC and the cooperative scheduler. In `CoreOpcodeSpec` this is exposed as `spec.is_safepoint == true`.
  - On execution, the VM may suspend the current coroutine until the next frame boundary and/or perform GC. After resuming, execution continues at the next instruction.
- Placement rules (representable and checkable):
  - `FRAME_SYNC` may appear anywhere inside a function body where normal instructions can appear. It is NOT a block terminator (`spec.is_terminator == false`).
  - Instruction boundaries are canonical: encoders/emitters must only place `FRAME_SYNC` at valid instruction PCs. The verifier already enforces “jump-to-boundary” and end-exclusive `[start, end)` function ranges using the canonical layout routine.
  - Entrypoints that represent a render/update loop SHOULD ensure at least one reachable `FRAME_SYNC` along every long-running path to provide deterministic safepoints for GC/scheduling. This policy is semantic and may be enforced by higher-level tooling; at the bytecode level it is representable via `spec.is_safepoint` and can be counted by static analyzers.
- Disassembly:
  - Disassemblers must print the mnemonic `FRAME_SYNC` verbatim for this opcode.
  - Tools MAY optionally annotate it as a safepoint in comments, e.g., `FRAME_SYNC ; safepoint`.
- Verification notes:
  - The bytecode verifier treats `FRAME_SYNC` as a normal instruction with no stack effect and no control-flow targets. It is permitted before `RET`, between basic blocks, and as the last instruction of a function. Jumps targeting the function end (`pc == end`) remain valid under the end-exclusive rule.