prometeu-runtime/files/Hard Reset.md
2026-03-24 13:40:30 +00:00


# Prometeu Industrial-Grade Refactor Plan (JVM-like)
**Language policy:** All implementation notes, code comments, commit messages, PR descriptions, and review discussion **must be in English**.
**Reset policy:** This is a **hard reset**. We do **not** keep compatibility with the legacy bytecode/linker/verifier behaviors. No heuristics, no “temporary support”, no string hacks.
**North Star:** A JVM-like philosophy:
* Control-flow is **method-local** and **canonical**.
* The linker resolves **symbols** and **tables**, not intra-function branches.
* A **single canonical layout/decoder/spec** is used across compiler/linker/verifier/VM.
* Any invalid program fails with clear diagnostics, not panics.
**Estimation scale:**
* **1 point** = small / straightforward
* **3 points** = medium
* **5 points** = large
* Any work that feels bigger must be broken down into multiple PRs (≤ 5 points each).
---
## Phase 1 — Single Source of Truth: Bytecode Spec + Decoder (Highest ROI)
### PR-01 (3 pts) — Move OpcodeSpec to `prometeu-bytecode` and make it authoritative
**Briefing**
Today opcode metadata (imm sizes, stack effects, branch-ness, terminators) is duplicated and/or inconsistent across crates. This creates a perpetual maintenance nightmare.
**Target**
Create one authoritative opcode spec in **`prometeu-bytecode`** and delete/replace all “local” opcode knowledge.
**Scope**
* Create `prometeu-bytecode::opcode_spec` containing:
* `imm_bytes`
* `pops`, `pushes` (stack effect)
* `is_branch`, `is_terminator`
* optional: `name`, `category`
* Update callers to import from `prometeu-bytecode`.
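The spec in Scope can be sketched as a single table keyed by opcode. This is a minimal sketch only: the opcode values, names, and the `spec()` lookup shape below are illustrative assumptions, not the real Prometeu opcode set.

```rust
// Hypothetical sketch of the canonical spec; real opcode names and values
// come from the existing `prometeu-bytecode` opcode enum.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct OpcodeSpec {
    pub imm_bytes: u8,      // size of the immediate operand, 0 if none
    pub pops: u8,           // values popped from the operand stack
    pub pushes: u8,         // values pushed onto the operand stack
    pub is_branch: bool,
    pub is_terminator: bool,
    pub name: &'static str,
}

// Illustrative opcodes only, not the real Prometeu instruction set.
pub fn spec(opcode: u8) -> Option<OpcodeSpec> {
    Some(match opcode {
        0x00 => OpcodeSpec { imm_bytes: 0, pops: 0, pushes: 0, is_branch: false, is_terminator: false, name: "Nop" },
        0x01 => OpcodeSpec { imm_bytes: 8, pops: 0, pushes: 1, is_branch: false, is_terminator: false, name: "PushI64" },
        0x02 => OpcodeSpec { imm_bytes: 4, pops: 1, pushes: 0, is_branch: true,  is_terminator: false, name: "JmpIf" },
        0x03 => OpcodeSpec { imm_bytes: 0, pops: 0, pushes: 0, is_branch: false, is_terminator: true,  name: "Ret" },
        _ => return None,
    })
}

fn main() {
    // Mirrors the completion test: every opcode has a spec with a defined imm size.
    for op in [0x00u8, 0x01, 0x02, 0x03] {
        let s = spec(op).expect("opcode must have a spec");
        println!("{}: imm_bytes={}", s.name, s.imm_bytes);
    }
}
```

Returning `Option` (rather than panicking) keeps unknown opcodes a recoverable, testable condition, which the decoder in PR-02 can turn into a typed error.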
**Requirements Checklist**
* [ ] There is exactly one canonical `OpcodeSpec` source.
* [ ] All crates compile against that source.
* [ ] No hardcoded operand sizes remain outside the spec.
**Completion Tests**
* [ ] Unit test enumerating all opcodes validates:
* every opcode has a spec
* `imm_bytes` is defined
---
### PR-02 (5 pts) — Introduce canonical decoder in `prometeu-bytecode` and migrate VM to it
**Briefing**
The VM currently has its own decoder. The linker and other tools decode manually. This must be centralized.
**Target**
Add a single canonical decoder in `prometeu-bytecode` that produces typed decoded instructions.
**Scope**
* Add `prometeu-bytecode::decoder`:
* `decode_next(pc, bytes) -> DecodedInstr`
* includes: opcode, pc, next_pc, raw immediate bytes slice
* helpers: `imm_u8/u16/u32/i32/i64/f64` with size validation
* Migrate VM to use `prometeu-bytecode::decoder`.
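The decoder API above can be sketched as follows. The error variants, the inline immediate-size table, and the exact `DecodedInstr` fields are assumptions for illustration; in the real crate the size lookup would be `OpcodeSpec::imm_bytes`.

```rust
// Hedged sketch of the canonical decoder from this PR's Scope.
#[derive(Debug, PartialEq)]
pub struct DecodedInstr<'a> {
    pub opcode: u8,
    pub pc: usize,
    pub next_pc: usize,
    pub imm: &'a [u8], // raw immediate bytes; length equals the spec's imm_bytes
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    UnknownOpcode { pc: usize, opcode: u8 },
    TruncatedImmediate { pc: usize, needed: usize, available: usize },
    PcOutOfRange { pc: usize },
}

// Stand-in for OpcodeSpec::imm_bytes (illustrative opcodes only).
fn imm_bytes(opcode: u8) -> Option<usize> {
    match opcode {
        0x00 | 0x03 => Some(0), // Nop, Ret
        0x02 => Some(4),        // JmpIf
        0x01 => Some(8),        // PushI64
        _ => None,
    }
}

pub fn decode_next(pc: usize, bytes: &[u8]) -> Result<DecodedInstr<'_>, DecodeError> {
    let &opcode = bytes.get(pc).ok_or(DecodeError::PcOutOfRange { pc })?;
    let n = imm_bytes(opcode).ok_or(DecodeError::UnknownOpcode { pc, opcode })?;
    let imm_start = pc + 1;
    let next_pc = imm_start + n;
    if next_pc > bytes.len() {
        return Err(DecodeError::TruncatedImmediate { pc, needed: n, available: bytes.len() - imm_start });
    }
    Ok(DecodedInstr { opcode, pc, next_pc, imm: &bytes[imm_start..next_pc] })
}

impl DecodedInstr<'_> {
    // Size-validated accessor in the spirit of the plan's `imm_u32` helper:
    // returns None instead of slicing blindly when the size is wrong.
    pub fn imm_u32(&self) -> Option<u32> {
        self.imm.try_into().ok().map(u32::from_le_bytes)
    }
}

fn main() {
    let code = [0x02u8, 0x0A, 0x00, 0x00, 0x00, 0x03]; // JmpIf 10; Ret
    let i = decode_next(0, &code).unwrap();
    println!("opcode={:#04x} next_pc={} imm={:?}", i.opcode, i.next_pc, i.imm_u32());
}
```

Every failure mode is a deterministic `DecodeError`, which is what "fails deterministically" in the checklist requires; the VM, linker, and verifier all consume the same function.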
**Requirements Checklist**
* [ ] VM no longer has a bespoke decoder.
* [ ] No slicing-based immediate parsing in VM core paths.
* [ ] Decoder validates immediate sizes and fails deterministically.
**Completion Tests**
* [ ] Decoder unit tests for representative opcodes with each immediate size.
* [ ] Roundtrip test: encode→decode (table-driven; property test optional).
---
### PR-03 (3 pts) — Delete/neutralize `abi::operand_size` duplication
**Briefing**
`prometeu-bytecode/src/abi.rs` provides partial operand sizing that can drift from the canonical spec.
**Target**
Make all operand sizing derived from the opcode spec.
**Scope**
* Replace `operand_size()` with `OpcodeSpec::imm_bytes`.
* Remove or restrict legacy APIs that leak duplication.
**Requirements Checklist**
* [ ] There is no second operand-size table.
**Completion Tests**
* [ ] Test ensuring `operand_size()` (if retained) matches spec for all opcodes.
---
## Phase 2 — Canonical Layout + Verifier Contract (JVM-like Control Flow)
### PR-04 (5 pts) — Rewrite layout to compute instruction boundaries via decoder (no heuristics)
**Briefing**
Layout must be computed canonically using the decoder, not guessed via ad-hoc stepping.
**Target**
`prometeu_bytecode::layout` becomes the only authority for:
* function ranges `[start, end)`
* function length
* valid instruction boundaries
* pc→function lookup
**Scope**
* Implement layout computation by scanning bytes with the canonical decoder.
* Provide APIs:
* `function_range(func_idx) -> (start, end)`
* `function_len(func_idx)`
* `is_boundary(func_idx, rel_pc)` or `is_boundary_abs(abs_pc)`
* `lookup_function_by_pc(abs_pc)`
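The scan in Scope can be sketched as below. The instruction-length function stands in for the canonical decoder, and the function ranges are taken as illustrative inputs; field names beyond the APIs listed above are assumptions.

```rust
// Sketch of decoder-driven layout computation (no ad-hoc stepping).
pub struct Layout {
    ranges: Vec<(usize, usize)>, // per-function [start, end)
    boundaries: Vec<Vec<usize>>, // per-function sorted absolute pcs
}

// Stand-in for the canonical decoder: total instruction length at `pc`.
fn instr_len(bytes: &[u8], pc: usize) -> Option<usize> {
    let imm = match *bytes.get(pc)? {
        0x00 | 0x03 => 0, // Nop, Ret (illustrative opcodes)
        0x02 => 4,        // JmpIf
        0x01 => 8,        // PushI64
        _ => return None,
    };
    Some(1 + imm)
}

impl Layout {
    pub fn compute(bytes: &[u8], ranges: Vec<(usize, usize)>) -> Option<Layout> {
        let mut boundaries = Vec::new();
        for &(start, end) in &ranges {
            let mut pcs = Vec::new();
            let mut pc = start;
            while pc < end {
                pcs.push(pc);
                pc += instr_len(bytes, pc)?; // the decoder is the only authority
            }
            if pc != end {
                return None; // last instruction overruns the function range
            }
            boundaries.push(pcs);
        }
        Some(Layout { ranges, boundaries })
    }

    pub fn function_range(&self, func_idx: usize) -> (usize, usize) {
        self.ranges[func_idx]
    }

    pub fn function_len(&self, func_idx: usize) -> usize {
        let (s, e) = self.ranges[func_idx];
        e - s
    }

    pub fn is_boundary(&self, func_idx: usize, rel_pc: usize) -> bool {
        let abs = self.ranges[func_idx].0 + rel_pc;
        self.boundaries[func_idx].binary_search(&abs).is_ok()
    }

    pub fn lookup_function_by_pc(&self, abs_pc: usize) -> Option<usize> {
        self.ranges.iter().position(|&(s, e)| s <= abs_pc && abs_pc < e)
    }
}

fn main() {
    // One function: PushI64 imm(8); Ret => boundaries at 0 and 9, end = 10.
    let code = [0x01u8, 0, 0, 0, 0, 0, 0, 0, 0, 0x03];
    let layout = Layout::compute(&code, vec![(0, 10)]).unwrap();
    println!("len={} boundary@9={}", layout.function_len(0), layout.is_boundary(0, 9));
}
```

Note that `compute` rejects a range whose last instruction overruns `end`, so a tolerant `clamp_jump_target`-style API has nothing left to clamp.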
**Requirements Checklist**
* [ ] No “clamp_jump_target” or tolerant APIs remain.
* [ ] Layout derived only via decoder.
**Completion Tests**
* [ ] Unit tests: boundaries for a known bytecode sequence.
* [ ] Fuzz/table tests: random instruction sequences produce monotonic ranges and valid boundaries.
---
### PR-05 (3 pts) — Verifier hard reset: branches are function-relative only
**Briefing**
The verifier must not guess absolute vs relative. One encoding only.
**Target**
Branches use `immediate = target_rel_to_function_start`, with `target == func_len` allowed.
**Scope**
* Replace any dual-format logic.
* Validation:
* `target_rel <= func_len`
* if `target_rel == func_len`: OK (end-exclusive)
* else target must be an instruction boundary
* All boundary checks must come from `layout`.
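The validation rules above reduce to a small pure function. This is a sketch: in practice `func_len` and the boundary predicate would come from `prometeu_bytecode::layout`, and the error names are assumptions.

```rust
// Hedged sketch of the branch-target rule: function-relative only,
// end-exclusive target allowed, boundary check delegated to layout.
#[derive(Debug, PartialEq)]
pub enum BranchError {
    JumpOutsideFunction { target_rel: usize, func_len: usize },
    JumpToMidInstruction { target_rel: usize },
}

pub fn check_branch_target(
    target_rel: usize,
    func_len: usize,
    is_boundary: impl Fn(usize) -> bool, // layout.is_boundary(func_idx, rel_pc)
) -> Result<(), BranchError> {
    if target_rel > func_len {
        return Err(BranchError::JumpOutsideFunction { target_rel, func_len });
    }
    if target_rel == func_len {
        return Ok(()); // end-exclusive target is explicitly allowed
    }
    if !is_boundary(target_rel) {
        return Err(BranchError::JumpToMidInstruction { target_rel });
    }
    Ok(())
}

fn main() {
    // Function of length 10 with boundaries at 0, 5, 9 (illustrative).
    let boundary = |pc: usize| matches!(pc, 0 | 5 | 9);
    println!("JumpToEnd: {:?}", check_branch_target(10, 10, boundary));
}
```

The three completion tests (JumpToEnd, JumpToMidInstruction, JumpOutsideFunction) map one-to-one onto the three outcomes of this function.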
**Requirements Checklist**
* [ ] No heuristics.
* [ ] Verifier depends only on layout + decoder.
**Completion Tests**
* [ ] JumpToEnd accepted.
* [ ] JumpToMidInstruction rejected.
* [ ] JumpOutsideFunction rejected.
---
### PR-06 (3 pts) — Linker hard reset: never relocate intra-function branches
**Briefing**
Linker must not rewrite local control-flow.
**Target**
Remove any relocation/patching for `Jmp`/`JmpIf*`.
**Scope**
* Delete branch relocation logic.
* Ensure only symbol/table/call relocations remain.
**Requirements Checklist**
* [ ] Linker does not inspect/patch branch immediates.
**Completion Tests**
* [ ] Link-order invariance test (A+B vs B+A) passes for intra-function branches.
---
## Phase 3 — JVM-like Symbol Identity: Signature-based Overload & Constant-Pool Mindset
### PR-07 (5 pts) — Introduce Signature interning (`SigId`) and descriptor canonicalization
**Briefing**
Overload must be by signature, not by `name/arity`.
**Target**
Create a canonical function descriptor system (JVM-like) and intern signatures.
**Scope**
* Add `Signature` model:
* params types + return type
* Add `SignatureInterner` -> `SigId`
* Add `descriptor()` canonical representation (stable, deterministic).
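The three Scope items can be sketched together. The `Type` variants and the JVM-like `"(params)ret"` descriptor syntax are assumptions for illustration; only the `Signature`/`SignatureInterner`/`SigId`/`descriptor()` names come from the plan.

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub enum Type { Int, Float, Str, Bool, Void } // illustrative type set

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct Signature {
    pub params: Vec<Type>,
    pub ret: Type,
}

impl Signature {
    // Canonical, deterministic descriptor, e.g. "(IS)V" (assumed syntax).
    pub fn descriptor(&self) -> String {
        fn code(t: &Type) -> char {
            match t {
                Type::Int => 'I',
                Type::Float => 'F',
                Type::Str => 'S',
                Type::Bool => 'Z',
                Type::Void => 'V',
            }
        }
        let params: String = self.params.iter().map(code).collect();
        format!("({}){}", params, code(&self.ret))
    }
}

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct SigId(u32);

#[derive(Default)]
pub struct SignatureInterner {
    by_sig: HashMap<Signature, SigId>,
    sigs: Vec<Signature>,
}

impl SignatureInterner {
    // Interning makes signature identity a cheap Copy id, usable as a key.
    pub fn intern(&mut self, sig: Signature) -> SigId {
        if let Some(&id) = self.by_sig.get(&sig) {
            return id;
        }
        let id = SigId(self.sigs.len() as u32);
        self.sigs.push(sig.clone());
        self.by_sig.insert(sig, id);
        id
    }
}

fn main() {
    let mut interner = SignatureInterner::default();
    let a = Signature { params: vec![Type::Int], ret: Type::Void };
    let b = Signature { params: vec![Type::Str], ret: Type::Void };
    println!("{} vs {}", a.descriptor(), b.descriptor()); // distinct descriptors
    println!("{:?} {:?}", interner.intern(a), interner.intern(b));
}
```

Descriptor stability comes for free here because the descriptor is a pure function of the `Signature` value, with no ambient state involved.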
**Requirements Checklist**
* [ ] `SigId` is used as identity in compiler IR.
* [ ] Descriptor is stable and round-trippable.
**Completion Tests**
* [ ] `debug(int)->void` and `debug(string)->void` produce different descriptors.
* [ ] Descriptor stability tests.
---
### PR-08 (5 pts) — Replace `name/arity` import/export keys with `(name, SigId)`
**Briefing**
`name/arity` keys and dedup-by-name break overloading and are not industrial-grade.
**Target**
Rewrite import/export identity:
* `ExportKey { module_path, base_name, sig }`
* `ImportKey { dep, module_path, base_name, sig }`
**Scope**
* Update lowering to stop producing `name/arity`.
* Update output builder to stop exporting short names and `name/arity`.
* Update collector to stop dedup-by-name.
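A sketch of the signature-keyed export identity and its duplicate check follows. `ExportKey`'s fields come from the Target above; `SigId` is the interned id from PR-07, and the `ExportTable` shape and error text are assumptions.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct SigId(u32); // interned signature id from PR-07

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ExportKey {
    pub module_path: String,
    pub base_name: String,
    pub sig: SigId,
}

#[derive(Default)]
pub struct ExportTable {
    exports: HashMap<ExportKey, u32>, // key -> function index
}

impl ExportTable {
    // Duplicate (name, sig) in the same module is a deterministic error;
    // the same name with a different sig is a legitimate overload.
    pub fn add(&mut self, key: ExportKey, func_idx: u32) -> Result<(), String> {
        if self.exports.contains_key(&key) {
            return Err(format!(
                "duplicate export {}::{} {:?}",
                key.module_path, key.base_name, key.sig
            ));
        }
        self.exports.insert(key, func_idx);
        Ok(())
    }
}

fn main() {
    let mut table = ExportTable::default();
    let debug_int = ExportKey { module_path: "core".into(), base_name: "debug".into(), sig: SigId(0) };
    let debug_str = ExportKey { module_path: "core".into(), base_name: "debug".into(), sig: SigId(1) };
    println!("{:?}", table.add(debug_int.clone(), 0)); // Ok
    println!("{:?}", table.add(debug_str, 1));         // Ok: overload, not a duplicate
    println!("{:?}", table.add(debug_int, 2));         // Err: duplicate (name, sig)
}
```

Because the whole key (including `sig`) is hashed, no code path ever needs to build or split a `"{name}/{arity}"` string.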
**Requirements Checklist**
* [ ] No code constructs or parses `"{name}/{arity}"`.
* [ ] Overloading is represented as a first-class concept, not a hack.
**Completion Tests**
* [ ] Cross-module overloading works.
* [ ] Duplicate export of same `(name, sig)` fails deterministically.
---
### PR-09 (3 pts) — Overload resolution rules (explicit, deterministic)
**Briefing**
Once overloading exists, the resolution rules must be explicit.
**Target**
Implement a deterministic overload resolver based on exact type match (no implicit hacks).
**Scope**
* Exact-match resolution only (initially).
* Clear diagnostic when ambiguous or missing.
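Exact-match resolution can be sketched in a few lines. The candidate representation and error names are simplified stand-ins for illustration.

```rust
#[derive(Clone, PartialEq, Eq, Debug)]
pub enum Type { Int, Str } // illustrative type set

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    NoMatch { name: String },
    Ambiguous { name: String, count: usize },
}

// Deterministic resolver: exact parameter-type match only, no fallback.
pub fn resolve<'a>(
    name: &str,
    args: &[Type],
    candidates: &'a [(String, Vec<Type>)], // (name, param types)
) -> Result<&'a (String, Vec<Type>), ResolveError> {
    let matches: Vec<_> = candidates
        .iter()
        .filter(|(n, params)| n == name && params.as_slice() == args)
        .collect();
    match matches.len() {
        0 => Err(ResolveError::NoMatch { name: name.to_string() }),
        1 => Ok(matches[0]),
        n => Err(ResolveError::Ambiguous { name: name.to_string(), count: n }),
    }
}

fn main() {
    let candidates = vec![
        ("debug".to_string(), vec![Type::Int]),
        ("debug".to_string(), vec![Type::Str]),
    ];
    println!("{:?}", resolve("debug", &[Type::Int], &candidates));
}
```

With `(name, SigId)` export keys from PR-08, the `Ambiguous` arm should be unreachable in a well-formed module; keeping it as an explicit diagnostic (rather than picking a winner) is what "no best-effort fallback" means.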
**Requirements Checklist**
* [ ] No best-effort fallback.
**Completion Tests**
* [ ] Ambiguous call produces a clear diagnostic.
* [ ] Missing overload produces a clear diagnostic.
---
## Phase 4 — Eliminate Stringly-Typed Protocols & Debug Hacks
### PR-10 (5 pts) — Replace `origin: Option<String>` and all string protocols with structured enums
**Briefing**
String prefixes like `svc:` and `@dep:` are fragile and non-industrial.
**Target**
All origins and external references become typed data.
**Scope**
* Replace string origins with enums.
* Update lowering/collector/output accordingly.
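A structured replacement can be sketched as an enum. The variant names and fields below are assumptions about what the `svc:` and `@dep:` prefixes encode today.

```rust
// Typed origins: matched exhaustively, never string-parsed.
#[derive(Clone, PartialEq, Eq, Debug)]
pub enum Origin {
    Local,
    Service { name: String },                        // was "svc:<name>"
    Dependency { dep: String, module_path: String }, // was "@dep:<dep>:<path>"
}

pub fn describe(origin: &Origin) -> String {
    match origin {
        Origin::Local => "local function".to_string(),
        Origin::Service { name } => format!("service {}", name),
        Origin::Dependency { dep, module_path } => {
            format!("dependency {} ({})", dep, module_path)
        }
    }
}

fn main() {
    let o = Origin::Dependency { dep: "stdlib".into(), module_path: "io".into() };
    println!("{}", describe(&o));
}
```

The compiler's exhaustiveness check now enforces what the grep-based lint can only approximate: adding a new origin kind forces every consumer to handle it.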
**Requirements Checklist**
* [ ] No `.starts_with('@')`, `split(':')` protocols.
**Completion Tests**
* [ ] Grep-based test/lint step fails if forbidden patterns exist.
---
### PR-11 (5 pts) — DebugInfo V1: structured function metadata (no `name@offset+len`)
**Briefing**
Encoding debug metadata in strings is unacceptable.
**Target**
Introduce a structured debug info format that stores offset/len as fields.
**Scope**
* Add `DebugFunctionInfo { func_idx, name, code_offset, code_len }`.
* Remove all parsing of `@offset+len`.
* Update orchestrator/linker/emit to use structured debug info.
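The record and its roundtrip can be sketched as follows. The struct fields come from the Scope above; the little-endian wire encoding shown here is purely an assumption for illustration.

```rust
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct DebugFunctionInfo {
    pub func_idx: u32,
    pub name: String,
    pub code_offset: u32,
    pub code_len: u32,
}

impl DebugFunctionInfo {
    // offset/len are real fields on the wire, never packed into the name.
    pub fn encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.func_idx.to_le_bytes());
        out.extend_from_slice(&(self.name.len() as u32).to_le_bytes());
        out.extend_from_slice(self.name.as_bytes());
        out.extend_from_slice(&self.code_offset.to_le_bytes());
        out.extend_from_slice(&self.code_len.to_le_bytes());
    }

    // Returns the record plus the number of bytes consumed, or None on
    // truncated/invalid input.
    pub fn decode(bytes: &[u8]) -> Option<(DebugFunctionInfo, usize)> {
        let u32_at = |i: usize| -> Option<u32> {
            bytes.get(i..i + 4)?.try_into().ok().map(u32::from_le_bytes)
        };
        let func_idx = u32_at(0)?;
        let name_len = u32_at(4)? as usize;
        let name = std::str::from_utf8(bytes.get(8..8 + name_len)?).ok()?.to_string();
        let code_offset = u32_at(8 + name_len)?;
        let code_len = u32_at(12 + name_len)?;
        Some((DebugFunctionInfo { func_idx, name, code_offset, code_len }, 16 + name_len))
    }
}

fn main() {
    let info = DebugFunctionInfo { func_idx: 7, name: "main".into(), code_offset: 64, code_len: 32 };
    let mut buf = Vec::new();
    info.encode(&mut buf);
    let (back, read) = DebugFunctionInfo::decode(&buf).unwrap();
    println!("{:?} ({} bytes)", back, read);
}
```

Since the name field carries only the name, the completion test "no debug name contains an `@` pattern" becomes a trivial invariant rather than a parsing convention.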
**Requirements Checklist**
* [ ] No code emits or parses `@offset+len`.
**Completion Tests**
* [ ] A test that fails if any debug name contains an `@offset+len` pattern.
* [ ] Debug info roundtrip test.
---
## Phase 5 — Hardening: Diagnostics, Error Handling, and Regression Shields
### PR-12 (3 pts) — Replace panics in critical build pipeline with typed errors + diagnostics
**Briefing**
`unwrap`/`expect` calls in the compiler/linker turn user errors into crashes.
**Target**
Introduce typed errors and surface diagnostics.
**Scope**
* Replace unwraps in:
* symbol resolution
* import/export linking
* entrypoint selection
* Ensure clean error return with context.
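The pattern can be sketched for symbol resolution; the error variants and diagnostic wording are illustrative, not the real crate's error type.

```rust
use std::collections::HashMap;
use std::fmt;

// Hedged sketch of a typed pipeline error replacing unwrap/expect.
#[derive(Debug, PartialEq)]
pub enum LinkError {
    UnresolvedSymbol { module: String, name: String },
    MissingEntrypoint { expected: String },
}

impl fmt::Display for LinkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LinkError::UnresolvedSymbol { module, name } => {
                write!(f, "unresolved symbol `{}` in module `{}`", name, module)
            }
            LinkError::MissingEntrypoint { expected } => {
                write!(f, "no entrypoint `{}` found", expected)
            }
        }
    }
}

impl std::error::Error for LinkError {}

// Before: `symbols.get(name).unwrap()` panics on invalid user programs.
// After: a typed error with context the driver renders as a diagnostic.
pub fn resolve_symbol(
    symbols: &HashMap<String, u32>,
    module: &str,
    name: &str,
) -> Result<u32, LinkError> {
    symbols.get(name).copied().ok_or_else(|| LinkError::UnresolvedSymbol {
        module: module.to_string(),
        name: name.to_string(),
    })
}

fn main() {
    let symbols = HashMap::from([("main".to_string(), 0u32)]);
    match resolve_symbol(&symbols, "app", "missing") {
        Ok(idx) => println!("resolved to {}", idx),
        Err(e) => println!("error: {}", e), // diagnostic, not a panic
    }
}
```

Implementing `std::error::Error` lets the orchestrator propagate these with `?` and attach further context at each layer instead of unwrapping.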
**Requirements Checklist**
* [ ] No panic paths for invalid user programs.
**Completion Tests**
* [ ] Invalid program produces diagnostics, not panic.
---
### PR-13 (3 pts) — Add regression test suite: link-order invariance + opcode-change immunity
**Briefing**
We need a system immune to opcode churn.
**Target**
Add tests that fail if:
* linker steps bytes manually
* decoder/spec drift exists
* link order changes semantics
**Scope**
* Link-order invariance tests.
* Spec coverage tests.
* Optional: lightweight “forbidden patterns” tests.
**Requirements Checklist**
* [ ] Changing an opcode immediate size requires updating only the spec and tests.
**Completion Tests**
* [ ] All new regression tests pass.
---
## Summary of Estimated Cost (Points)
* Phase 1: PR-01 (3) + PR-02 (5) + PR-03 (3) = **11**
* Phase 2: PR-04 (5) + PR-05 (3) + PR-06 (3) = **11**
* Phase 3: PR-07 (5) + PR-08 (5) + PR-09 (3) = **13**
* Phase 4: PR-10 (5) + PR-11 (5) = **10**
* Phase 5: PR-12 (3) + PR-13 (3) = **6**
**Total: 51 points**
> Note: If any PR starts to exceed 5 points in practice, it must be split into smaller PRs.
---
## Non-Negotiables
* No compatibility with legacy encodings.
* No heuristics.
* No string hacks.
* One canonical decoder/spec/layout.
* Everything in English (including review comments).