# Prometeu Industrial-Grade Refactor Plan (JVM-like)

**Language policy:** All implementation notes, code comments, commit messages, PR descriptions, and review discussion **must be in English**.

**Reset policy:** This is a **hard reset**. We do **not** keep compatibility with the legacy bytecode/linker/verifier behaviors. No heuristics, no “temporary support”, no string hacks.

**North Star:** A JVM-like philosophy:

* Control-flow is **method-local** and **canonical**.
* The linker resolves **symbols** and **tables**, not intra-function branches.
* A **single canonical layout/decoder/spec** is used across compiler/linker/verifier/VM.
* Any invalid program fails with clear diagnostics, not panics.

---

## Phase 2 — Canonical Layout + Verifier Contract (JVM-like Control Flow)

### PR-04 (5 pts) — Rewrite layout to compute instruction boundaries via decoder (no heuristics)

**Briefing**
Layout must be computed canonically using the decoder, not guessed via ad-hoc stepping.

**Target**
`prometeu_bytecode::layout` becomes the only authority for:

* function ranges `[start, end)`
* function length
* valid instruction boundaries
* pc→function lookup

**Scope**

* Implement layout computation by scanning bytes with the canonical decoder.
* Provide APIs:
  * `function_range(func_idx) -> (start, end)`
  * `function_len(func_idx)`
  * `is_boundary(func_idx, rel_pc)` or `is_boundary_abs(abs_pc)`
  * `lookup_function_by_pc(abs_pc)`

**Requirements Checklist**

* [ ] No “clamp_jump_target” or tolerant APIs remain.
* [ ] Layout is derived only via the decoder.

**Completion Tests**

* [ ] Unit tests: boundaries for a known bytecode sequence.
* [ ] Fuzz/table tests: random instruction sequences produce monotonic ranges and valid boundaries.

---

### PR-05 (3 pts) — Verifier hard reset: branches are function-relative only

**Briefing**
The verifier must not guess absolute vs. relative. One encoding only.

**Target**
Branches use `immediate = target_rel_to_function_start`, with `target == func_len` allowed.
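The check this encoding implies can be sketched as follows — a minimal sketch, assuming the PR-04 layout API (`function_len`, `is_boundary`); the error type and function name here are illustrative, not the final shape:

```rust
/// Illustrative error type for branch-target validation (PR-05).
#[derive(Debug, PartialEq)]
enum BranchError {
    OutsideFunction { target_rel: u32, func_len: u32 },
    MidInstruction { target_rel: u32 },
}

/// Validates a function-relative branch immediate.
/// `is_boundary` stands in for `layout.is_boundary(func_idx, rel_pc)`,
/// and `func_len` for `layout.function_len(func_idx)`.
fn check_branch_target(
    target_rel: u32,
    func_len: u32,
    is_boundary: impl Fn(u32) -> bool,
) -> Result<(), BranchError> {
    if target_rel > func_len {
        return Err(BranchError::OutsideFunction { target_rel, func_len });
    }
    if target_rel == func_len {
        // End-exclusive target: jumping to the end of the function is allowed.
        return Ok(());
    }
    if !is_boundary(target_rel) {
        return Err(BranchError::MidInstruction { target_rel });
    }
    Ok(())
}
```

Note the ordering: the range check comes first, the end-exclusive case is an explicit early accept, and only interior targets consult the boundary set — no clamping, no fallback.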
**Scope**

* Replace any dual-format logic.
* Validation:
  * `target_rel <= func_len`
  * if `target_rel == func_len`: OK (end-exclusive)
  * else the target must be an instruction boundary
* All boundary checks must come from `layout`.

**Requirements Checklist**

* [ ] No heuristics.
* [ ] The verifier depends only on layout + decoder.

**Completion Tests**

* [ ] JumpToEnd accepted.
* [ ] JumpToMidInstruction rejected.
* [ ] JumpOutsideFunction rejected.

---

### PR-06 (3 pts) — Linker hard reset: never relocate intra-function branches

**Briefing**
The linker must not rewrite local control-flow.

**Target**
Remove any relocation/patching for `Jmp`/`JmpIf*`.

**Scope**

* Delete branch relocation logic.
* Ensure only symbol/table/call relocations remain.

**Requirements Checklist**

* [ ] The linker does not inspect or patch branch immediates.

**Completion Tests**

* [ ] Link-order invariance test (A+B vs. B+A) passes for intra-function branches.

---

## Phase 3 — JVM-like Symbol Identity: Signature-based Overload & Constant-Pool Mindset

### PR-07 (5 pts) — Introduce Signature interning (`SigId`) and descriptor canonicalization

**Briefing**
Overloading must be keyed by signature, not by `name/arity`.

**Target**
Create a canonical function descriptor system (JVM-like) and intern signatures.

**Scope**

* Add a `Signature` model:
  * parameter types + return type
* Add `SignatureInterner` -> `SigId`.
* Add a canonical `descriptor()` representation (stable, deterministic).

**Requirements Checklist**

* [ ] `SigId` is used as identity in the compiler IR.
* [ ] The descriptor is stable and round-trippable.

**Completion Tests**

* [ ] `debug(int)->void` and `debug(string)->void` produce different descriptors.
* [ ] Descriptor stability tests.

---

### PR-08 (5 pts) — Replace `name/arity` import/export keys with `(name, SigId)`

**Briefing**
`name/arity` keys and dedup-by-name break overloading and are not industrial-grade.
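The PR-07 machinery that signature-based keys build on can be sketched like this — a hedged sketch in which the `Type` model and the descriptor letter coding are illustrative assumptions; only `Signature`, `SignatureInterner`, `SigId`, and `descriptor()` come from the plan:

```rust
use std::collections::HashMap;

/// Illustrative type model; the real compiler IR types will differ.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Type { Int, Str, Void }

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct Signature {
    params: Vec<Type>,
    ret: Type,
}

impl Signature {
    /// Canonical, deterministic descriptor, loosely in the spirit of JVM
    /// method descriptors (e.g. "(I)V"). The letter coding is made up here.
    fn descriptor(&self) -> String {
        fn code(t: &Type) -> &'static str {
            match t {
                Type::Int => "I",
                Type::Str => "S",
                Type::Void => "V",
            }
        }
        let params: String = self.params.iter().map(code).collect();
        format!("({}){}", params, code(&self.ret))
    }
}

/// Interned signature identity: equal signatures get equal ids.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct SigId(u32);

#[derive(Default)]
struct SignatureInterner {
    ids: HashMap<Signature, SigId>,
    sigs: Vec<Signature>,
}

impl SignatureInterner {
    fn intern(&mut self, sig: Signature) -> SigId {
        if let Some(&id) = self.ids.get(&sig) {
            return id;
        }
        let id = SigId(self.sigs.len() as u32);
        self.sigs.push(sig.clone());
        self.ids.insert(sig, id);
        id
    }
}
```

With interning in place, `(name, SigId)` comparisons in the keys below become cheap integer equality, and the descriptor gives a stable textual form for serialization and diagnostics.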
**Target**
Rewrite import/export identity:

* `ExportKey { module_path, base_name, sig }`
* `ImportKey { dep, module_path, base_name, sig }`

**Scope**

* Update lowering to stop producing `name/arity`.
* Update the output builder to stop exporting short names and `name/arity`.
* Update the collector to stop deduplicating by name.

**Requirements Checklist**

* [ ] No code constructs or parses `"{name}/{arity}"`.
* [ ] Overloading is represented as a first-class concept, not a hack.

**Completion Tests**

* [ ] Cross-module overloading works.
* [ ] Duplicate export of the same `(name, sig)` fails deterministically.

---

### PR-09 (3 pts) — Overload resolution rules (explicit, deterministic)

**Briefing**
Once overloading exists, resolution rules must be explicit.

**Target**
Implement a deterministic overload resolver based on exact type match (no implicit hacks).

**Scope**

* Exact-match resolution only (initially).
* Clear diagnostics when a call is ambiguous or no overload matches.

**Requirements Checklist**

* [ ] No best-effort fallback.

**Completion Tests**

* [ ] An ambiguous call produces a clear diagnostic.
* [ ] A missing overload produces a clear diagnostic.

---

## Phase 4 — Eliminate Stringly-Typed Protocols & Debug Hacks

### PR-10 (5 pts) — Replace `origin: Option` and all string protocols with structured enums

**Briefing**
String prefixes like `svc:` and `@dep:` are fragile and non-industrial.

**Target**
All origins and external references become typed data.

**Scope**

* Replace string origins with enums.
* Update lowering/collector/output accordingly.

**Requirements Checklist**

* [ ] No `.starts_with('@')` or `split(':')` protocols.

**Completion Tests**

* [ ] A grep-based test/lint step fails if forbidden patterns exist.

---

### PR-11 (5 pts) — DebugInfo V1: structured function metadata (no `name@offset+len`)

**Briefing**
Encoding debug metadata in strings is unacceptable.

**Target**
Introduce a structured debug info format that stores offset/len as fields.
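A minimal sketch of what "offset/len as fields" means in practice, using the `DebugFunctionInfo` field names from the plan; the integer widths and the little-endian wire encoding are assumptions for illustration only:

```rust
/// Structured debug record (PR-11): field names follow the plan,
/// concrete types and serialization are illustrative.
#[derive(Clone, PartialEq, Debug)]
struct DebugFunctionInfo {
    func_idx: u32,
    name: String,
    code_offset: u32,
    code_len: u32,
}

impl DebugFunctionInfo {
    /// Offset and length are dedicated fields, so `name` stays a plain
    /// identifier: no `name@offset+len` string packing, no parsing.
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&self.func_idx.to_le_bytes());
        out.extend_from_slice(&self.code_offset.to_le_bytes());
        out.extend_from_slice(&self.code_len.to_le_bytes());
        out.extend_from_slice(&(self.name.len() as u32).to_le_bytes());
        out.extend_from_slice(self.name.as_bytes());
        out
    }

    /// Decoding fails cleanly (returns `None`) on truncated input.
    fn decode(bytes: &[u8]) -> Option<Self> {
        let u32_at = |i: usize| -> Option<u32> {
            bytes.get(i..i + 4)?.try_into().ok().map(u32::from_le_bytes)
        };
        let name_len = u32_at(12)? as usize;
        let name_bytes = bytes.get(16..16 + name_len)?;
        Some(DebugFunctionInfo {
            func_idx: u32_at(0)?,
            name: String::from_utf8(name_bytes.to_vec()).ok()?,
            code_offset: u32_at(4)?,
            code_len: u32_at(8)?,
        })
    }
}
```

The roundtrip (`decode(encode(x)) == x`) is exactly the property the PR-11 completion tests demand, and there is nothing to grep for: no `@`, no `+`, no ad-hoc parsing.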
**Scope**

* Add `DebugFunctionInfo { func_idx, name, code_offset, code_len }`.
* Remove all parsing of `@offset+len`.
* Update orchestrator/linker/emit to use structured debug info.

**Requirements Checklist**

* [ ] No code emits or parses `@offset+len`.

**Completion Tests**

* [ ] A test that fails if any debug name contains the `@` pattern.
* [ ] Debug info roundtrip test.

---

## Phase 5 — Hardening: Diagnostics, Error Handling, and Regression Shields

### PR-12 (3 pts) — Replace panics in the critical build pipeline with typed errors + diagnostics

**Briefing**
`unwrap/expect` in the compiler/linker turns user errors into crashes.

**Target**
Introduce typed errors and surface diagnostics.

**Scope**

* Replace unwraps in:
  * symbol resolution
  * import/export linking
  * entrypoint selection
* Ensure a clean error return with context.

**Requirements Checklist**

* [ ] No panic paths for invalid user programs.

**Completion Tests**

* [ ] An invalid program produces diagnostics, not a panic.

---

### PR-13 (3 pts) — Add regression test suite: link-order invariance + opcode-change immunity

**Briefing**
We need a system immune to opcode churn.

**Target**
Add tests that fail if:

* the linker steps bytes manually
* decoder/spec drift exists
* link order changes semantics

**Scope**

* Link-order invariance tests.
* Spec coverage tests.
* Optional: lightweight “forbidden patterns” tests.

**Requirements Checklist**

* [ ] Changing an opcode immediate size requires updating only the spec and tests.

**Completion Tests**

* [ ] All new regression tests pass.

---

## Summary of Estimated Cost (Points)

* Phase 1: PR-01 (3) + PR-02 (5) + PR-03 (3) = **11**
* Phase 2: PR-04 (5) + PR-05 (3) + PR-06 (3) = **11**
* Phase 3: PR-07 (5) + PR-08 (5) + PR-09 (3) = **13**
* Phase 4: PR-10 (5) + PR-11 (5) = **10**
* Phase 5: PR-12 (3) + PR-13 (3) = **6**

**Total: 51 points**

> Note: If any PR starts to exceed 5 points in practice, it must be split into smaller PRs.
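As a closing illustration of PR-12's typed-error requirement, a minimal sketch of the shape we are aiming for; the `LinkError` variants and the `resolve_export` helper are hypothetical names, not the final taxonomy:

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical typed error for the link stage (PR-12).
#[derive(Debug, PartialEq)]
enum LinkError {
    UnresolvedImport { module: String, name: String },
    MissingEntrypoint,
}

impl fmt::Display for LinkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LinkError::UnresolvedImport { module, name } => {
                write!(f, "unresolved import `{name}` from module `{module}`")
            }
            LinkError::MissingEntrypoint => write!(f, "no entrypoint found"),
        }
    }
}

/// Instead of `exports.get(name).unwrap()`, resolution returns a typed
/// error carrying enough context to render a diagnostic.
fn resolve_export(
    exports: &HashMap<String, u32>,
    module: &str,
    name: &str,
) -> Result<u32, LinkError> {
    exports.get(name).copied().ok_or_else(|| LinkError::UnresolvedImport {
        module: module.to_string(),
        name: name.to_string(),
    })
}
```

The point is the signature: an invalid user program flows out of the pipeline as a `Result::Err` with context, never as a panic.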
---

## Non-Negotiables

* No compatibility with legacy encodings.
* No heuristics.
* No string hacks.
* One canonical decoder/spec/layout.
* Everything in English (including review comments).