# Prometeu Industrial-Grade Refactor Plan (JVM-like) **Language policy:** All implementation notes, code comments, commit messages, PR descriptions, and review discussion **must be in English**. **Reset policy:** This is a **hard reset**. We do **not** keep compatibility with the legacy bytecode/linker/verifier behaviors. No heuristics, no “temporary support”, no string hacks. **North Star:** A JVM-like philosophy: * Control-flow is **method-local** and **canonical**. * The linker resolves **symbols** and **tables**, not intra-function branches. * A **single canonical layout/decoder/spec** is used across compiler/linker/verifier/VM. * Any invalid program fails with clear diagnostics, not panics. --- ## Phase 3 — JVM-like Symbol Identity: Signature-based Overload & Constant-Pool Mindset ### PR-09 (3 pts) — Overload resolution rules (explicit, deterministic) **Briefing** Once overload exists, resolution rules must be explicit. **Target** Implement a deterministic overload resolver based on exact type match (no implicit hacks). **Scope** * Exact-match resolution only (initially). * Clear diagnostic when ambiguous or missing. **Requirements Checklist** * [ ] No best-effort fallback. **Completion Tests** * [ ] Ambiguous call produces a clear diagnostic. * [ ] Missing overload produces a clear diagnostic. --- ## Phase 4 — Eliminate Stringly-Typed Protocols & Debug Hacks ### PR-10 (5 pts) — Replace `origin: Option` and all string protocols with structured enums **Briefing** String prefixes like `svc:` and `@dep:` are fragile and non-industrial. **Target** All origins and external references become typed data. **Scope** * Replace string origins with enums. * Update lowering/collector/output accordingly. **Requirements Checklist** * [ ] No `.starts_with('@')`, `split(':')` protocols. **Completion Tests** * [ ] Grep-based test/lint step fails if forbidden patterns exist. --- ### PR-11 (5 pts) — DebugInfo V1: structured function metadata (no `name@offset+len`) **Briefing** Encoding debug metadata in strings is unacceptable. **Target** Introduce a structured debug info format that stores offset/len as fields. **Scope** * Add `DebugFunctionInfo { func_idx, name, code_offset, code_len }`. * Remove all parsing of `@offset+len`. * Update orchestrator/linker/emit to use structured debug info. **Requirements Checklist** * [ ] No code emits or parses `@offset+len`. **Completion Tests** * [ ] A test that fails if any debug name contains `@` pattern. * [ ] Debug info roundtrip test. --- ## Phase 5 — Hardening: Diagnostics, Error Handling, and Regression Shields ### PR-12 (3 pts) — Replace panics in critical build pipeline with typed errors + diagnostics **Briefing** `unwrap/expect` in compiler/linker transforms user errors into crashes. **Target** Introduce typed errors and surface diagnostics. **Scope** * Replace unwraps in: * symbol resolution * import/export linking * entrypoint selection * Ensure clean error return with context. **Requirements Checklist** * [ ] No panic paths for invalid user programs. **Completion Tests** * [ ] Invalid program produces diagnostics, not panic. --- ### PR-13 (3 pts) — Add regression test suite: link-order invariance + opcode-change immunity **Briefing** We need a system immune to opcode churn. **Target** Add tests that fail if: * linker steps bytes manually * decoder/spec drift exists * link order changes semantics **Scope** * Link-order invariance tests. * Spec coverage tests. * Optional: lightweight “forbidden patterns” tests. **Requirements Checklist** * [ ] Changing an opcode immediate size requires updating only the spec and tests. **Completion Tests** * [ ] All new regression tests pass. --- ## Summary of Estimated Cost (Points) * Phase 1: PR-01 (3) + PR-02 (5) + PR-03 (3) = **11** * Phase 2: PR-04 (5) + PR-05 (3) + PR-06 (3) = **11** * Phase 3: PR-07 (5) + PR-08 (5) + PR-09 (3) = **13** * Phase 4: PR-10 (5) + PR-11 (5) = **10** * Phase 5: PR-12 (3) + PR-13 (3) = **6** **Total: 51 points** > Note: If any PR starts to exceed 5 points in practice, it must be split into smaller PRs. --- ## Non-Negotiables * No compatibility with legacy encodings. * No heuristics. * No string hacks. * One canonical decoder/spec/layout. * Everything in English (including review comments).