Prometeu Industrial-Grade Refactor Plan (JVM-like)
Language policy: All implementation notes, code comments, commit messages, PR descriptions, and review discussion must be in English.
Reset policy: This is a hard reset. We do not keep compatibility with the legacy bytecode/linker/verifier behaviors. No heuristics, no “temporary support”, no string hacks.
North Star: A JVM-like philosophy:
- Control-flow is method-local and canonical.
- The linker resolves symbols and tables, not intra-function branches.
- A single canonical layout/decoder/spec is used across compiler/linker/verifier/VM.
- Any invalid program fails with clear diagnostics, not panics.
Phase 2 — Canonical Layout + Verifier Contract (JVM-like Control Flow)
PR-04 (5 pts) — Rewrite layout to compute instruction boundaries via decoder (no heuristics)
Briefing
Layout must be computed canonically using the decoder, not guessed via ad-hoc stepping.
Target
prometeu_bytecode::layout becomes the only authority for:
- function ranges [start, end)
- function length
- valid instruction boundaries
- pc→function lookup
Scope
- Implement layout computation by scanning bytes with the canonical decoder.
- Provide APIs:
  - function_range(func_idx) -> (start, end)
  - function_len(func_idx)
  - is_boundary(func_idx, rel_pc) or is_boundary_abs(abs_pc)
  - lookup_function_by_pc(abs_pc)
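A minimal sketch of the layout API in Rust. It assumes a hypothetical five-opcode instruction set (Nop, PushI32, Jmp, JmpIfFalse, Ret) standing in for the canonical spec, and a function table of (code_offset, code_len) pairs sorted by offset. The accessors return Option instead of the bare values listed above, so an out-of-range func_idx cannot panic.

```rust
/// Hypothetical opcode table standing in for the canonical spec.
/// Returns the total instruction length (opcode byte + immediates).
fn instr_len(opcode: u8) -> Option<usize> {
    match opcode {
        0x00 => Some(1), // Nop
        0x01 => Some(5), // PushI32 <i32>
        0x02 => Some(5), // Jmp <u32 target_rel>
        0x03 => Some(5), // JmpIfFalse <u32 target_rel>
        0x04 => Some(1), // Ret
        _ => None,
    }
}

#[derive(Debug, PartialEq)]
pub enum LayoutError {
    BadFunctionRange { func_idx: usize },
    UnknownOpcode { abs_pc: usize, opcode: u8 },
    TruncatedInstruction { func_idx: usize, abs_pc: usize },
}

struct FunctionLayout {
    start: usize,
    end: usize,
    /// Function-relative offsets of every instruction start, ascending.
    boundaries: Vec<usize>,
}

pub struct Layout {
    functions: Vec<FunctionLayout>,
}

impl Layout {
    /// Scans each function with the decoder. `funcs` holds (code_offset, code_len)
    /// per function, sorted by offset and non-overlapping.
    pub fn build(code: &[u8], funcs: &[(usize, usize)]) -> Result<Layout, LayoutError> {
        let mut functions = Vec::with_capacity(funcs.len());
        let mut prev_end = 0;
        for (func_idx, &(start, len)) in funcs.iter().enumerate() {
            let end = start
                .checked_add(len)
                .filter(|&e| start >= prev_end && e <= code.len())
                .ok_or(LayoutError::BadFunctionRange { func_idx })?;
            let mut boundaries = Vec::new();
            let mut pc = start;
            while pc < end {
                let opcode = code[pc]; // in bounds: pc < end <= code.len()
                let n = instr_len(opcode).ok_or(LayoutError::UnknownOpcode { abs_pc: pc, opcode })?;
                if pc + n > end {
                    // An instruction may not straddle the function end: no clamping.
                    return Err(LayoutError::TruncatedInstruction { func_idx, abs_pc: pc });
                }
                boundaries.push(pc - start);
                pc += n;
            }
            prev_end = end;
            functions.push(FunctionLayout { start, end, boundaries });
        }
        Ok(Layout { functions })
    }

    pub fn function_range(&self, func_idx: usize) -> Option<(usize, usize)> {
        self.functions.get(func_idx).map(|f| (f.start, f.end))
    }

    pub fn function_len(&self, func_idx: usize) -> Option<usize> {
        self.function_range(func_idx).map(|(s, e)| e - s)
    }

    /// True only for instruction starts; `rel_pc == len` is not a boundary here
    /// (the verifier handles the end-exclusive case itself).
    pub fn is_boundary(&self, func_idx: usize, rel_pc: usize) -> bool {
        self.functions
            .get(func_idx)
            .map_or(false, |f| f.boundaries.binary_search(&rel_pc).is_ok())
    }

    pub fn is_boundary_abs(&self, abs_pc: usize) -> bool {
        self.lookup_function_by_pc(abs_pc)
            .map_or(false, |i| self.is_boundary(i, abs_pc - self.functions[i].start))
    }

    /// Binary search over the sorted, non-overlapping function ranges.
    pub fn lookup_function_by_pc(&self, abs_pc: usize) -> Option<usize> {
        let i = self.functions.partition_point(|f| f.start <= abs_pc).checked_sub(1)?;
        (abs_pc < self.functions[i].end).then_some(i)
    }
}
```

Boundaries are recorded only where the decoder actually starts an instruction, so an instruction that would cross its function's end is a hard error rather than something to clamp.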
Requirements Checklist
- No “clamp_jump_target” or tolerant APIs remain.
- Layout derived only via decoder.
Completion Tests
- Unit tests: boundaries for a known bytecode sequence.
- Fuzz/table tests: random instruction sequences produce monotonic ranges and valid boundaries.
PR-05 (3 pts) — Verifier hard reset: branches are function-relative only
Briefing
The verifier must not guess absolute vs relative. One encoding only.
Target
Branches use immediate = target_rel_to_function_start, with target == func_len allowed.
Scope
- Replace any dual-format logic.
- Validation:
  - target_rel <= func_len
  - if target_rel == func_len: OK (end-exclusive)
  - else target must be an instruction boundary
- All boundary checks must come from layout.
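The validation rules above fit in one small function. A minimal Rust sketch, assuming a hypothetical FuncView that carries the function length and the ascending, function-relative boundary offsets produced by the layout (it stands in for the real layout type):

```rust
/// Hypothetical stand-in for the layout's per-function view.
pub struct FuncView<'a> {
    pub len: usize,
    /// Function-relative instruction starts, ascending (from the layout).
    pub boundaries: &'a [usize],
}

#[derive(Debug, PartialEq)]
pub enum BranchError {
    OutsideFunction { target_rel: usize, func_len: usize },
    MidInstruction { target_rel: usize },
}

/// Validates a branch immediate that is always relative to the function start.
/// There is exactly one encoding, so nothing is guessed.
pub fn check_branch_target(func: &FuncView<'_>, target_rel: u32) -> Result<(), BranchError> {
    let target_rel = target_rel as usize;
    if target_rel > func.len {
        return Err(BranchError::OutsideFunction { target_rel, func_len: func.len });
    }
    if target_rel == func.len {
        return Ok(()); // end-exclusive: jumping to the function end is allowed
    }
    if func.boundaries.binary_search(&target_rel).is_ok() {
        Ok(())
    } else {
        Err(BranchError::MidInstruction { target_rel })
    }
}
```

The three completion tests map directly onto the three outcomes: JumpToEnd hits the `== len` branch, JumpToMidInstruction misses the boundary search, and JumpOutsideFunction fails the first bound check.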
Requirements Checklist
- No heuristics.
- Verifier depends only on layout + decoder.
Completion Tests
- JumpToEnd accepted.
- JumpToMidInstruction rejected.
- JumpOutsideFunction rejected.
PR-06 (3 pts) — Linker hard reset: never relocate intra-function branches
Briefing
Linker must not rewrite local control-flow.
Target
Remove any relocation/patching for Jmp/JmpIf*.
Scope
- Delete branch relocation logic.
- Ensure only symbol/table/call relocations remain.
Requirements Checklist
- Linker does not inspect/patch branch immediates.
Completion Tests
- Link-order invariance test (A+B vs B+A) passes for intra-function branches.
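A sketch of why function-relative branches make this test straightforward to satisfy: if the linker only concatenates function bodies and records their start offsets, each function's bytes are identical under A+B and B+A. The Function type and the link/body helpers are hypothetical, and symbol/table/call relocations are deliberately omitted.

```rust
use std::collections::HashMap;

/// Hypothetical function record whose branch immediates are already
/// relative to the function start.
#[derive(Clone)]
pub struct Function {
    pub name: &'static str,
    pub code: Vec<u8>,
}

/// Lays functions out back to back and records each (start, len).
/// Bodies are copied verbatim: no branch immediate is inspected or patched.
pub fn link(funcs: &[Function]) -> (Vec<u8>, HashMap<&'static str, (usize, usize)>) {
    let mut image = Vec::new();
    let mut ranges = HashMap::new();
    for f in funcs {
        ranges.insert(f.name, (image.len(), f.code.len()));
        image.extend_from_slice(&f.code);
    }
    (image, ranges)
}

/// Returns the linked bytes of the named function, if present.
pub fn body<'a>(
    image: &'a [u8],
    ranges: &HashMap<&'static str, (usize, usize)>,
    name: &str,
) -> Option<&'a [u8]> {
    let &(start, len) = ranges.get(name)?;
    image.get(start..start + len)
}
```

An invariance test links [A, B] and [B, A] and asserts that body(.., "A") is byte-identical in both images, even though A's recorded start offset differs.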
Phase 3 — JVM-like Symbol Identity: Signature-based Overload & Constant-Pool Mindset
PR-07 (5 pts) — Introduce Signature interning (SigId) and descriptor canonicalization
Briefing
Overload must be by signature, not by name/arity.
Target
Create a canonical function descriptor system (JVM-like) and intern signatures.
Scope
-
Add
Signaturemodel:- params types + return type
-
Add
SignatureInterner->SigId -
Add
descriptor()canonical representation (stable, deterministic).
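A minimal sketch of the signature model, interner, and descriptor in Rust. The type set (Ty) and the single-letter JVM-style codes (I, F, Z, S, V) are hypothetical placeholders, not a settled Prometeu encoding; parse is included so the round-trip requirement can be tested.

```rust
use std::collections::HashMap;

/// Hypothetical type set.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Ty { Int, Float, Bool, Str, Void }

impl Ty {
    fn code(self) -> char {
        match self { Ty::Int => 'I', Ty::Float => 'F', Ty::Bool => 'Z', Ty::Str => 'S', Ty::Void => 'V' }
    }
    fn from_code(c: char) -> Option<Ty> {
        match c {
            'I' => Some(Ty::Int), 'F' => Some(Ty::Float), 'Z' => Some(Ty::Bool),
            'S' => Some(Ty::Str), 'V' => Some(Ty::Void), _ => None,
        }
    }
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Signature { pub params: Vec<Ty>, pub ret: Ty }

impl Signature {
    /// "(" + parameter codes + ")" + return code, e.g. "(I)V".
    pub fn descriptor(&self) -> String {
        let mut s = String::from("(");
        s.extend(self.params.iter().map(|t| t.code()));
        s.push(')');
        s.push(self.ret.code());
        s
    }

    /// Inverse of `descriptor`; malformed input is rejected, never guessed.
    pub fn parse(desc: &str) -> Option<Signature> {
        let (params, ret) = desc.strip_prefix('(')?.split_once(')')?;
        let params = params.chars().map(Ty::from_code).collect::<Option<Vec<_>>>()?;
        if params.contains(&Ty::Void) {
            return None; // void is only valid as a return type
        }
        let mut ret_chars = ret.chars();
        let ret = Ty::from_code(ret_chars.next()?)?;
        if ret_chars.next().is_some() {
            return None; // trailing characters
        }
        Some(Signature { params, ret })
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SigId(pub u32);

#[derive(Default)]
pub struct SignatureInterner {
    ids: HashMap<Signature, SigId>,
    sigs: Vec<Signature>,
}

impl SignatureInterner {
    /// Returns the existing id for an equal signature, or allocates the next one.
    pub fn intern(&mut self, sig: Signature) -> SigId {
        if let Some(&id) = self.ids.get(&sig) {
            return id;
        }
        let id = SigId(self.sigs.len() as u32);
        self.sigs.push(sig.clone());
        self.ids.insert(sig, id);
        id
    }

    pub fn get(&self, id: SigId) -> Option<&Signature> {
        self.sigs.get(id.0 as usize)
    }
}
```

With these placeholder codes, debug(int)->void yields "(I)V" and debug(string)->void yields "(S)V". Ids are assigned in first-intern order, so they are deterministic only if interning order is.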
Requirements Checklist
- SigId is used as identity in compiler IR.
- Descriptor is stable and round-trippable.
Completion Tests
- debug(int)->void and debug(string)->void produce different descriptors.
- Descriptor stability tests.
PR-08 (5 pts) — Replace name/arity import/export keys with (name, SigId)
Briefing
name/arity keys and dedup-by-name break overloading and are not industrial-grade.
Target
Rewrite import/export identity:
- ExportKey { module_path, base_name, sig }
- ImportKey { dep, module_path, base_name, sig }
Scope
- Update lowering to stop producing name/arity.
- Update output builder to stop exporting short names and name/arity.
- Update collector to stop dedup-by-name.
Requirements Checklist
- No code constructs or parses "{name}/{arity}".
- Overload is represented as first-class, not a hack.
Completion Tests
- Cross-module overload works.
- Duplicate export of the same (name, sig) fails deterministically.
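A sketch of an export table keyed by ExportKey, assuming SigId values come from the signature interner. Sorting the entries before the duplicate check makes the reported error independent of input order, which is one way to satisfy "fails deterministically". The build_exports function and ExportError type are hypothetical; ImportKey would be handled the same way.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in; in the compiler this comes from SignatureInterner.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct SigId(pub u32);

#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ExportKey {
    pub module_path: String,
    pub base_name: String,
    pub sig: SigId,
}

#[derive(Debug, PartialEq)]
pub enum ExportError {
    Duplicate { key: ExportKey, first_func: u32, second_func: u32 },
}

/// Overloads (same base_name, different sig) coexist as distinct keys.
/// A repeated full key is an error; the smallest duplicated key is reported,
/// with its function indices in ascending order, whatever the input order.
pub fn build_exports(entries: &[(ExportKey, u32)]) -> Result<BTreeMap<ExportKey, u32>, ExportError> {
    let mut sorted: Vec<&(ExportKey, u32)> = entries.iter().collect();
    sorted.sort(); // by key, then by func_idx
    for pair in sorted.windows(2) {
        if pair[0].0 == pair[1].0 {
            return Err(ExportError::Duplicate {
                key: pair[0].0.clone(),
                first_func: pair[0].1,
                second_func: pair[1].1,
            });
        }
    }
    // BTreeMap iteration order depends only on the keys, not on insertion order.
    Ok(entries.iter().cloned().collect())
}
```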
PR-09 (3 pts) — Overload resolution rules (explicit, deterministic)
Briefing
Once overload exists, resolution rules must be explicit.
Target
Implement a deterministic overload resolver based on exact type match (no implicit hacks).
Scope
- Exact-match resolution only (initially).
- Clear diagnostic when ambiguous or missing.
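A minimal exact-match resolver in Rust. Candidate, Ty, and ResolveError are hypothetical. Because a SigId includes the return type, two candidates can share parameter types and differ only in return type; under exact parameter matching such a call is reported as ambiguous instead of being settled by a fallback.

```rust
/// Hypothetical minimal types mirroring the PR-07 signature model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ty { Int, Str, Void }

#[derive(Clone, Debug, PartialEq)]
pub struct Candidate {
    pub func_idx: u32,
    pub params: Vec<Ty>,
    pub ret: Ty,
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    Missing { name: String, args: Vec<Ty> },
    Ambiguous { name: String, args: Vec<Ty>, candidates: Vec<u32> },
}

/// A candidate applies only if its parameter types equal the argument types
/// exactly. No conversions, no ranking, no best-effort fallback.
pub fn resolve(name: &str, args: &[Ty], candidates: &[Candidate]) -> Result<u32, ResolveError> {
    let mut matches: Vec<u32> = candidates
        .iter()
        .filter(|c| c.params == args)
        .map(|c| c.func_idx)
        .collect();
    matches.sort_unstable(); // stable diagnostic output regardless of candidate order
    match matches.len() {
        0 => Err(ResolveError::Missing { name: name.to_string(), args: args.to_vec() }),
        1 => Ok(matches[0]),
        _ => Err(ResolveError::Ambiguous { name: name.to_string(), args: args.to_vec(), candidates: matches }),
    }
}
```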
Requirements Checklist
- No best-effort fallback.
Completion Tests
- Ambiguous call produces a clear diagnostic.
- Missing overload produces a clear diagnostic.
Phase 4 — Eliminate Stringly-Typed Protocols & Debug Hacks
PR-10 (5 pts) — Replace origin: Option<String> and all string protocols with structured enums
Briefing
String prefixes like svc: and @dep: are fragile and non-industrial.
Target
All origins and external references become typed data.
Scope
- Replace string origins with enums.
- Update lowering/collector/output accordingly.
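A sketch of what a typed origin could look like in Rust. The variant names and fields are hypothetical: they only assume that "svc:" marked a host service and "@dep:" marked a dependency import. Display exists for diagnostics; nothing parses the rendered text back.

```rust
use std::fmt;

/// Hypothetical typed replacement for `origin: Option<String>`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Origin {
    /// Defined in the module being compiled.
    Local,
    /// Provided by a host service (previously a "svc:" string prefix).
    Service { name: String },
    /// Imported from a dependency (previously an "@dep:" string prefix).
    Dependency { dep: String, module_path: String },
}

impl fmt::Display for Origin {
    /// Human-readable rendering for diagnostics only.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Origin::Local => write!(f, "local"),
            Origin::Service { name } => write!(f, "service {name}"),
            Origin::Dependency { dep, module_path } => write!(f, "dependency {dep} ({module_path})"),
        }
    }
}

/// Consumers match on variants instead of calling starts_with/split.
pub fn is_external(origin: &Origin) -> bool {
    !matches!(origin, Origin::Local)
}
```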
Requirements Checklist
- No .starts_with('@') or split(':') protocols.
Completion Tests
- Grep-based test/lint step fails if forbidden patterns exist.
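The grep-based check can start as a plain substring scan. A sketch in Rust; the FORBIDDEN list and find_forbidden are hypothetical, and a CI test would run the function over every source file in the compiler and linker crates and fail on any hit.

```rust
/// Hypothetical list of literal substrings banned once PR-10 lands.
const FORBIDDEN: &[&str] = &[".starts_with('@')", ".split(':')", "\"svc:\"", "\"@dep:\""];

/// Returns (1-based line number, pattern) for every forbidden hit in `src`.
pub fn find_forbidden(src: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (i, line) in src.lines().enumerate() {
        for &pat in FORBIDDEN {
            if line.contains(pat) {
                hits.push((i + 1, pat));
            }
        }
    }
    hits
}
```

The file that defines FORBIDDEN contains the patterns itself, so the scan must exclude it.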
PR-11 (5 pts) — DebugInfo V1: structured function metadata (no name@offset+len)
Briefing
Encoding debug metadata in strings is unacceptable.
Target
Introduce a structured debug info format that stores offset/len as fields.
Scope
- Add DebugFunctionInfo { func_idx, name, code_offset, code_len }.
- Remove all parsing of @offset+len.
- Update orchestrator/linker/emit to use structured debug info.
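A sketch of DebugFunctionInfo with an explicit binary encoding, so offset and length travel as fields rather than inside the name. The field set comes from the scope above; the little-endian record layout and the encode/decode helpers are hypothetical choices.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DebugFunctionInfo {
    pub func_idx: u32,
    pub name: String,
    pub code_offset: u32,
    pub code_len: u32,
}

#[derive(Debug, PartialEq)]
pub enum DecodeError { Truncated, BadUtf8 }

fn read_u32(buf: &[u8]) -> Result<(u32, &[u8]), DecodeError> {
    if buf.len() < 4 {
        return Err(DecodeError::Truncated);
    }
    let (head, tail) = buf.split_at(4);
    Ok((u32::from_le_bytes([head[0], head[1], head[2], head[3]]), tail))
}

impl DebugFunctionInfo {
    /// Record layout (little-endian u32s): func_idx, code_offset, code_len,
    /// name_len, followed by the UTF-8 name bytes.
    pub fn encode(&self, out: &mut Vec<u8>) {
        for v in [self.func_idx, self.code_offset, self.code_len, self.name.len() as u32] {
            out.extend_from_slice(&v.to_le_bytes());
        }
        out.extend_from_slice(self.name.as_bytes());
    }

    /// Decodes one record from the front of `buf`; returns it and the remaining bytes.
    pub fn decode(buf: &[u8]) -> Result<(DebugFunctionInfo, &[u8]), DecodeError> {
        let (func_idx, rest) = read_u32(buf)?;
        let (code_offset, rest) = read_u32(rest)?;
        let (code_len, rest) = read_u32(rest)?;
        let (name_len, rest) = read_u32(rest)?;
        let name_len = name_len as usize;
        if rest.len() < name_len {
            return Err(DecodeError::Truncated);
        }
        let (name_bytes, rest) = rest.split_at(name_len);
        let name = std::str::from_utf8(name_bytes).map_err(|_| DecodeError::BadUtf8)?.to_string();
        Ok((DebugFunctionInfo { func_idx, name, code_offset, code_len }, rest))
    }
}
```

A roundtrip test encodes a list of records, decodes until the buffer is empty, and compares; the same test can assert that no name contains @.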
Requirements Checklist
- No code emits or parses @offset+len.
Completion Tests
- A test that fails if any debug name contains the @ pattern.
- Debug info roundtrip test.
Phase 5 — Hardening: Diagnostics, Error Handling, and Regression Shields
PR-12 (3 pts) — Replace panics in critical build pipeline with typed errors + diagnostics
Briefing
unwrap/expect calls in the compiler/linker turn user errors into crashes.
Target
Introduce typed errors and surface diagnostics.
Scope
- Replace unwraps in:
  - symbol resolution
  - import/export linking
  - entrypoint selection
- Ensure clean error return with context.
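A sketch of the pattern in Rust: a typed error with a human-readable Display, returned where a lookup would otherwise be unwrapped. LinkError, resolve_import, and select_entrypoint are hypothetical names, and the plain HashMap stands in for the real export table.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical link-step error; variants mirror the scope above.
#[derive(Debug, PartialEq)]
pub enum LinkError {
    UnresolvedImport { module: String, symbol: String },
    MissingEntrypoint { expected: String },
}

impl fmt::Display for LinkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LinkError::UnresolvedImport { module, symbol } => {
                write!(f, "unresolved import `{symbol}` in module `{module}`")
            }
            LinkError::MissingEntrypoint { expected } => {
                write!(f, "no entrypoint named `{expected}` was found")
            }
        }
    }
}

impl std::error::Error for LinkError {}

/// Replaces an `unwrap()` on the lookup: a missing symbol becomes a diagnostic.
pub fn resolve_import(exports: &HashMap<String, u32>, module: &str, symbol: &str) -> Result<u32, LinkError> {
    exports.get(symbol).copied().ok_or_else(|| LinkError::UnresolvedImport {
        module: module.to_string(),
        symbol: symbol.to_string(),
    })
}

pub fn select_entrypoint(exports: &HashMap<String, u32>, expected: &str) -> Result<u32, LinkError> {
    exports
        .get(expected)
        .copied()
        .ok_or_else(|| LinkError::MissingEntrypoint { expected: expected.to_string() })
}
```

Callers propagate these with `?`, and the driver prints them as diagnostics instead of aborting.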
Requirements Checklist
- No panic paths for invalid user programs.
Completion Tests
- Invalid program produces diagnostics, not panic.
PR-13 (3 pts) — Add regression test suite: link-order invariance + opcode-change immunity
Briefing
We need a system immune to opcode churn.
Target
Add tests that fail if:
- linker steps bytes manually
- decoder/spec drift exists
- link order changes semantics
Scope
- Link-order invariance tests.
- Spec coverage tests.
- Optional: lightweight “forbidden patterns” tests.
Requirements Checklist
- Changing an opcode immediate size requires updating only the spec and tests.
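One way to meet this requirement is to make the spec a single table and derive the decoder from it, then cover the table with a data-driven test. A sketch in Rust; SPEC, decode_len, and check_spec_coverage are hypothetical, and the five opcodes are placeholders.

```rust
use std::collections::HashSet;

/// Hypothetical single source of truth: (opcode, mnemonic, immediate bytes).
pub const SPEC: &[(u8, &str, usize)] = &[
    (0x00, "Nop", 0),
    (0x01, "PushI32", 4),
    (0x02, "Jmp", 4),
    (0x03, "JmpIfFalse", 4),
    (0x04, "Ret", 0),
];

/// Decoder derived from SPEC: total length of the instruction at `pc`,
/// or None for an unknown opcode or a truncated instruction.
pub fn decode_len(code: &[u8], pc: usize) -> Option<usize> {
    let opcode = *code.get(pc)?;
    let &(_, _, imm) = SPEC.iter().find(|entry| entry.0 == opcode)?;
    let len = 1 + imm;
    (pc + len <= code.len()).then_some(len)
}

/// Table-driven coverage over every spec entry and all 256 opcode bytes.
pub fn check_spec_coverage() -> Result<(), String> {
    let mut seen = HashSet::new();
    for &(op, name, imm) in SPEC {
        if !seen.insert(op) {
            return Err(format!("duplicate opcode 0x{op:02x} ({name})"));
        }
        // Opcode followed by `imm` zero bytes: exactly one full instruction.
        let mut instr = vec![op];
        instr.resize(1 + imm, 0);
        if decode_len(&instr, 0) != Some(1 + imm) {
            return Err(format!("{name}: full instruction rejected"));
        }
        // Dropping the last byte must make it undecodable.
        if decode_len(&instr[..imm], 0).is_some() {
            return Err(format!("{name}: truncated instruction accepted"));
        }
    }
    for byte in 0..=u8::MAX {
        if !seen.contains(&byte) && decode_len(&[byte; 16], 0).is_some() {
            return Err(format!("unknown opcode 0x{byte:02x} accepted"));
        }
    }
    Ok(())
}
```

Changing an immediate size then means editing one SPEC row; any byte-stepping code elsewhere that hardcodes the old size is exactly what the link-order and forbidden-pattern tests are meant to catch.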
Completion Tests
- All new regression tests pass.
Summary of Estimated Cost (Points)
- Phase 1: PR-01 (3) + PR-02 (5) + PR-03 (3) = 11
- Phase 2: PR-04 (5) + PR-05 (3) + PR-06 (3) = 11
- Phase 3: PR-07 (5) + PR-08 (5) + PR-09 (3) = 13
- Phase 4: PR-10 (5) + PR-11 (5) = 10
- Phase 5: PR-12 (3) + PR-13 (3) = 6
Total: 51 points
Note: If any PR starts to exceed 5 points in practice, it must be split into smaller PRs.
Non-Negotiables
- No compatibility with legacy encodings.
- No heuristics.
- No string hacks.
- One canonical decoder/spec/layout.
- Everything in English (including review comments).