From c56256f63eab299da591ae748d96d0929e04ebd5 Mon Sep 17 00:00:00 2001
From: bQUARKz
Date: Mon, 9 Feb 2026 22:49:01 +0000
Subject: [PATCH] add hard reset text

---
 files/Hard Reset.md | 438 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 438 insertions(+)
 create mode 100644 files/Hard Reset.md

diff --git a/files/Hard Reset.md b/files/Hard Reset.md
new file mode 100644
index 00000000..3b4ba910
--- /dev/null
+++ b/files/Hard Reset.md
@@ -0,0 +1,438 @@
+# Prometeu Industrial-Grade Refactor Plan (JVM-like)
+
+**Language policy:** All implementation notes, code comments, commit messages, PR descriptions, and review discussion **must be in English**.
+
+**Reset policy:** This is a **hard reset**. We do **not** keep compatibility with the legacy bytecode/linker/verifier behaviors. No heuristics, no “temporary support”, no string hacks.
+
+**North Star:** A JVM-like philosophy:
+
+* Control-flow is **method-local** and **canonical**.
+* The linker resolves **symbols** and **tables**, not intra-function branches.
+* A **single canonical layout/decoder/spec** is used across compiler/linker/verifier/VM.
+* Any invalid program fails with clear diagnostics, not panics.
+
+**Estimation scale:**
+
+* **1 point** = small / straightforward
+* **3 points** = medium
+* **5 points** = large
+* Any work that feels bigger must be broken down into multiple PRs (≤ 5 points each).
+
+---
+
+## Phase 1 — Single Source of Truth: Bytecode Spec + Decoder (Highest ROI)
+
+### PR-01 (3 pts) — Move OpcodeSpec to `prometeu-bytecode` and make it authoritative
+
+**Briefing**
+
+Today opcode metadata (imm sizes, stack effects, branch-ness, terminators) is duplicated and/or inconsistent across crates. This creates a perpetual maintenance nightmare.
+
+**Target**
+
+Create one authoritative opcode spec in **`prometeu-bytecode`** and delete/replace all “local” opcode knowledge.
+
+**Scope**
+
+* Create `prometeu-bytecode::opcode_spec` containing:
+
+  * `imm_bytes`
+  * `pops`, `pushes` (stack effect)
+  * `is_branch`, `is_terminator`
+  * optional: `name`, `category`
+* Update callers to import from `prometeu-bytecode`.
+
+**Requirements Checklist**
+
+* [ ] There is exactly one canonical `OpcodeSpec` source.
+* [ ] All crates compile against that source.
+* [ ] No hardcoded operand sizes remain outside the spec.
+
+**Completion Tests**
+
+* [ ] Unit test enumerating all opcodes validates:
+
+  * every opcode has a spec
+  * `imm_bytes` is defined
+
+---
+
+### PR-02 (5 pts) — Introduce canonical decoder in `prometeu-bytecode` and migrate VM to it
+
+**Briefing**
+
+The VM currently has its own decoder. The linker and other tools decode manually. This must be centralized.
+
+**Target**
+
+Add a single canonical decoder in `prometeu-bytecode` that produces typed decoded instructions.
+
+**Scope**
+
+* Add `prometeu-bytecode::decoder`:
+
+  * `decode_next(pc, bytes) -> DecodedInstr`
+  * includes: opcode, pc, next_pc, raw immediate bytes slice
+  * helpers: `imm_u8/u16/u32/i32/i64/f64` with size validation
+* Migrate VM to use `prometeu-bytecode::decoder`.
+
+**Requirements Checklist**
+
+* [ ] VM no longer has a bespoke decoder.
+* [ ] No slicing-based immediate parsing in VM core paths.
+* [ ] Decoder validates immediate sizes and fails deterministically.
+
+**Completion Tests**
+
+* [ ] Decoder unit tests for representative opcodes with each immediate size.
+* [ ] Roundtrip test: encode→decode (table-driven; property test optional).
+
+---
+
+### PR-03 (3 pts) — Delete/neutralize `abi::operand_size` duplication
+
+**Briefing**
+
+`prometeu-bytecode/src/abi.rs` provides partial operand sizing that can drift from the canonical spec.
+
+**Target**
+
+Make all operand sizing derived from the opcode spec.
+
+**Scope**
+
+* Replace `operand_size()` with `OpcodeSpec::imm_bytes`.
+* Remove or restrict legacy APIs that leak duplication.
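Editorial sketch (not part of the patch): one way the PR-01 spec table and the PR-03 "derive operand sizes from the spec" rule could fit together. The opcode set, the numeric values, and the `Opcode::ALL` / `Opcode::spec` helpers are assumptions made for illustration; the plan only names the `OpcodeSpec` fields and `operand_size()`.

```rust
/// Hypothetical opcode set; the real Prometeu opcodes are not listed in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Opcode {
    Nop = 0,
    PushI32 = 1,
    Add = 2,
    Jmp = 3,
    Ret = 4,
}

/// Fields taken from the PR-01 scope list; the concrete types are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OpcodeSpec {
    pub name: &'static str,
    pub imm_bytes: u8,
    pub pops: u8,
    pub pushes: u8,
    pub is_branch: bool,
    pub is_terminator: bool,
}

const fn spec_of(
    name: &'static str,
    imm_bytes: u8,
    pops: u8,
    pushes: u8,
    is_branch: bool,
    is_terminator: bool,
) -> OpcodeSpec {
    OpcodeSpec { name, imm_bytes, pops, pushes, is_branch, is_terminator }
}

impl Opcode {
    /// Every opcode, so tests can enumerate the whole set.
    pub const ALL: [Opcode; 5] =
        [Opcode::Nop, Opcode::PushI32, Opcode::Add, Opcode::Jmp, Opcode::Ret];

    /// The single place where per-opcode metadata lives.
    /// The exhaustive `match` makes a missing spec a compile error.
    pub const fn spec(self) -> OpcodeSpec {
        match self {
            Opcode::Nop => spec_of("NOP", 0, 0, 0, false, false),
            Opcode::PushI32 => spec_of("PUSH_I32", 4, 0, 1, false, false),
            Opcode::Add => spec_of("ADD", 0, 2, 1, false, false),
            Opcode::Jmp => spec_of("JMP", 4, 0, 0, true, true),
            Opcode::Ret => spec_of("RET", 0, 0, 0, false, true),
        }
    }
}

/// Legacy-style helper, kept only as a thin wrapper over the spec
/// so there is no second operand-size table.
pub fn operand_size(op: Opcode) -> usize {
    op.spec().imm_bytes as usize
}
```

With this shape, the PR-03 completion test reduces to a loop over `Opcode::ALL` comparing `operand_size(op)` with `op.spec().imm_bytes`.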
+
+**Requirements Checklist**
+
+* [ ] There is no second operand-size table.
+
+**Completion Tests**
+
+* [ ] Test ensuring `operand_size()` (if retained) matches spec for all opcodes.
+
+---
+
+## Phase 2 — Canonical Layout + Verifier Contract (JVM-like Control Flow)
+
+### PR-04 (5 pts) — Rewrite layout to compute instruction boundaries via decoder (no heuristics)
+
+**Briefing**
+
+Layout must be computed canonically using the decoder, not guessed via ad-hoc stepping.
+
+**Target**
+
+`prometeu_bytecode::layout` becomes the only authority for:
+
+* function ranges `[start, end)`
+* function length
+* valid instruction boundaries
+* pc→function lookup
+
+**Scope**
+
+* Implement layout computation by scanning bytes with the canonical decoder.
+* Provide APIs:
+
+  * `function_range(func_idx) -> (start, end)`
+  * `function_len(func_idx)`
+  * `is_boundary(func_idx, rel_pc)` or `is_boundary_abs(abs_pc)`
+  * `lookup_function_by_pc(abs_pc)`
+
+**Requirements Checklist**
+
+* [ ] No “clamp_jump_target” or tolerant APIs remain.
+* [ ] Layout derived only via decoder.
+
+**Completion Tests**
+
+* [ ] Unit tests: boundaries for a known bytecode sequence.
+* [ ] Fuzz/table tests: random instruction sequences produce monotonic ranges and valid boundaries.
+
+---
+
+### PR-05 (3 pts) — Verifier hard reset: branches are function-relative only
+
+**Briefing**
+
+The verifier must not guess absolute vs relative. One encoding only.
+
+**Target**
+
+Branches use `immediate = target_rel_to_function_start`, with `target == func_len` allowed.
+
+**Scope**
+
+* Replace any dual-format logic.
+* Validation:
+
+  * `target_rel <= func_len`
+  * if `target_rel == func_len`: OK (end-exclusive)
+  * else target must be an instruction boundary
+* All boundary checks must come from `layout`.
+
+**Requirements Checklist**
+
+* [ ] No heuristics.
+* [ ] Verifier depends only on layout + decoder.
+
+**Completion Tests**
+
+* [ ] JumpToEnd accepted.
+* [ ] JumpToMidInstruction rejected.
+* [ ] JumpOutsideFunction rejected.
+
+---
+
+### PR-06 (3 pts) — Linker hard reset: never relocate intra-function branches
+
+**Briefing**
+
+Linker must not rewrite local control-flow.
+
+**Target**
+
+Remove any relocation/patching for `Jmp`/`JmpIf*`.
+
+**Scope**
+
+* Delete branch relocation logic.
+* Ensure only symbol/table/call relocations remain.
+
+**Requirements Checklist**
+
+* [ ] Linker does not inspect/patch branch immediates.
+
+**Completion Tests**
+
+* [ ] Link-order invariance test (A+B vs B+A) passes for intra-function branches.
+
+---
+
+## Phase 3 — JVM-like Symbol Identity: Signature-based Overload & Constant-Pool Mindset
+
+### PR-07 (5 pts) — Introduce Signature interning (`SigId`) and descriptor canonicalization
+
+**Briefing**
+
+Overload must be by signature, not by `name/arity`.
+
+**Target**
+
+Create a canonical function descriptor system (JVM-like) and intern signatures.
+
+**Scope**
+
+* Add `Signature` model:
+
+  * params types + return type
+* Add `SignatureInterner` -> `SigId`
+* Add `descriptor()` canonical representation (stable, deterministic).
+
+**Requirements Checklist**
+
+* [ ] `SigId` is used as identity in compiler IR.
+* [ ] Descriptor is stable and round-trippable.
+
+**Completion Tests**
+
+* [ ] `debug(int)->void` and `debug(string)->void` produce different descriptors.
+* [ ] Descriptor stability tests.
+
+---
+
+### PR-08 (5 pts) — Replace `name/arity` import/export keys with `(name, SigId)`
+
+**Briefing**
+
+`name/arity` and dedup-by-name break overload and are not industrial.
+
+**Target**
+
+Rewrite import/export identity:
+
+* `ExportKey { module_path, base_name, sig }`
+* `ImportKey { dep, module_path, base_name, sig }`
+
+**Scope**
+
+* Update lowering to stop producing `name/arity`.
+* Update output builder to stop exporting short names and `name/arity`.
+* Update collector to stop dedup-by-name.
+
+**Requirements Checklist**
+
+* [ ] No code constructs or parses `"{name}/{arity}"`.
+* [ ] Overload is represented as first-class, not a hack.
+
+**Completion Tests**
+
+* [ ] Cross-module overload works.
+* [ ] Duplicate export of same `(name, sig)` fails deterministically.
+
+---
+
+### PR-09 (3 pts) — Overload resolution rules (explicit, deterministic)
+
+**Briefing**
+
+Once overload exists, resolution rules must be explicit.
+
+**Target**
+
+Implement a deterministic overload resolver based on exact type match (no implicit hacks).
+
+**Scope**
+
+* Exact-match resolution only (initially).
+* Clear diagnostic when ambiguous or missing.
+
+**Requirements Checklist**
+
+* [ ] No best-effort fallback.
+
+**Completion Tests**
+
+* [ ] Ambiguous call produces a clear diagnostic.
+* [ ] Missing overload produces a clear diagnostic.
+
+---
+
+## Phase 4 — Eliminate Stringly-Typed Protocols & Debug Hacks
+
+### PR-10 (5 pts) — Replace `origin: Option` and all string protocols with structured enums
+
+**Briefing**
+
+String prefixes like `svc:` and `@dep:` are fragile and non-industrial.
+
+**Target**
+
+All origins and external references become typed data.
+
+**Scope**
+
+* Replace string origins with enums.
+* Update lowering/collector/output accordingly.
+
+**Requirements Checklist**
+
+* [ ] No `.starts_with('@')`, `split(':')` protocols.
+
+**Completion Tests**
+
+* [ ] Grep-based test/lint step fails if forbidden patterns exist.
+
+---
+
+### PR-11 (5 pts) — DebugInfo V1: structured function metadata (no `name@offset+len`)
+
+**Briefing**
+
+Encoding debug metadata in strings is unacceptable.
+
+**Target**
+
+Introduce a structured debug info format that stores offset/len as fields.
+
+**Scope**
+
+* Add `DebugFunctionInfo { func_idx, name, code_offset, code_len }`.
+* Remove all parsing of `@offset+len`.
+* Update orchestrator/linker/emit to use structured debug info.
+
+**Requirements Checklist**
+
+* [ ] No code emits or parses `@offset+len`.
+
+**Completion Tests**
+
+* [ ] A test that fails if any debug name contains `@` pattern.
+* [ ] Debug info roundtrip test.
+
+---
+
+## Phase 5 — Hardening: Diagnostics, Error Handling, and Regression Shields
+
+### PR-12 (3 pts) — Replace panics in critical build pipeline with typed errors + diagnostics
+
+**Briefing**
+
+`unwrap/expect` in compiler/linker transforms user errors into crashes.
+
+**Target**
+
+Introduce typed errors and surface diagnostics.
+
+**Scope**
+
+* Replace unwraps in:
+
+  * symbol resolution
+  * import/export linking
+  * entrypoint selection
+* Ensure clean error return with context.
+
+**Requirements Checklist**
+
+* [ ] No panic paths for invalid user programs.
+
+**Completion Tests**
+
+* [ ] Invalid program produces diagnostics, not panic.
+
+---
+
+### PR-13 (3 pts) — Add regression test suite: link-order invariance + opcode-change immunity
+
+**Briefing**
+
+We need a system immune to opcode churn.
+
+**Target**
+
+Add tests that fail if:
+
+* linker steps bytes manually
+* decoder/spec drift exists
+* link order changes semantics
+
+**Scope**
+
+* Link-order invariance tests.
+* Spec coverage tests.
+* Optional: lightweight “forbidden patterns” tests.
+
+**Requirements Checklist**
+
+* [ ] Changing an opcode immediate size requires updating only the spec and tests.
+
+**Completion Tests**
+
+* [ ] All new regression tests pass.
+
+---
+
+## Summary of Estimated Cost (Points)
+
+* Phase 1: PR-01 (3) + PR-02 (5) + PR-03 (3) = **11**
+* Phase 2: PR-04 (5) + PR-05 (3) + PR-06 (3) = **11**
+* Phase 3: PR-07 (5) + PR-08 (5) + PR-09 (3) = **13**
+* Phase 4: PR-10 (5) + PR-11 (5) = **10**
+* Phase 5: PR-12 (3) + PR-13 (3) = **6**
+
+**Total: 51 points**
+
+> Note: If any PR starts to exceed 5 points in practice, it must be split into smaller PRs.
+
+---
+
+## Non-Negotiables
+
+* No compatibility with legacy encodings.
+* No heuristics.
+* No string hacks.
+* One canonical decoder/spec/layout.
+* Everything in English (including review comments).
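
Editorial sketch (not part of the patch): the PR-05 branch rule, with boundaries obtained by decoding as PR-04 requires, might look roughly like the following. The opcode bytes, the `imm_bytes` table, and the `VerifyError` type are hypothetical; only the rule itself (`target_rel <= func_len`, end-exclusive target allowed, otherwise the target must be an instruction boundary) and the three test names come from the plan.

```rust
use std::collections::BTreeSet;

/// Hypothetical immediate sizes keyed by opcode byte (illustrative values only).
fn imm_bytes(opcode: u8) -> Option<usize> {
    match opcode {
        0x00 => Some(0), // NOP
        0x01 => Some(4), // PUSH_I32 <i32>
        0x02 => Some(4), // JMP <u32 target, relative to function start>
        0x03 => Some(0), // RET
        _ => None,
    }
}

/// Assumed error type; variant names mirror the PR-05 completion tests.
#[derive(Debug, PartialEq, Eq)]
pub enum VerifyError {
    UnknownOpcode { pc: usize, opcode: u8 },
    TruncatedImmediate { pc: usize },
    JumpOutsideFunction { target: usize, func_len: usize },
    JumpToMidInstruction { target: usize },
}

/// Decode one function body front to back and collect the function-relative
/// pc of every instruction start. No guessing: an unknown opcode or a
/// truncated immediate is an error, not something to step over.
pub fn instruction_boundaries(code: &[u8]) -> Result<BTreeSet<usize>, VerifyError> {
    let mut boundaries = BTreeSet::new();
    let mut pc = 0;
    while pc < code.len() {
        let opcode = code[pc];
        let imm = imm_bytes(opcode).ok_or(VerifyError::UnknownOpcode { pc, opcode })?;
        let next_pc = pc + 1 + imm;
        if next_pc > code.len() {
            return Err(VerifyError::TruncatedImmediate { pc });
        }
        boundaries.insert(pc);
        pc = next_pc;
    }
    Ok(boundaries)
}

/// Check one function-relative branch target against the PR-05 rule.
pub fn check_branch_target(code: &[u8], target_rel: usize) -> Result<(), VerifyError> {
    let func_len = code.len();
    if target_rel > func_len {
        return Err(VerifyError::JumpOutsideFunction { target: target_rel, func_len });
    }
    if target_rel == func_len {
        return Ok(()); // end-exclusive target is explicitly allowed
    }
    if instruction_boundaries(code)?.contains(&target_rel) {
        Ok(())
    } else {
        Err(VerifyError::JumpToMidInstruction { target: target_rel })
    }
}
```

For the body `NOP; PUSH_I32 7; RET` (seven bytes), this sketch accepts targets 0, 1, 6 and 7, rejects 3 as a mid-instruction jump, and rejects 8 as outside the function. A real verifier would presumably compute the boundary set once per function in `layout` rather than per branch.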