prometeu-studio/discussion/lessons/DSC-0011-compiler-analyze-compile-build-pipeline-split/LSN-0025-compiler-pipeline-entrypoints-and-result-boundaries.md
2026-03-30 19:52:01 +01:00

---
id: LSN-0025
ticket: compiler-analyze-compile-build-pipeline-split
title: Compiler Pipeline Entrypoints and Result Boundaries
created: 2026-03-30
tags: [compiler, pipeline, analyze, compile, build, contracts, conformance]
---
## Context
The compiler pipeline used to expose one public `run` flow that always resolved dependencies, loaded sources, ran the frontend, lowered to IRVM, optimized, emitted bytecode, verified it, and finally wrote `build/program.pbx`.
That shape hid three different intents behind one operation:
- tooling-only semantic analysis with no artifact side effects,
- in-memory executable compilation with no disk write,
- and filesystem-backed artifact materialization.
This became a real boundary problem once Studio and future LSP-like consumers needed semantic results without forcing PBX persistence.
## Key Decisions
### Keep One Canonical Pipeline, Not Three Divergent Pipelines
**What:**
The compiler now keeps one canonical shared stage order and exposes three public entrypoints over that same pipeline: `analyze`, `compile`, and `build`.
**Why:**
The important architectural rule is shared semantics with different terminal boundaries, not separate services that slowly drift apart.
`build` must stay defined as `compile` plus terminal persistence, not as another independent executable path.
**Trade-offs:**
This keeps behavior consistent for callers, but it requires the stage boundaries and result contracts to be explicit.
Without explicit contracts, one shared pipeline easily collapses back into a mutable context API that callers misuse.
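A minimal sketch of that rule, assuming a hypothetical `SharedPipelineSketch` class rather than the real `BuilderPipelineService`: each entrypoint delegates to the previous one, so `build` cannot acquire stages that `compile` lacks, apart from the terminal write. The `trace` list and the empty stage bodies are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; only the entrypoint and stage names come from this lesson.
final class SharedPipelineSketch {
    // Records the stages that ran, so the shared order is observable.
    final List<String> trace = new ArrayList<>();

    // Terminal boundary 1: semantic analysis, no artifact side effects.
    void analyze() {
        trace.add("ResolveDeps");
        trace.add("LoadSources");
        trace.add("FrontendPhase");
    }

    // Terminal boundary 2: everything analyze does, plus in-memory bytecode stages.
    void compile() {
        analyze();
        trace.add("LowerToIRVM");
        trace.add("OptimizeIRVM");
        trace.add("EmitBytecode");
        trace.add("LinkBytecode");
        trace.add("VerifyBytecode");
    }

    // Terminal boundary 3: compile plus the single persistence stage.
    void build() {
        compile();
        trace.add("WriteBytecodeArtifact");
    }
}
```

Because `build` is written as a call to `compile` followed by one extra stage, any change to the compile stages is picked up by `build` automatically.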
### Make Terminal Stage Boundaries Part of the Public Contract
**What:**
The entrypoints now mean:
- `analyze = ResolveDeps + LoadSources + FrontendPhase`
- `compile = analyze + LowerToIRVM + OptimizeIRVM + EmitBytecode + LinkBytecode + VerifyBytecode`
- `build = compile + WriteBytecodeArtifact`
**Why:**
The value is not just naming.
Each entrypoint now communicates a precise side-effect boundary and a precise result boundary.
That lets tooling consumers ask for semantic facts, executable callers ask for validated in-memory bytecode, and filesystem callers ask for persisted artifacts without inventing alternate pipeline semantics.
**Trade-offs:**
The team must protect these boundaries with tests and conformance docs.
If callers start bypassing them with ad hoc helpers, `compile` and `build` drift apart immediately.
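One way to make these boundaries checkable in tests is to express them as data. The following is a sketch under assumptions: the `Stage` and `Entrypoint` enums are hypothetical, not types from the codebase, and the stage order is taken from the three definitions above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical enum listing the canonical stage order once.
enum Stage {
    RESOLVE_DEPS, LOAD_SOURCES, FRONTEND_PHASE,
    LOWER_TO_IRVM, OPTIMIZE_IRVM, EMIT_BYTECODE, LINK_BYTECODE, VERIFY_BYTECODE,
    WRITE_BYTECODE_ARTIFACT
}

// Hypothetical enum: each entrypoint is defined only by its terminal stage.
enum Entrypoint {
    ANALYZE(Stage.FRONTEND_PHASE),
    COMPILE(Stage.VERIFY_BYTECODE),
    BUILD(Stage.WRITE_BYTECODE_ARTIFACT);

    private final Stage terminal;

    Entrypoint(Stage terminal) {
        this.terminal = terminal;
    }

    // Stages are always a prefix of the canonical order, up to and including the terminal stage.
    List<Stage> stages() {
        return Arrays.asList(Stage.values()).subList(0, terminal.ordinal() + 1);
    }

    // The side-effect boundary: true only when the persistence stage is included.
    boolean writesArtifact() {
        return stages().contains(Stage.WRITE_BYTECODE_ARTIFACT);
    }
}
```

With this shape, an entrypoint cannot reorder or skip stages; it can only stop earlier or later in the same sequence.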
### Publish Stable Result Contracts Instead of Leaking Mutable Pipeline Context
**What:**
The public surface now returns stable record contracts:
- `AnalysisSnapshot`
- `CompileResult`
- `BuildResult`
**Why:**
`BuilderPipelineContext` is mutable internal state, not a good external contract.
Stable result models make the minimum payload explicit and keep callers from depending on incidental intermediate fields.
**Trade-offs:**
This adds adapter code from pipeline context to public results.
That cost is worth paying because it limits public coupling and makes future multi-caller evolution safer.
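The adapter step might look roughly like the sketch below. The field sets are assumptions: this lesson does not list the actual fields of `BuilderPipelineContext` or `CompileResult`, so `ContextSketch` and `CompileResultSketch` are hypothetical stand-ins with made-up fields. Java 16 or later is assumed for `record`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the mutable internal pipeline context.
final class ContextSketch {
    final List<String> diagnostics = new ArrayList<>();
    byte[] bytecode;     // filled in by the emit stage
    Object scratchIrvm;  // incidental intermediate state callers should never see
}

// Hypothetical stable result: only the minimum payload, copied out of the context.
record CompileResultSketch(List<String> diagnostics, byte[] bytecode) {
    static CompileResultSketch from(ContextSketch ctx) {
        // Defensive copies, so later context mutation cannot change the result.
        return new CompileResultSketch(List.copyOf(ctx.diagnostics), ctx.bytecode.clone());
    }
}
```

Copying at the boundary is the adapter cost mentioned above; in exchange, the intermediate `scratchIrvm` field never becomes something a caller can depend on. Note that a record's array accessor still returns a mutable array, so a stricter version would also clone on read.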
## Final Implementation
The implementation landed across specs, code, and tests:
- Chapter 23 in `docs/specs/compiler` now defines the canonical entrypoints and minimum result contracts.
- `BuilderPipelineService` now exposes `analyze`, `compile`, and `build` as the public surface.
- `Compile.main` now composes the default filesystem context and calls `build`.
- `AnalysisSnapshot`, `CompileResult`, and `BuildResult` carry the stable output contracts.
- Integration coverage now proves that `analyze` and `compile` do not write `build/program.pbx`, while `build` does.
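The no-write guarantee from the last bullet can be illustrated with a small sketch. It is not the code in `MainProjectPipelineIntegrationTest`; `NoWriteCheckSketch`, its methods, and the placeholder bytes are hypothetical, and the real entrypoints take a pipeline context rather than a bare project root.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-ins for compile and build, used only to show the side-effect check.
final class NoWriteCheckSketch {
    // Creates an empty temporary project root for the check.
    static Path newTempRoot() {
        try {
            return Files.createTempDirectory("pipeline-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Path artifactPath(Path projectRoot) {
        return projectRoot.resolve("build").resolve("program.pbx");
    }

    // In-memory only: returns placeholder bytes and never touches the filesystem.
    static byte[] compile(Path projectRoot) {
        return new byte[] {0x50, 0x42, 0x58};
    }

    // compile plus terminal persistence; returns the path of the written artifact.
    static Path build(Path projectRoot) {
        Path artifact = artifactPath(projectRoot);
        try {
            Files.createDirectories(artifact.getParent());
            return Files.write(artifact, compile(projectRoot));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test in this style calls `compile` on a fresh root, asserts that `artifactPath(root)` does not exist, then calls `build` and asserts that it does.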
## Examples
- Use `analyze` when the caller needs diagnostics, source table access, workspace resolution, and frontend semantic facts for tooling.
- Use `compile` when the caller needs validated executable bytecode in memory and must not touch the filesystem.
- Use `build` when the caller wants the normal artifact-producing compiler behavior and a concrete `program.pbx` path.
## Pitfalls
- Do not reintroduce a public `run` alias. That would blur the side-effect boundary this discussion made explicit.
- Do not let `build` diverge semantically from `compile`. The only extra step for `build` is terminal artifact persistence.
- Do not leak `BuilderPipelineContext` back into callsites as the real public contract. That would make the stable result models nominal only.
- Do not add caller-specific configs that silently change stage order or stage meaning under the names `analyze`, `compile`, or `build`.
- Do not treat `compile` as "half-build". It is a complete, validated in-memory executable result, not a weaker or partial variant of `build`.
## References
- `DEC-0007` Canonical compiler entrypoints for analyze, compile, and build
- `PLN-0009` Propagate DEC-0007 into compiler pipeline specs and public contracts
- `PLN-0010` Refactor BuilderPipelineService into explicit analyze, compile, and build entrypoints
- `PLN-0011` Migrate compiler callsites and tests to explicit build, compile, and analyze entrypoints
- `docs/specs/compiler/23. Compiler Pipeline Entry Points Specification.md`
- `docs/specs/compiler/22. Backend Spec-to-Test Conformance Matrix.md`
- `prometeu-compiler/prometeu-build-pipeline/src/main/java/p/studio/compiler/workspaces/BuilderPipelineService.java`
- `prometeu-compiler/prometeu-build-pipeline/src/test/java/p/studio/compiler/integration/MainProjectPipelineIntegrationTest.java`
## Takeaways
- The durable pattern is one canonical compiler pipeline with explicit terminal entrypoints, not multiple near-duplicate pipelines.
- Side-effect boundaries are first-class API semantics: `analyze` and `compile` must stay no-write, and `build` is the only artifact-materialization path.
- Stable result contracts are part of the architectural fix; callers should consume `AnalysisSnapshot`, `CompileResult`, and `BuildResult`, not mutable pipeline internals.