prometeu-studio/discussion/lessons/DSC-0011-compiler-analyze-compile-build-pipeline-split/LSN-0025-compiler-pipeline-entrypoints-and-result-boundaries.md
2026-03-30 19:52:01 +01:00

id: LSN-0025
ticket: compiler-analyze-compile-build-pipeline-split
title: Compiler Pipeline Entrypoints and Result Boundaries
created: 2026-03-30
tags: compiler, pipeline, analyze, compile, build, contracts, conformance

Context

The compiler pipeline used to expose one public run flow that always resolved dependencies, loaded sources, ran the frontend, lowered to IRVM, optimized, emitted bytecode, verified it, and finally wrote build/program.pbx.

That shape hid three different intents behind one operation:

  • tooling-only semantic analysis with no artifact side effects,
  • in-memory executable compilation with no disk write,
  • and filesystem-backed artifact materialization.

This became a real boundary problem once Studio and future LSP-like consumers needed semantic results without forcing PBX persistence.

Key Decisions

Keep One Canonical Pipeline, Not Three Divergent Pipelines

What: The compiler now keeps one canonical shared stage order and exposes three public entrypoints over that same pipeline: analyze, compile, and build.

Why: The important architectural rule is shared semantics with different terminal boundaries, not separate services that slowly drift apart. build must stay defined as compile plus terminal persistence, not as another independent executable path.

Trade-offs: This keeps behavior consistent for callers, but it requires the stage boundaries and result contracts to be explicit. Without explicit contracts, one shared pipeline easily collapses back into a mutable context API that callers misuse.

Make Terminal Stage Boundaries Part of the Public Contract

What: The entrypoints now mean:

  • analyze = ResolveDeps + LoadSources + FrontendPhase
  • compile = analyze + LowerToIRVM + OptimizeIRVM + EmitBytecode + LinkBytecode + VerifyBytecode
  • build = compile + WriteBytecodeArtifact

Why: The value is not just naming. Each entrypoint now communicates a precise side-effect boundary and a precise result boundary. That lets tooling consumers ask for semantic facts, executable callers ask for validated in-memory bytecode, and filesystem callers ask for persisted artifacts without inventing alternate pipeline semantics.

Trade-offs: The team must protect these boundaries with tests and conformance docs. If callers start bypassing them with ad hoc helpers, compile and build drift immediately.
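
The stage composition above can be sketched as one ordered stage list with three terminal boundaries. This is a minimal illustration, not the actual BuilderPipelineService code: the class PipelineBoundarySketch, the enum names, and the helper methods are hypothetical, and only the stage names come from the definitions above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only the stage names are taken from this lesson.
final class PipelineBoundarySketch {

    // One canonical stage order shared by every entrypoint.
    enum Stage {
        RESOLVE_DEPS, LOAD_SOURCES, FRONTEND_PHASE,
        LOWER_TO_IRVM, OPTIMIZE_IRVM, EMIT_BYTECODE, LINK_BYTECODE, VERIFY_BYTECODE,
        WRITE_BYTECODE_ARTIFACT
    }

    // An entrypoint is nothing more than a terminal stage on the shared order.
    enum Entrypoint {
        ANALYZE(Stage.FRONTEND_PHASE),
        COMPILE(Stage.VERIFY_BYTECODE),
        BUILD(Stage.WRITE_BYTECODE_ARTIFACT);

        final Stage terminal;

        Entrypoint(Stage terminal) {
            this.terminal = terminal;
        }
    }

    // Returns the prefix of the canonical order up to and including the terminal stage.
    static List<Stage> stagesFor(Entrypoint entrypoint) {
        List<Stage> stages = new ArrayList<>();
        for (Stage stage : Stage.values()) {
            stages.add(stage);
            if (stage == entrypoint.terminal) {
                break;
            }
        }
        return stages;
    }

    // True only when the entrypoint reaches the artifact-writing stage.
    static boolean writesArtifact(Entrypoint entrypoint) {
        return stagesFor(entrypoint).contains(Stage.WRITE_BYTECODE_ARTIFACT);
    }
}
```

Because every entrypoint is a prefix of the same list, build cannot gain or reorder stages relative to compile without changing the canonical order itself, which is the property the conformance tests are meant to protect.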

Publish Stable Result Contracts Instead of Leaking Mutable Pipeline Context

What: The public surface now returns stable record contracts:

  • AnalysisSnapshot
  • CompileResult
  • BuildResult

Why: BuilderPipelineContext is mutable internal state, not a good external contract. Stable result models make the minimum payload explicit and keep callers from depending on incidental intermediate fields.

Trade-offs: This adds adapter code from pipeline context to public results. That cost is worth paying because it limits public coupling and makes future multi-caller evolution safer.
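
A rough sketch of the adapter idea follows, assuming Java 16+ records. The record names match the contracts listed above, but every field, the MutableContext stand-in, and the adapter methods are assumptions made for illustration; the real minimum payloads are defined in the spec chapter, not here.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: field names and adapter methods are illustrative only.
final class ResultContractSketch {

    // Stand-in for the mutable internal pipeline state (not the real BuilderPipelineContext).
    static final class MutableContext {
        final List<String> diagnostics = new ArrayList<>();
        byte[] bytecode = new byte[0];
    }

    record AnalysisSnapshot(List<String> diagnostics) {
        AnalysisSnapshot {
            // Take an immutable copy so later context mutation cannot leak in.
            diagnostics = List.copyOf(diagnostics);
        }
    }

    record CompileResult(AnalysisSnapshot analysis, byte[] bytecode) {
        CompileResult {
            // Defensive copy on the way in.
            bytecode = bytecode.clone();
        }

        @Override
        public byte[] bytecode() {
            // Defensive copy on the way out, since arrays are mutable.
            return bytecode.clone();
        }
    }

    record BuildResult(CompileResult compileResult, Path artifactPath) {}

    // Adapters: the only place that reads the mutable context.
    static AnalysisSnapshot toSnapshot(MutableContext ctx) {
        return new AnalysisSnapshot(ctx.diagnostics);
    }

    static CompileResult toCompileResult(MutableContext ctx) {
        return new CompileResult(toSnapshot(ctx), ctx.bytecode);
    }
}
```

In this sketch a caller holding a CompileResult is unaffected if the pipeline keeps mutating its context afterwards, which is the decoupling the stable result models are meant to provide.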

Final Implementation

The implementation landed across specs, code, and tests:

  • Chapter 23 in docs/specs/compiler now defines the canonical entrypoints and minimum result contracts.
  • BuilderPipelineService now exposes analyze, compile, and build as the public surface.
  • Compile.main now composes the default filesystem context and calls build.
  • AnalysisSnapshot, CompileResult, and BuildResult carry the stable output contracts.
  • Integration coverage now proves that analyze and compile do not write build/program.pbx, while build does.
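
The no-write guarantee can be checked in the style sketched below. This is not the code in MainProjectPipelineIntegrationTest: the class WriteBoundarySketch, its methods, and the placeholder "bytecode" are hypothetical, and only the build/program.pbx location comes from this lesson.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the side-effect boundary between compile and build.
final class WriteBoundarySketch {

    // In-memory compile: returns placeholder bytes and touches no files.
    static byte[] compile(String source) {
        return source.getBytes(StandardCharsets.UTF_8);
    }

    // build = compile + terminal persistence to <projectRoot>/build/program.pbx.
    static Path build(String source, Path projectRoot) {
        byte[] bytecode = compile(source);
        Path artifact = projectRoot.resolve("build").resolve("program.pbx");
        try {
            Files.createDirectories(artifact.getParent());
            Files.write(artifact, bytecode);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return artifact;
    }

    // Creates an empty temporary directory to act as a project root.
    static Path newTempProject() {
        try {
            return Files.createTempDirectory("pipeline-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test in this style would call compile against a fresh project root, assert that build/program.pbx does not exist, then call build and assert that it does.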

Examples

  • Use analyze when the caller needs diagnostics, source table access, workspace resolution, and frontend semantic facts for tooling.
  • Use compile when the caller needs validated executable bytecode in memory and must not touch the filesystem.
  • Use build when the caller wants the normal artifact-producing compiler behavior and a concrete program.pbx path.

Pitfalls

  • Do not reintroduce a public run alias. That would blur the side-effect boundary the discussion just made explicit.
  • Do not let build diverge semantically from compile. The only extra step for build is terminal artifact persistence.
  • Do not leak BuilderPipelineContext back into callsites as the real public contract. That would make the stable result models nominal only.
  • Do not add caller-specific configs that silently change stage order or stage meaning under the names analyze, compile, or build.
  • Do not treat compile as "half-build". It is a complete, validated in-memory executable result, not a weaker or partial variant of build.

References

  • DEC-0007 Canonical compiler entrypoints for analyze, compile, and build
  • PLN-0009 Propagate DEC-0007 into compiler pipeline specs and public contracts
  • PLN-0010 Refactor BuilderPipelineService into explicit analyze, compile, and build entrypoints
  • PLN-0011 Migrate compiler callsites and tests to explicit build, compile, and analyze entrypoints
  • docs/specs/compiler/23. Compiler Pipeline Entry Points Specification.md
  • docs/specs/compiler/22. Backend Spec-to-Test Conformance Matrix.md
  • prometeu-compiler/prometeu-build-pipeline/src/main/java/p/studio/compiler/workspaces/BuilderPipelineService.java
  • prometeu-compiler/prometeu-build-pipeline/src/test/java/p/studio/compiler/integration/MainProjectPipelineIntegrationTest.java

Takeaways

  • The durable pattern is one canonical compiler pipeline with explicit terminal entrypoints, not multiple near-duplicate pipelines.
  • Side-effect boundaries are first-class API semantics: analyze and compile must stay no-write, and build is the only artifact-materialization path.
  • Stable result contracts are part of the architectural fix; callers should consume AnalysisSnapshot, CompileResult, and BuildResult, not mutable pipeline internals.