prometeu-studio/docs/pbs/agendas/13. Conformance Test Agenda.md

# PBS Conformance Test Agenda

Status: Active

## Purpose

Drive the decisions needed to turn `13. Conformance Test Specification.md` into an executable conformance contract for PBS v1.

## Context

The current conformance spec already defines a source-level baseline, but it still leaves open:

- whether PBS has one conformance level or staged claims,
- how diagnostics participate in conformance,
- when artifact-level golden tests become mandatory,
- how fixtures for stdlib environments, host registries, and capability grants are modeled,
- and how compatibility promises are validated over time.

This agenda should keep conformance practical and layered so the project can evolve from frontend tests to full-toolchain claims without ambiguity.

## Decisions To Produce

1. Decide the conformance-claim model:
   a single level, or staged levels such as frontend-only and full-toolchain.
2. Decide the normative oracle shape for diagnostics.
3. Decide when artifact-level conformance becomes required in addition to source-level behavior.
4. Decide the minimum fixture model for stdlib, host registry, capabilities, and runtime lines.
5. Decide how compatibility and regression claims are encoded in the conformance corpus.

## Core Questions

1. Is a parser-plus-binder implementation allowed to claim partial conformance, and if so under what label?
2. Do diagnostics tests assert codes, phases, spans, wording classes, or only acceptance and rejection?
3. Which lowering and host-binding invariants require golden tests once specs `12` (IR and Lowering) and `15` (Bytecode and PBX Mapping) are closed?
4. What is the smallest reusable fixture set that still exercises host-backed and stdlib-backed surfaces honestly?
5. How should compatibility expectations across language, stdlib, and cartridge domains be tested over time?

## Proposed Workshop Sequence

### Workshop 1: Conformance Claim Levels

Purpose:

- decide whether PBS has one conformance level or staged claims.

Expected decisions:

- claim taxonomy,
- minimum requirements for each claim,
- and naming constraints for partial implementations.
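
To make the staged option concrete for discussion, the following is a minimal sketch of how a claim taxonomy could map to minimum suite requirements. The level names (`FRONTEND`, `FULL_TOOLCHAIN`) and suite names are hypothetical placeholders, not decisions; the workshop may well settle on a single level instead.

```python
from enum import IntEnum


class ClaimLevel(IntEnum):
    """Hypothetical staged claim levels; higher values imply lower ones."""
    NONE = 0
    FRONTEND = 1
    FULL_TOOLCHAIN = 2


# Hypothetical suite names. The real minimum requirements are a workshop output.
REQUIRED_SUITES = {
    ClaimLevel.FRONTEND: {"source-positive", "source-negative"},
    ClaimLevel.FULL_TOOLCHAIN: {
        "source-positive",
        "source-negative",
        "artifact-golden",
    },
}


def highest_claim(passed_suites: set[str]) -> ClaimLevel:
    """Return the highest level whose required suites all passed.

    Levels are checked in ascending order and the walk stops at the first
    unmet level, so a higher claim can never skip a lower one.
    """
    best = ClaimLevel.NONE
    for level in sorted(REQUIRED_SUITES):
        if REQUIRED_SUITES[level] <= passed_suites:
            best = level
        else:
            break
    return best
```

Under these assumptions, an implementation that passes only the two source-level suites could claim `FRONTEND` and nothing more, which is one way to give partial implementations an unambiguous label.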

### Workshop 2: Source-Level Oracle and Diagnostic Oracle

Purpose:

- close what source-level behavior must be asserted,
- and decide how diagnostics enter conformance.

Expected decisions:

- positive versus negative suite boundaries,
- diagnostic oracle granularity,
- and minimum deterministic assertions.
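
As an aid for the granularity discussion, here is a minimal sketch of a diagnostic oracle in which each asserted field is optional. It assumes diagnostics expose a code, a phase, and a span; the type names, field names, and the example values used with it are hypothetical and do not come from the diagnostics spec.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Diagnostic:
    """A diagnostic as reported by an implementation (hypothetical shape)."""
    code: str
    phase: str
    span: Tuple[int, int]  # (start, end) offsets into the source


@dataclass(frozen=True)
class ExpectedDiagnostic:
    """An oracle entry; a field left as None is not asserted."""
    code: Optional[str] = None
    phase: Optional[str] = None
    span: Optional[Tuple[int, int]] = None

    def matches(self, actual: Diagnostic) -> bool:
        pairs = (
            (self.code, actual.code),
            (self.phase, actual.phase),
            (self.span, actual.span),
        )
        return all(want is None or want == got for want, got in pairs)


def check_rejection(
    expected: list[ExpectedDiagnostic], actual: list[Diagnostic]
) -> bool:
    """Pass if the program was rejected and every expectation is matched
    by at least one reported diagnostic."""
    if not actual:
        return False  # no diagnostics means the program was accepted
    return all(any(e.matches(a) for a in actual) for e in expected)
```

With an empty `expected` list the check degrades to plain acceptance versus rejection; filling in more fields tightens the oracle. The workshop decision is effectively which fields a conforming test is required to populate.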

### Workshop 3: Artifact-Level Conformance and Fixtures

Purpose:

- decide when artifact-level tests become mandatory,
- and close the fixture model for stdlib, registry, capability, and runtime scenarios.

Expected decisions:

- artifact-golden boundary,
- fixture strategy,
- and environment assumptions for host-backed tests.
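
One possible starting point for the fixture discussion is sketched below: a fixture declares the environment it provides, a test case declares what it needs, and a helper reports any gap. All names, fields, and example identifiers are hypothetical; in particular, modeling the runtime line as a plain string is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixture:
    """Environment a conformance run provides (hypothetical model)."""
    name: str
    stdlib_modules: frozenset = frozenset()
    host_bindings: frozenset = frozenset()
    granted_capabilities: frozenset = frozenset()
    runtime_line: str = "unspecified"


@dataclass(frozen=True)
class ConformanceCase:
    """What a single test needs from its environment (hypothetical model)."""
    name: str
    needs_stdlib: frozenset = frozenset()
    needs_host_bindings: frozenset = frozenset()
    needs_capabilities: frozenset = frozenset()


def missing_requirements(case: ConformanceCase, fixture: Fixture) -> dict:
    """Return, per category, the requirements the fixture does not satisfy.

    Empty lists in every category mean the case can run against the fixture.
    """
    return {
        "stdlib": sorted(case.needs_stdlib - fixture.stdlib_modules),
        "host_bindings": sorted(
            case.needs_host_bindings - fixture.host_bindings
        ),
        "capabilities": sorted(
            case.needs_capabilities - fixture.granted_capabilities
        ),
    }
```

Making the requirements explicit on each case is one way to keep the reusable fixture set small: a case that names a host binding or capability the fixture lacks fails up front instead of passing vacuously.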

### Workshop 4: Regression and Compatibility Matrices

Purpose:

- decide how already-published behavior claims are preserved and tested over time.

Expected decisions:

- regression corpus policy,
- compatibility-matrix expectations,
- and alignment with spec `17` (Compatibility and Evolution Policy).

## Expected Spec Material

The resulting spec work should be able to add or close sections for:

- conformance levels or claim classes,
- required positive and negative suites,
- diagnostic oracle rules,
- artifact-level oracle rules,
- fixture model and environment assumptions,
- regression and compatibility matrices,
- and acceptance criteria for claiming PBS v1 support.

## Non-Goals

- Choosing one repository layout or test framework.
- Turning benchmarks into conformance.
- Replacing the normative specs with test precedent.
- Freezing every temporary implementation quirk as a golden artifact.

## Inputs

- `docs/pbs/specs/11. Diagnostics Specification.md`
- `docs/pbs/specs/12. IR and Lowering Specification.md`
- `docs/pbs/specs/13. Conformance Test Specification.md`
- `docs/pbs/specs/15. Bytecode and PBX Mapping Specification.md`
- `docs/pbs/specs/17. Compatibility and Evolution Policy.md`
- `docs/pbs/specs/19. Verification and Safety Checks Specification.md`