prometeu-studio/docs/packer/pull-requests/PR-24-asset-file-cache-hydration-and-walker-reuse.md
2026-03-24 13:42:49 +00:00


# PR-24 Asset File Cache Hydration and Walker Reuse
Domain Owner: `docs/packer`
Cross-Domain Impact: `docs/studio`
## Briefing
The runtime loader already walks asset roots and produces `walkResult`, and `PackerWorkspacePaths` already reserves `assets/.prometeu/cache.json`.
What is still missing is the actual cache lifecycle:
- previous cache is not loaded before a walk;
- walkers do not receive prior file facts for comparison;
- `walkResult` does not become a durable cache artifact after the scan.
That leaves the current runtime path unable to reuse prior file knowledge such as `lastModified`, `size`, `fingerprint`, and family-specific probe metadata.
This PR introduces the first durable asset file cache flow for the runtime-backed packer wave.
It also tightens how walk output becomes part of the runtime snapshot and how diagnostics are split between normal aggregated surfaces and file-scoped UI-facing surfaces.
## Objective
Deliver an asset-scoped file cache stored in `assets/.prometeu/cache.json`, hydrated before the runtime walk and refreshed from the current `walkResult` after the walk completes, while also attaching the current `walkResult` to the runtime snapshot.
## Dependencies
- [`./PR-14-project-runtime-core-snapshot-model-and-lifecycle.md`](./PR-14-project-runtime-core-snapshot-model-and-lifecycle.md)
- [`./PR-15-snapshot-backed-asset-query-services.md`](./PR-15-snapshot-backed-asset-query-services.md)
- [`./PR-16-write-lane-command-completion-and-used-write-services.md`](./PR-16-write-lane-command-completion-and-used-write-services.md)
- [`./PR-21-point-in-memory-snapshot-updates-after-write-commit.md`](./PR-21-point-in-memory-snapshot-updates-after-write-commit.md)
- [`../specs/2. Workspace, Registry, and Asset Identity Specification.md`](../specs/2.%20Workspace,%20Registry,%20and%20Asset%20Identity%20Specification.md)
- [`../specs/4. Build Artifacts and Deterministic Packing Specification.md`](../specs/4.%20Build%20Artifacts%20and%20Deterministic%20Packing%20Specification.md)
- [`../specs/5. Diagnostics, Operations, and Studio Integration Specification.md`](../specs/5.%20Diagnostics,%20Operations,%20and%20Studio%20Integration%20Specification.md)
## Scope
- define the durable schema for `assets/.prometeu/cache.json`
- store cache entries per asset and per discovered file, not as one flat global fingerprint bag
- restrict cache and internal file walk analysis to assets that are already registered and therefore have stable `asset_id`
- load prior cache state during runtime snapshot bootstrap and refresh
- pass prior asset-scoped cache entries into the asset walker
- let walkers compare current file observations against prior cached facts such as `lastModified`, `size`, `fingerprint`, and family-specific metadata
- treat the current `walkResult` as the source used to build the next durable cache state
- attach the current `walkResult` to the in-memory runtime snapshot for later query and UI use
- persist refreshed cache after a successful runtime load or write-path point patch that recomputes asset content
- keep cache miss, corruption, or version mismatch non-fatal for normal asset reads
- keep the Studio-visible asset query surface stable while the cache becomes an internal optimization and comparison input
- keep diagnostics out of the durable cache artifact
- sink general walk diagnostics into the normal asset/runtime diagnostics surface
- preserve file-scoped diagnostics as segregated walk output for UI consumers
## Non-Goals
- no remote/shared cache
- no final `build`/`pack` incremental pipeline
- no background watch service or external reconcile loop
- no silent reuse of stale cache entries when file identity no longer matches the current asset file
- no cache file per asset root; the baseline artifact remains the workspace-level `assets/.prometeu/cache.json`
- no UI contract that exposes raw cache internals directly to Studio
- no cache support for unregistered assets; registration remains the prerequisite for internal file analysis and durable cache ownership
## Execution Shape
`PR-24` should be treated as an umbrella execution plan, not as one direct implementation PR.
This work should be split into smaller follow-up PRs so cache persistence, walker reuse policy, and runtime snapshot integration can each land with narrow tests and isolated regressions.
## Execution Method
1. Introduce a packer-owned cache repository around `PackerWorkspacePaths.cachePath(project)`.
The repository must load, validate, and save one workspace cache artifact without leaking raw filesystem JSON handling into loaders or walkers.
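A minimal sketch of what such a repository boundary could look like. The names `CacheCodecSketch` and `WorkspaceCacheRepositorySketch` are hypothetical, the JSON mapping is left behind a codec interface because this plan does not pick a JSON library, and the path is assumed to be whatever `PackerWorkspacePaths.cachePath(project)` returns:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

/** Hypothetical codec boundary; the JSON mapping itself is not specified by this plan. */
interface CacheCodecSketch<T> {
    /** Must throw when the content is malformed or schema-incompatible. */
    T decode(String json) throws IOException;

    String encode(T cache);
}

/** Hypothetical repository; loaders and walkers never touch raw filesystem JSON. */
final class WorkspaceCacheRepositorySketch<T> {
    private final Path cachePath;
    private final CacheCodecSketch<T> codec;

    WorkspaceCacheRepositorySketch(Path cachePath, CacheCodecSketch<T> codec) {
        this.cachePath = cachePath;
        this.codec = codec;
    }

    /** Empty means "no usable prior cache": absent, unreadable, or rejected by the codec. */
    Optional<T> load() {
        try {
            return Optional.of(codec.decode(Files.readString(cachePath)));
        } catch (IOException | RuntimeException e) {
            // Non-fatal by design: a real implementation would log or emit a diagnostic here.
            return Optional.empty();
        }
    }

    /** Writes to a sibling temp file first, then replaces the previous artifact. */
    void save(T cache) throws IOException {
        Path parent = cachePath.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Path temp = cachePath.resolveSibling(cachePath.getFileName() + ".tmp");
        Files.writeString(temp, codec.encode(cache));
        Files.move(temp, cachePath, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Returning `Optional.empty()` for every failure mode keeps the cold-walk fallback in one place; whether the failure should also produce a diagnostic or a log entry is left to the real repository.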
2. Define a versioned durable cache model.
The baseline model should include:
- workspace-level schema/version fields
- asset-scoped entries keyed by stable `asset_id`
- file-scoped entries keyed by normalized relative path inside the asset root
- reusable file facts such as mime type, size, `lastModified`, content fingerprint, and family-specific probe metadata
- no persisted diagnostics; diagnostics remain runtime results produced by the current walk only
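The baseline model above could be expressed roughly as the following records (Java 16+). All type names, the `String` type for `asset_id`, the string-to-string shape of family metadata, and the schema version value `1` are assumptions made for illustration only:

```java
import java.util.Map;
import java.util.Optional;

/** File-scoped reusable facts. There is deliberately no diagnostics component. */
record CachedFileFactsSketch(
        String mimeType,
        long size,
        long lastModified,
        String fingerprint,
        Map<String, String> familyMetadata) {
    CachedFileFactsSketch {
        familyMetadata = Map.copyOf(familyMetadata);
    }
}

/** One asset's slice, keyed by normalized relative path inside the asset root. */
record AssetCacheEntrySketch(Map<String, CachedFileFactsSketch> filesByRelativePath) {
    AssetCacheEntrySketch {
        filesByRelativePath = Map.copyOf(filesByRelativePath);
    }
}

/** Workspace-level artifact, keyed by stable asset_id rather than by asset path. */
record WorkspaceCacheSketch(int schemaVersion, Map<String, AssetCacheEntrySketch> assetsById) {
    static final int CURRENT_SCHEMA_VERSION = 1; // assumed value

    WorkspaceCacheSketch {
        assetsById = Map.copyOf(assetsById);
    }

    /** A mismatch should lead to a cold walk, not to a failed snapshot load. */
    boolean isCompatible() {
        return schemaVersion == CURRENT_SCHEMA_VERSION;
    }

    Optional<AssetCacheEntrySketch> forAsset(String assetId) {
        return Optional.ofNullable(assetsById.get(assetId));
    }
}
```

Nesting files under assets, instead of one flat map of fingerprints, is what keeps one asset from reusing another asset's file facts.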
3. Extend walker inputs so previous cache is available during content probing.
The walker contract should receive the prior asset cache view together with the declaration and asset root, rather than forcing each concrete walker to reopen cache storage on its own.
Unregistered assets do not enter this flow; they must be registered first before internal file analysis and cache ownership apply.
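One possible shape for the extended walker input, sketched under assumptions: `AssetWalkerSketch`, `PriorAssetCacheViewSketch`, and `PriorFileFactsSketch` are hypothetical names, and the declaration and result types are left generic because this plan does not fix them.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

/** Prior facts for one file, as loaded from the previous cache. */
record PriorFileFactsSketch(long size, long lastModified, String fingerprint) {}

/** Read-only view of one asset's prior cache slice; empty on a cold walk. */
final class PriorAssetCacheViewSketch {
    private final Map<String, PriorFileFactsSketch> byRelativePath;

    private PriorAssetCacheViewSketch(Map<String, PriorFileFactsSketch> byRelativePath) {
        this.byRelativePath = Map.copyOf(byRelativePath);
    }

    static PriorAssetCacheViewSketch empty() {
        return new PriorAssetCacheViewSketch(Map.of());
    }

    static PriorAssetCacheViewSketch of(Map<String, PriorFileFactsSketch> facts) {
        return new PriorAssetCacheViewSketch(facts);
    }

    /** An absent entry is a normal cache miss, not an error. */
    Optional<PriorFileFactsSketch> find(String relativePath) {
        return Optional.ofNullable(byRelativePath.get(relativePath));
    }
}

/** D is the asset declaration type and R the walk result type; both are placeholders. */
interface AssetWalkerSketch<D, R> {
    R walk(D declaration, Path assetRoot, PriorAssetCacheViewSketch priorCache);
}
```

Because the view is handed in by the loader, a concrete walker has no reason to know where `cache.json` lives or how it is parsed.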
4. Define cache comparison rules inside the walker layer.
Baseline rules:
- if current file `size` differs from cached `size`, cached data is invalid immediately
- if current file `lastModified` is after cached `lastModified`, cached data is invalid immediately
- content hash or fingerprint comparison is the last and most expensive step; run it only when `size` and `lastModified` have not already invalidated the entry and the policy still requires stronger confirmation
- if prior file facts remain valid under that ordered comparison policy, the walker may reuse prior metadata instead of recomputing everything
- if any compared fact (`size`, `lastModified`, or fingerprint) fails that policy, the walker must treat the file as changed and emit fresh probe output
- missing prior cache is a normal cache miss, not an error
- corrupted or incompatible prior cache should surface diagnostics or operational logging as appropriate, then fall back to a cold walk
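The ordered comparison can be sketched as a single decision function. This is an illustration of the baseline rules only; `CacheComparisonSketch`, `ReuseDecisionSketch`, and the `confirmWithFingerprint` flag are hypothetical, and the fingerprint is passed as a supplier so it is computed only when the cheaper checks pass:

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Cached facts for one file; names are illustrative. */
record FileFactsSketch(long size, long lastModified, String fingerprint) {}

enum ReuseDecisionSketch { MISS, INVALID_SIZE, INVALID_LAST_MODIFIED, INVALID_FINGERPRINT, REUSE }

final class CacheComparisonSketch {
    private CacheComparisonSketch() {}

    static ReuseDecisionSketch decide(
            Optional<FileFactsSketch> prior,
            long currentSize,
            long currentLastModified,
            boolean confirmWithFingerprint,
            Supplier<String> currentFingerprint) {
        if (prior.isEmpty()) {
            return ReuseDecisionSketch.MISS; // normal cache miss, not an error
        }
        FileFactsSketch cached = prior.get();
        if (currentSize != cached.size()) {
            return ReuseDecisionSketch.INVALID_SIZE; // cheapest check first
        }
        if (currentLastModified > cached.lastModified()) {
            return ReuseDecisionSketch.INVALID_LAST_MODIFIED; // only a newer timestamp invalidates
        }
        // Most expensive step: the supplier is invoked only if the policy asks for it.
        if (confirmWithFingerprint && !cached.fingerprint().equals(currentFingerprint.get())) {
            return ReuseDecisionSketch.INVALID_FINGERPRINT;
        }
        return ReuseDecisionSketch.REUSE;
    }
}
```

Returning a reason rather than a boolean makes the walker tests for each invalidation rule straightforward to write, and gives the observability step something concrete to count.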
5. Promote `walkResult` from transient scan output to cache refresh input.
After a successful walk, the loader must convert only the cacheable portions of the current `walkResult` into the next durable asset cache entry set and merge it into the workspace cache model.
Persisted cache data must be limited to reusable probe facts and metadata, never diagnostics.
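A sketch of that conversion, assuming a simplified per-file walk output. `WalkedFileSketch`, `DurableFileFactsSketch`, and `CacheRefreshSketch` are hypothetical names; the point is only that diagnostics exist on the input type and have no counterpart on the durable type:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-file walk output; diagnostics travel with it only at runtime. */
record WalkedFileSketch(
        String relativePath,
        long size,
        long lastModified,
        String fingerprint,
        List<String> diagnostics) {}

/** Hypothetical durable shape: reusable facts only, no diagnostics field at all. */
record DurableFileFactsSketch(long size, long lastModified, String fingerprint) {}

final class CacheRefreshSketch {
    private CacheRefreshSketch() {}

    /** Builds one asset's next cache slice from the current walk output. */
    static Map<String, DurableFileFactsSketch> toCacheSlice(List<WalkedFileSketch> walkedFiles) {
        Map<String, DurableFileFactsSketch> slice = new LinkedHashMap<>();
        for (WalkedFileSketch file : walkedFiles) {
            slice.put(
                    file.relativePath(),
                    new DurableFileFactsSketch(file.size(), file.lastModified(), file.fingerprint()));
        }
        return slice;
    }

    /** Replaces one asset's slice and leaves every other asset untouched. */
    static Map<String, Map<String, DurableFileFactsSketch>> mergeAsset(
            Map<String, Map<String, DurableFileFactsSketch>> workspace,
            String assetId,
            Map<String, DurableFileFactsSketch> refreshedSlice) {
        Map<String, Map<String, DurableFileFactsSketch>> next = new LinkedHashMap<>(workspace);
        next.put(assetId, refreshedSlice);
        return next;
    }
}
```

Replacing the whole slice, rather than patching individual files into it, means files that disappeared since the last walk also disappear from the cache.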
6. Attach walk output to the runtime snapshot.
The runtime snapshot should retain a dedicated runtime projection of the current walk output, not the raw probe objects themselves, so query services and Studio-facing adapters can access file-scoped probe metadata and file-scoped diagnostics without forcing a new filesystem walk.
The initial implementation should retain the full set of available files, the subset that is currently build-eligible, and the bank-size measurement data needed by future fixed-size hardware bank checks.
The raw probe may still carry file bytes during the active walk, but the snapshot projection must strip byte payloads before retention.
The snapshot should keep inventory, probe metadata, build-candidate classification, and bank-size measurements, but not whole file contents or raw `PackerFileProbe` instances by default.
Later cleanup may reduce that retained surface, but the first implementation should prefer preserving available walk data rather than prematurely trimming it.
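A byte-free projection could look roughly like this. `RawProbeSketch` stands in for the real `PackerFileProbe`, whose actual fields are not shown in this plan, and summing the byte lengths of build candidates is only an assumed placeholder for whatever bank-size measurement the hardware checks end up needing:

```java
import java.util.List;
import java.util.Map;

/** Stand-in for the raw probe that exists only during the active walk. */
record RawProbeSketch(
        String relativePath,
        byte[] bytes,
        Map<String, String> metadata,
        boolean buildCandidate,
        List<String> diagnostics) {}

/** Snapshot-retained view of one file: measurements and metadata, no payload. */
record FileProjectionSketch(
        String relativePath,
        long measuredSize,
        Map<String, String> metadata,
        boolean buildCandidate,
        List<String> diagnostics) {}

record WalkProjectionSketch(List<FileProjectionSketch> availableFiles) {

    /** Measures each payload, then drops it before anything reaches the snapshot. */
    static WalkProjectionSketch from(List<RawProbeSketch> probes) {
        return new WalkProjectionSketch(probes.stream()
                .map(p -> new FileProjectionSketch(
                        p.relativePath(),
                        p.bytes().length,
                        Map.copyOf(p.metadata()),
                        p.buildCandidate(),
                        List.copyOf(p.diagnostics())))
                .toList());
    }

    List<FileProjectionSketch> buildCandidates() {
        return availableFiles.stream().filter(FileProjectionSketch::buildCandidate).toList();
    }

    /** Assumed measurement: total size of build-eligible files. */
    long measuredBankSize() {
        return buildCandidates().stream().mapToLong(FileProjectionSketch::measuredSize).sum();
    }
}
```

The projection type has no `byte[]` component, so retaining file contents in the snapshot would require a deliberate type change rather than an accidental reference.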
7. Split diagnostic sinks intentionally.
Baseline rule:
- asset-level or walk-level diagnostics that represent the normal operational truth of the asset should flow into the usual runtime/query diagnostics sink
- file-scoped diagnostics produced by probe processing should remain segregated per file inside the walk result projection
- Studio may consume those file-scoped diagnostics for detailed UI rendering, but that segregation must not be lost by collapsing everything into one flat diagnostics list
- none of those diagnostics are persisted in `cache.json`
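The split can be sketched as a routing step over the diagnostics produced by one walk. `WalkDiagnosticSketch` and `DiagnosticSplitSketch` are hypothetical, and modelling a diagnostic as an optional file path plus a message is an assumption, not the real diagnostic type:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical diagnostic; an empty relativePath means asset-level or walk-level. */
record WalkDiagnosticSketch(Optional<String> relativePath, String message) {}

/** General diagnostics feed the normal sink; file-scoped ones stay grouped per file. */
record DiagnosticSplitSketch(List<String> general, Map<String, List<String>> byFile) {

    static DiagnosticSplitSketch of(List<WalkDiagnosticSketch> diagnostics) {
        List<String> general = new ArrayList<>();
        Map<String, List<String>> byFile = new LinkedHashMap<>();
        for (WalkDiagnosticSketch d : diagnostics) {
            if (d.relativePath().isPresent()) {
                // File-scoped: keep it attached to its file instead of flattening it.
                byFile.computeIfAbsent(d.relativePath().get(), key -> new ArrayList<>())
                        .add(d.message());
            } else {
                general.add(d.message());
            }
        }
        return new DiagnosticSplitSketch(general, byFile);
    }
}
```

Only `general` would be handed to the usual runtime/query diagnostics sink; `byFile` would stay inside the walk result projection for Studio adapters to read.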
8. Persist cache only at stable visibility points.
The normal runtime path should save refreshed cache after the loader finishes building a coherent snapshot.
Write-path flows that patch one asset in memory should update only the affected asset cache entry after durable commit and successful re-walk.
9. Keep runtime snapshot and cache ownership aligned.
Runtime snapshot data may retain the current walk output needed by query services, but the durable cache artifact remains a packer-owned operational store under `assets/.prometeu/cache.json`.
10. Emit observability only at meaningful boundaries.
The implementation may emit `cache_hit` and `cache_miss` events or counters, but adapters must not translate cache hits or misses into asset-change events; a cache miss does not by itself mean the asset changed.
## Acceptance Criteria
- runtime load attempts to read `assets/.prometeu/cache.json` before walking assets
- prior asset-scoped cache entries are passed into walkers as comparison input
- cache entries are keyed by stable `asset_id`, not by asset path
- unregistered assets do not receive cache entries and do not undergo internal file analysis before registration
- walkers compare current files against prior facts using ordered checks where `size` invalidates first, `lastModified` invalidates next when the current value is newer, and fingerprint/hash remains the final expensive check
- the current walk output is attached to the in-memory runtime snapshot through a byte-free runtime projection, not through raw probe objects
- the runtime snapshot keeps enough walk data to expose available files, build-candidate files, and bank-size measurement data
- the runtime snapshot does not retain raw bytes for every discovered file by default
- normal asset/runtime diagnostics include the general walk diagnostics that should participate in the standard diagnostics surface
- file-scoped diagnostics remain segregated in the walk result projection for UI consumers
- the resulting `walkResult` is used to refresh the durable cache state
- successful runtime load writes a coherent updated cache artifact back to `assets/.prometeu/cache.json`
- missing, corrupted, or version-mismatched cache does not block snapshot load; the packer falls back to a cold walk
- point write flows that already patch one asset in memory can refresh only that asset's cache slice after commit instead of forcing full cache rebuild
- cache entries are isolated by asset and file path so one asset cannot accidentally reuse another asset's file facts
- persisted cache does not contain diagnostics from prior runs
- Studio list/details behavior remains stable and does not depend on direct cache awareness
## Tests
- loader tests for cold load when `cache.json` is absent
- loader tests for warm load when prior cache exists and matches current files
- loader tests for fallback when `cache.json` is malformed, unreadable, or schema-incompatible
- cache model tests proving asset cache lookup is aligned by `asset_id`
- walker tests proving changed `size` invalidates reuse immediately
- walker tests proving newer `lastModified` invalidates reuse immediately
- walker tests proving fingerprint/hash is evaluated only as the last comparison step when cheaper checks do not already invalidate reuse
- walker tests proving stable files can reuse prior metadata without changing query-visible results
- cache serialization tests proving diagnostics are never written to `cache.json`
- snapshot/query tests proving the current `walkResult` is attached to the runtime snapshot through the byte-free projection
- tests proving general walk diagnostics sink into the normal diagnostics surface
- tests proving file-scoped diagnostics remain segregated per file for UI-facing consumers
- runtime registry tests for point cache refresh after write commit on one asset
- event or observability tests for `cache_hit` and `cache_miss` boundaries if those signals are emitted in this wave
## Risks and Recovery
- path-keyed cache would become unsafe during relocate flows, so the cache owner key must remain `asset_id`
- overly aggressive cache reuse can hide real content changes if comparison rules are under-specified
- saving cache at the wrong lifecycle point can publish partial truth that no coherent snapshot ever observed
- if one part of the cache flow proves unstable, recovery should disable cache hydration or persistence for that path and preserve the current cold-walk behavior until the narrower follow-up PR is corrected
## Affected Artifacts
- `docs/packer/pull-requests/**`
- `docs/packer/specs/2. Workspace, Registry, and Asset Identity Specification.md`
- `docs/packer/specs/5. Diagnostics, Operations, and Studio Integration Specification.md`
- `prometeu-packer/prometeu-packer-v1/src/main/java/p/packer/PackerWorkspacePaths.java`
- `prometeu-packer/prometeu-packer-v1/src/main/java/p/packer/repositories/**`
- `prometeu-packer/prometeu-packer-v1/src/main/java/p/packer/models/**`
- `prometeu-packer/prometeu-packer-v1/src/test/java/p/packer/services/**`
- `prometeu-packer/prometeu-packer-v1/src/test/java/p/packer/repositories/**`
## Suggested Next Step
Derive smaller implementation PRs from `PR-24`:
1. cache model and repository
Scope:
- durable `cache.json` schema
- load/save repository
- `asset_id`-aligned cache lookup
- serialization tests proving diagnostics are excluded
2. walker contract and comparison policy
Scope:
- previous-cache input contract
- ordered invalidation checks by `size`, then newer `lastModified`, then fingerprint/hash
- file-scoped diagnostics preservation
3. runtime snapshot and loader integration
Scope:
- attach `walkResult` to runtime snapshot
- sink general diagnostics into the normal asset/runtime diagnostics surface
- refresh cache from the cacheable parts of `walkResult`
- point write-path refresh for one affected asset