# PR-24 Asset File Cache Hydration and Walker Reuse

Domain Owner: `docs/packer`

Cross-Domain Impact: `docs/studio`

## Briefing

The runtime loader already walks asset roots and produces `walkResult`, and `PackerWorkspacePaths` already reserves `assets/.prometeu/cache.json`.

What is still missing is the actual cache lifecycle:

- previous cache is not loaded before a walk;
- walkers do not receive prior file facts for comparison;
- `walkResult` does not become a durable cache artifact after the scan.

That leaves the current runtime path unable to reuse prior file knowledge such as `lastModified`, `size`, `fingerprint`, and family-specific probe metadata.

This PR introduces the first durable asset file cache flow for the runtime-backed packer wave. It also tightens how walk output becomes part of the runtime snapshot and how diagnostics are split between normal aggregated surfaces and file-scoped UI-facing surfaces.

## Objective

Deliver an asset-scoped file cache stored in `assets/.prometeu/cache.json` that is hydrated before the runtime walk and refreshed from the current `walkResult` after the walk completes. In the same flow, attach a runtime projection of the current `walkResult` to the runtime snapshot.

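
The hydrate, walk, and refresh sequence in the objective can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the packer implementation. `CacheLifecycleSketch`, `FileFacts`, and `loadAndRefresh` are hypothetical names. The cache is modeled as nested maps (`asset_id` to normalized relative path to facts). The real JSON read and write are abstracted behind a supplier, which returns an empty `Optional` when the cache is missing, corrupted, or version-mismatched, and a consumer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: these names are illustrative, not part of prometeu-packer.
final class CacheLifecycleSketch {

    // Cacheable facts for one file; there is intentionally no diagnostics field.
    record FileFacts(long size, long lastModified, String fingerprint) {}

    // Cache shape assumed here: asset_id -> (normalized relative path -> facts).
    static Map<String, Map<String, FileFacts>> loadAndRefresh(
            List<String> registeredAssetIds,
            Supplier<Optional<Map<String, Map<String, FileFacts>>>> readCache,
            BiFunction<String, Map<String, FileFacts>, Map<String, FileFacts>> walkAsset,
            Consumer<Map<String, Map<String, FileFacts>>> writeCache) {
        // 1. Hydrate: an empty Optional (miss, corruption, version mismatch)
        //    degrades to a cold walk instead of failing the load.
        Map<String, Map<String, FileFacts>> prior = readCache.get().orElse(Map.of());

        // 2. Walk only registered assets, giving each walker its own prior slice.
        Map<String, Map<String, FileFacts>> next = new LinkedHashMap<>();
        for (String assetId : registeredAssetIds) {
            Map<String, FileFacts> priorSlice = prior.getOrDefault(assetId, Map.of());
            next.put(assetId, Map.copyOf(walkAsset.apply(assetId, priorSlice)));
        }

        // 3. Refresh: persist once, after every asset has been walked.
        Map<String, Map<String, FileFacts>> refreshed = Map.copyOf(next);
        writeCache.accept(refreshed);
        return refreshed;
    }
}
```

Only registered `asset_id`s are walked, so entries for assets that are no longer registered drop out of the refreshed cache. The cache is written once, after all assets have been walked, not once per asset.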
## Dependencies

- [`./PR-14-project-runtime-core-snapshot-model-and-lifecycle.md`](./PR-14-project-runtime-core-snapshot-model-and-lifecycle.md)
- [`./PR-15-snapshot-backed-asset-query-services.md`](./PR-15-snapshot-backed-asset-query-services.md)
- [`./PR-16-write-lane-command-completion-and-used-write-services.md`](./PR-16-write-lane-command-completion-and-used-write-services.md)
- [`./PR-21-point-in-memory-snapshot-updates-after-write-commit.md`](./PR-21-point-in-memory-snapshot-updates-after-write-commit.md)
- [`../specs/2. Workspace, Registry, and Asset Identity Specification.md`](../specs/2.%20Workspace,%20Registry,%20and%20Asset%20Identity%20Specification.md)
- [`../specs/4. Build Artifacts and Deterministic Packing Specification.md`](../specs/4.%20Build%20Artifacts%20and%20Deterministic%20Packing%20Specification.md)
- [`../specs/5. Diagnostics, Operations, and Studio Integration Specification.md`](../specs/5.%20Diagnostics,%20Operations,%20and%20Studio%20Integration%20Specification.md)

## Scope

- define the durable schema for `assets/.prometeu/cache.json`
- store cache entries per asset and per discovered file, not as one flat global fingerprint bag
- restrict cache and internal file walk analysis to assets that are already registered and therefore have a stable `asset_id`
- load prior cache state during runtime snapshot bootstrap and refresh
- pass prior asset-scoped cache entries into the asset walker
- let walkers compare current file observations against prior cached facts such as `lastModified`, `size`, `fingerprint`, and family-specific metadata
- treat the current `walkResult` as the source used to build the next durable cache state
- attach a byte-free runtime projection of the current `walkResult` to the in-memory runtime snapshot for later query and UI use
- persist the refreshed cache after a successful runtime load, or after a write-path point patch that recomputes asset content
- keep cache miss, corruption, or version mismatch non-fatal for normal asset reads
- keep the Studio-visible asset query surface stable while the cache becomes an internal optimization and comparison input
- keep diagnostics out of the durable cache artifact
- sink general walk diagnostics into the normal asset/runtime diagnostics surface
- preserve file-scoped diagnostics as segregated walk output for UI consumers

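
The sketch below shows one possible shape for the durable schema. Asset entries are keyed by `asset_id`, and file entries are keyed by normalized relative path. All names (`PackerCacheModelSketch`, `WorkspaceCache`, `AssetEntry`, `FileEntry`, `normalizedKey`) and the starting `SCHEMA_VERSION` of 1 are assumptions for illustration. The key point is that the file entry holds only reusable facts and has nowhere to store diagnostics.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.StringJoiner;

// Hypothetical schema sketch for assets/.prometeu/cache.json; all names are illustrative.
final class PackerCacheModelSketch {

    static final int SCHEMA_VERSION = 1; // assumed starting version

    // Reusable facts for one discovered file; no diagnostics component by design.
    record FileEntry(String mimeType, long size, long lastModified,
                     String fingerprint, Map<String, String> probeMetadata) {}

    // Cached files of one registered asset, keyed by normalized relative path.
    record AssetEntry(String assetId, Map<String, FileEntry> files) {}

    // Workspace-level artifact: schema version plus asset entries keyed by asset_id.
    record WorkspaceCache(int schemaVersion, Map<String, AssetEntry> assets) {

        // Prior facts for one file of one asset; empty on a cache miss.
        Optional<FileEntry> lookup(String assetId, String relativePath) {
            return Optional.ofNullable(assets.get(assetId))
                    .map(asset -> asset.files().get(relativePath));
        }
    }

    // Builds a relative-path key with '/' separators so it does not depend on the host OS.
    static String normalizedKey(Path assetRoot, Path file) {
        Path relative = assetRoot.normalize().relativize(file.normalize());
        StringJoiner key = new StringJoiner("/");
        for (Path part : relative) {
            key.add(part.toString());
        }
        return key.toString();
    }
}
```

Keying the outer map by `asset_id` rather than by asset path means a relocated asset keeps its cache slice. The nested file map also keeps one asset from accidentally matching another asset's file facts.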
## Non-Goals

- no remote/shared cache
- no final `build`/`pack` incremental pipeline
- no background watch service or external reconcile loop
- no silent reuse of stale cache entries when file identity no longer matches the current asset file
- no cache file per asset root; the baseline artifact remains the workspace-level `assets/.prometeu/cache.json`
- no UI contract that exposes raw cache internals directly to Studio
- no cache support for unregistered assets; registration remains the prerequisite for internal file analysis and durable cache ownership

## Execution Shape

`PR-24` should be treated as an umbrella execution plan, not as one direct implementation PR.

This work should be split into smaller follow-up PRs so cache persistence, walker reuse policy, and runtime snapshot integration can each land with narrow tests and isolated regressions.

## Execution Method

1. Introduce a packer-owned cache repository around `PackerWorkspacePaths.cachePath(project)`.

   The repository must load, validate, and save one workspace cache artifact without leaking raw filesystem JSON handling into loaders or walkers.

2. Define a versioned durable cache model.

   The baseline model should include:

   - workspace-level schema/version fields
   - asset-scoped entries keyed by stable `asset_id`
   - file-scoped entries keyed by normalized relative path inside the asset root
   - reusable file facts such as mime type, size, `lastModified`, content fingerprint, and family-specific probe metadata
   - no persisted diagnostics; diagnostics remain runtime results produced by the current walk only

3. Extend walker inputs so the previous cache is available during content probing.

   The walker contract should receive the prior asset cache view together with the declaration and asset root, rather than forcing each concrete walker to reopen cache storage on its own.

   Unregistered assets do not enter this flow; they must be registered first before internal file analysis and cache ownership apply.

4. Define cache comparison rules inside the walker layer.

   Baseline rules:

   - if current file `size` differs from cached `size`, cached data is invalid immediately
   - if current file `lastModified` is after cached `lastModified`, cached data is invalid immediately
   - content hash or fingerprint should be the last comparison step, used only when the cheaper checks do not already force invalidation and the policy still needs stronger confirmation
   - if prior file facts remain valid under that ordered comparison policy, the walker may reuse prior metadata instead of recomputing everything
   - if identity facts differ, the walker must treat the file as changed and emit fresh probe output
   - missing prior cache is a normal cache miss, not an error
   - corrupted or incompatible prior cache should surface diagnostics or operational logging as appropriate, then fall back to a cold walk

5. Promote `walkResult` from transient scan output to cache refresh input.

   After a successful walk, the loader must convert only the cacheable portions of the current `walkResult` into the next durable asset cache entry set and merge it into the workspace cache model.

   Persisted cache data must be limited to reusable probe facts and metadata, never diagnostics.

6. Attach walk output to the runtime snapshot.

   The runtime snapshot should retain a dedicated runtime projection of the current walk output, not the raw probe objects themselves, so query services and Studio-facing adapters can access file-scoped probe metadata and file-scoped diagnostics without forcing a new filesystem walk.

   The initial runtime posture should keep the full available file set and the subset that is currently build-eligible, plus the bank-size measurement data needed by future fixed-size hardware bank checks.

   The raw probe may still carry file bytes during the active walk, but the snapshot projection must strip byte payloads before retention.

   The snapshot should keep inventory, probe metadata, build-candidate classification, and bank-size measurements, but not whole file contents or raw `PackerFileProbe` instances by default.

   Later cleanup may reduce that retained surface, but the first implementation should prefer preserving available walk data rather than prematurely trimming it.

7. Split diagnostic sinks intentionally.

   Baseline rule:

   - asset-level or walk-level diagnostics that represent the normal operational truth of the asset should flow into the usual runtime/query diagnostics sink
   - file-scoped diagnostics produced by probe processing should remain segregated per file inside the walk result projection
   - Studio may consume those file-scoped diagnostics for detailed UI rendering, but that segregation must not be lost by collapsing everything into one flat diagnostics list
   - none of those diagnostics are persisted in `cache.json`

8. Persist cache only at stable visibility points.

   The normal runtime path should save the refreshed cache after the loader finishes building a coherent snapshot.

   Write-path flows that patch one asset in memory should update only the affected asset cache entry after durable commit and successful re-walk.

9. Keep runtime snapshot and cache ownership aligned.

   Runtime snapshot data may retain the current walk output needed by query services, but the durable cache artifact remains a packer-owned operational store under `assets/.prometeu/cache.json`.

10. Emit observability only at meaningful boundaries.

    The implementation may emit `cache_hit` and `cache_miss` events or counters, but adapters must not collapse cache behavior into fake asset-change semantics.

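
The ordered comparison policy from step 4 can be illustrated with a small decision function. This is a hedged sketch: `CacheComparisonSketch`, `CachedFacts`, `Decision`, and the `requireFingerprint` flag are hypothetical. The current fingerprint is passed as a `Supplier`, so it is computed only when both cheaper checks pass and the policy asks for stronger confirmation.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the step-4 comparison order; names are illustrative.
final class CacheComparisonSketch {

    record CachedFacts(long size, long lastModified, String fingerprint) {}

    enum Decision { MISS, INVALID_SIZE, INVALID_LAST_MODIFIED, INVALID_FINGERPRINT, REUSE }

    // prior is null when the same asset_id and relative path have no cached entry.
    // currentFingerprint is evaluated lazily, only after both cheap checks pass.
    static Decision compare(CachedFacts prior, long currentSize, long currentLastModified,
                            boolean requireFingerprint, Supplier<String> currentFingerprint) {
        if (prior == null) {
            return Decision.MISS; // a normal cache miss, not an error
        }
        if (currentSize != prior.size()) {
            return Decision.INVALID_SIZE; // cheapest check: invalidate immediately
        }
        if (currentLastModified > prior.lastModified()) {
            return Decision.INVALID_LAST_MODIFIED; // file is newer than the cached facts
        }
        if (requireFingerprint && !currentFingerprint.get().equals(prior.fingerprint())) {
            return Decision.INVALID_FINGERPRINT; // expensive check, evaluated last
        }
        return Decision.REUSE; // prior metadata may be reused
    }
}
```

Consider a same-size file whose `lastModified` moved backwards, for example one restored from an older copy. It passes both cheap checks under this ordering, so only the fingerprint step can detect that its content changed.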
## Acceptance Criteria

- runtime load attempts to read `assets/.prometeu/cache.json` before walking assets
- prior asset-scoped cache entries are passed into walkers as comparison input
- cache entries are keyed by stable `asset_id`, not by asset path
- unregistered assets do not receive cache entries and do not undergo internal file analysis before registration
- walkers compare current files against prior facts using ordered checks where `size` invalidates first, `lastModified` invalidates next when the current value is newer, and fingerprint/hash remains the final expensive check
- the current walk output is attached to the in-memory runtime snapshot through a byte-free runtime projection, not through raw probe objects
- the runtime snapshot keeps enough walk data to expose available files, build-candidate files, and bank-size measurement data
- the runtime snapshot does not retain raw bytes for every discovered file by default
- normal asset/runtime diagnostics include the general walk diagnostics that should participate in the standard diagnostics surface
- file-scoped diagnostics remain segregated in the walk result projection for UI consumers
- the resulting `walkResult` is used to refresh the durable cache state
- successful runtime load writes a coherent updated cache artifact back to `assets/.prometeu/cache.json`
- missing, corrupted, or version-mismatched cache does not block snapshot load; the packer falls back to a cold walk
- point write flows that already patch one asset in memory can refresh only that asset's cache slice after commit instead of forcing a full cache rebuild
- cache entries are isolated by asset and file path so one asset cannot accidentally reuse another asset's file facts
- persisted cache does not contain diagnostics from prior runs
- Studio list/details behavior remains stable and does not depend on direct cache awareness

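
The byte-free projection and the diagnostic split in these criteria can be sketched together. This is a hypothetical model: `WalkProjectionSketch`, `FileProbe`, `FileProjection`, and `Diagnostic` are illustrative names, not the real `PackerFileProbe` API. A diagnostic with a null `filePath` is treated as asset-level and routed to the normal diagnostics sink, while file-scoped diagnostics stay attached to their file projection. Summing build-candidate sizes is only an assumed stand-in for bank-size measurement data, whose real shape this document does not define.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; FileProbe is a stand-in for the real PackerFileProbe.
final class WalkProjectionSketch {

    // filePath == null marks an asset-level or walk-level diagnostic.
    record Diagnostic(String filePath, String message) {}

    // Raw probe during the active walk; it may carry the file bytes.
    record FileProbe(String path, byte[] bytes, Map<String, String> metadata, boolean buildCandidate) {}

    // Retained per-file view: size instead of bytes, plus its own diagnostics.
    record FileProjection(String path, long size, Map<String, String> metadata,
                          boolean buildCandidate, List<Diagnostic> diagnostics) {}

    // buildCandidateBytes is an assumed stand-in for bank-size measurement data.
    record WalkProjection(List<FileProjection> availableFiles,
                          List<FileProjection> buildCandidates, long buildCandidateBytes) {}

    record Result(WalkProjection projection, List<Diagnostic> assetDiagnostics) {}

    static Result project(List<FileProbe> probes, List<Diagnostic> diagnostics) {
        List<Diagnostic> assetLevel = new ArrayList<>();
        Map<String, List<Diagnostic>> perFile = new LinkedHashMap<>();
        for (Diagnostic d : diagnostics) {
            if (d.filePath() == null) {
                assetLevel.add(d); // flows into the normal runtime/query diagnostics sink
            } else {
                perFile.computeIfAbsent(d.filePath(), k -> new ArrayList<>()).add(d);
            }
        }
        List<FileProjection> available = new ArrayList<>();
        List<FileProjection> candidates = new ArrayList<>();
        long candidateBytes = 0;
        for (FileProbe probe : probes) {
            // Only the byte count survives; the byte payload is not retained.
            FileProjection file = new FileProjection(probe.path(), probe.bytes().length,
                    Map.copyOf(probe.metadata()), probe.buildCandidate(),
                    List.copyOf(perFile.getOrDefault(probe.path(), List.of())));
            available.add(file);
            if (probe.buildCandidate()) {
                candidates.add(file);
                candidateBytes += file.size();
            }
        }
        // File-scoped diagnostics for paths without a probe are not retained in this sketch.
        return new Result(new WalkProjection(List.copyOf(available), List.copyOf(candidates), candidateBytes),
                List.copyOf(assetLevel));
    }
}
```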
## Tests

- loader tests for cold load when `cache.json` is absent
- loader tests for warm load when prior cache exists and matches current files
- loader tests for fallback when `cache.json` is malformed, unreadable, or schema-incompatible
- cache model tests proving asset cache lookup is aligned by `asset_id`
- walker tests proving changed `size` invalidates reuse immediately
- walker tests proving newer `lastModified` invalidates reuse immediately
- walker tests proving fingerprint/hash is evaluated only as the last comparison step when cheaper checks do not already invalidate reuse
- walker tests proving stable files can reuse prior metadata without changing query-visible results
- cache serialization tests proving diagnostics are never written to `cache.json`
- snapshot/query tests proving `walkResult` is attached to the runtime asset model
- tests proving general walk diagnostics sink into the normal diagnostics surface
- tests proving file-scoped diagnostics remain segregated per file for UI-facing consumers
- runtime registry tests for point cache refresh after write commit on one asset
- event or observability tests for `cache_hit` and `cache_miss` boundaries if those signals are emitted in this wave

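
A minimal sketch of the behavior exercised by the point-refresh and diagnostics-exclusion tests might look like the following. The names `PointCacheRefreshSketch`, `WalkedFile`, `CachedFile`, `toCacheSlice`, and `refreshAsset` are hypothetical. The sketch converts one asset's re-walk output into its durable slice, dropping diagnostics during that conversion. It then replaces only that asset's entry in an otherwise untouched workspace cache.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a point cache refresh after a write commit; names are illustrative.
final class PointCacheRefreshSketch {

    record Diagnostic(String message) {}

    // Re-walk output for one file: cacheable facts plus runtime-only diagnostics.
    record WalkedFile(String relativePath, long size, long lastModified, String fingerprint,
                      List<Diagnostic> diagnostics) {}

    // Durable facts only; diagnostics are dropped when converting to this type.
    record CachedFile(long size, long lastModified, String fingerprint) {}

    static Map<String, CachedFile> toCacheSlice(List<WalkedFile> walked) {
        Map<String, CachedFile> slice = new HashMap<>();
        for (WalkedFile file : walked) {
            slice.put(file.relativePath(),
                    new CachedFile(file.size(), file.lastModified(), file.fingerprint()));
        }
        return Map.copyOf(slice);
    }

    // Returns a new workspace cache in which only the committed asset's slice changes.
    static Map<String, Map<String, CachedFile>> refreshAsset(
            Map<String, Map<String, CachedFile>> workspace, String assetId, List<WalkedFile> rewalked) {
        Map<String, Map<String, CachedFile>> next = new HashMap<>(workspace);
        next.put(assetId, toCacheSlice(rewalked));
        return Map.copyOf(next);
    }
}
```

Returning a new map rather than mutating the input leaves the loader free to persist the result only after the durable commit and re-walk have both succeeded.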
## Risks and Recovery

- path-keyed cache would become unsafe during relocate flows, so the cache owner key must remain `asset_id`
- overly aggressive cache reuse can hide real content changes if comparison rules are under-specified
- saving cache at the wrong lifecycle point can publish partial truth that no coherent snapshot ever observed
- if one part of the cache flow proves unstable, recovery should disable cache hydration or persistence for that path and preserve the current cold-walk behavior until the narrower follow-up PR is corrected

## Affected Artifacts

- `docs/packer/pull-requests/**`
- `docs/packer/specs/2. Workspace, Registry, and Asset Identity Specification.md`
- `docs/packer/specs/5. Diagnostics, Operations, and Studio Integration Specification.md`
- `prometeu-packer/prometeu-packer-v1/src/main/java/p/packer/PackerWorkspacePaths.java`
- `prometeu-packer/prometeu-packer-v1/src/main/java/p/packer/repositories/**`
- `prometeu-packer/prometeu-packer-v1/src/main/java/p/packer/models/**`
- `prometeu-packer/prometeu-packer-v1/src/test/java/p/packer/services/**`
- `prometeu-packer/prometeu-packer-v1/src/test/java/p/packer/repositories/**`

## Suggested Next Step

Derive smaller implementation PRs from `PR-24`:

1. cache model and repository

   Scope:

   - durable `cache.json` schema
   - load/save repository
   - `asset_id`-aligned cache lookup
   - serialization tests proving diagnostics are excluded

2. walker contract and comparison policy

   Scope:

   - previous-cache input contract
   - ordered invalidation checks by `size`, then newer `lastModified`, then fingerprint/hash
   - file-scoped diagnostics preservation

3. runtime snapshot and loader integration

   Scope:

   - attach `walkResult` to runtime snapshot
   - sink general diagnostics into the normal asset/runtime diagnostics surface
   - refresh cache from the cacheable parts of `walkResult`
   - point write-path refresh for one affected asset