# Incremental Refresh Plan

Date: 2026-03-23
Status: executable in MVP mode

## 1. Objective

Keep the canonical store current for open periods without rerunning the full historical load each time.

## 2. Schedule recommendation

- business hours: every 15-60 minutes for critical entity sets
- off-hours: one consolidation run
- on demand: manual targeted run for urgent drill-down
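
The schedule above can be expressed as a small "is a run due?" check. This is a minimal sketch: the business-hours window (Mon-Fri, 08:00-18:00), the 30-minute interval, and the function names are assumptions chosen for illustration, not values fixed by this plan.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Assumed values; the plan only says "every 15-60 minutes" and "off-hours".
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)
CRITICAL_INTERVAL = timedelta(minutes=30)


def is_business_hours(now: datetime) -> bool:
    """Monday to Friday, inside the assumed business window."""
    return now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END


def refresh_due(now: datetime, last_run: Optional[datetime]) -> bool:
    """Return True when a scheduled refresh should start."""
    if last_run is None:
        return True
    if is_business_hours(now):
        # Business hours: rerun once the interval has elapsed.
        return now - last_run >= CRITICAL_INTERVAL
    # Off-hours: one consolidation run, so run only if the previous
    # run still dates from business hours.
    return is_business_hours(last_run)
```

A cron job or task scheduler could call `refresh_due` every few minutes and start the refresh only when it returns `True`. Manual targeted runs bypass this check.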

## 3. Standard command

Example incremental window:

```powershell
python scripts/run_refresh.py --mode incremental --from-date 2026-01-01T00:00:00 --limit-per-set 200
```

Targeted catch-up:

```powershell
python scripts/run_refresh.py --mode targeted --target-id 68.02 --limit-per-set 200
```
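
For scheduled runs it may help to build these commands in code rather than by hand. The sketch below uses only the flags shown above; the helper names are hypothetical, and treating a non-zero exit code as failure is an assumption about `run_refresh.py`.

```python
import subprocess
import sys
from datetime import datetime
from typing import List, Optional


def build_refresh_command(
    mode: str,
    from_date: Optional[datetime] = None,
    target_id: Optional[str] = None,
    limit_per_set: int = 200,
) -> List[str]:
    """Assemble the argument list for scripts/run_refresh.py."""
    cmd = [sys.executable, "scripts/run_refresh.py", "--mode", mode]
    if from_date is not None:
        # Same timestamp format as the example: 2026-01-01T00:00:00
        cmd += ["--from-date", from_date.isoformat(timespec="seconds")]
    if target_id is not None:
        cmd += ["--target-id", target_id]
    cmd += ["--limit-per-set", str(limit_per_set)]
    return cmd


def run_refresh(cmd: List[str]) -> bool:
    """Run the command; assumes a non-zero exit code means failure."""
    return subprocess.run(cmd, check=False).returncode == 0
```

For example, `build_refresh_command("targeted", target_id="68.02")` yields the targeted catch-up command above. Passing an argument list instead of a shell string avoids quoting problems.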

## 4. Operational controls

1. Watch the latest run status via `GET /refresh/runs`.
2. Watch store health via `GET /store/stats`.
3. Alert on consecutive `failed` runs.
4. Alert on repeated growth of `failed_entity_sets`.
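
The two alert rules above can be sketched as checks over the run history. The response shape is an assumption: `GET /refresh/runs` is taken to return a JSON list of runs, newest first, each with a `status` string and a `failed_entity_sets` list. The base URL and function names are hypothetical.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def fetch_runs(base_url: str) -> List[Dict[str, Any]]:
    """Fetch run history; assumes a JSON list, newest run first."""
    with urlopen(f"{base_url}/refresh/runs", timeout=10) as resp:
        return json.load(resp)


def consecutive_failures(runs: List[Dict[str, Any]]) -> int:
    """Count `failed` runs at the head of the list (the most recent ones)."""
    count = 0
    for run in runs:
        if run.get("status") != "failed":
            break
        count += 1
    return count


def failed_sets_growing(runs: List[Dict[str, Any]], window: int = 3) -> bool:
    """True when failed_entity_sets grew on each of the last `window` runs."""
    sizes = [len(run.get("failed_entity_sets", [])) for run in runs[:window]]
    # Newest first, so growth over time shows up as a strictly decreasing list.
    return len(sizes) == window and all(a > b for a, b in zip(sizes, sizes[1:]))
```

A monitor could alert when `consecutive_failures(runs) >= 2` or when `failed_sets_growing(runs)` is true; both thresholds are illustrative.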

## 5. Idempotency and consistency

- Entity writes are upserts keyed on (`source_entity`, `source_id`).
- Links for each source entity are replaced on each run to avoid stale edges.
- Checkpoints update only for successfully processed entity sets.
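
The three rules above can be illustrated with a self-contained SQLite sketch. The storage engine, table layout, and function name are assumptions made for illustration; the sketch also treats an entity set name as the `source_entity` value, which the plan does not state.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema, not the actual canonical store layout.
SCHEMA = """
CREATE TABLE entities (
    source_entity TEXT NOT NULL,
    source_id     TEXT NOT NULL,
    payload       TEXT,
    PRIMARY KEY (source_entity, source_id)
);
CREATE TABLE links (
    source_entity TEXT NOT NULL,
    source_id     TEXT NOT NULL,
    target        TEXT NOT NULL
);
CREATE TABLE checkpoints (
    entity_set     TEXT PRIMARY KEY,
    last_refreshed TEXT NOT NULL
);
"""


def refresh_entity_set(
    conn: sqlite3.Connection,
    entity_set: str,
    records: Iterable[Tuple[str, str, List[str]]],
    run_ts: str,
) -> bool:
    """Apply (source_id, payload, link_targets) records for one entity set."""
    try:
        with conn:  # commits on success, rolls back on any exception
            for source_id, payload, targets in records:
                # Upsert keyed on (source_entity, source_id).
                conn.execute(
                    "INSERT INTO entities VALUES (?, ?, ?) "
                    "ON CONFLICT(source_entity, source_id) "
                    "DO UPDATE SET payload = excluded.payload",
                    (entity_set, source_id, payload),
                )
                # Replace this entity's links so no stale edges survive.
                conn.execute(
                    "DELETE FROM links WHERE source_entity = ? AND source_id = ?",
                    (entity_set, source_id),
                )
                conn.executemany(
                    "INSERT INTO links VALUES (?, ?, ?)",
                    [(entity_set, source_id, t) for t in targets],
                )
            # Reached only if every record above succeeded.
            conn.execute(
                "INSERT INTO checkpoints VALUES (?, ?) "
                "ON CONFLICT(entity_set) "
                "DO UPDATE SET last_refreshed = excluded.last_refreshed",
                (entity_set, run_ts),
            )
        return True
    except sqlite3.Error:
        return False
```

Rerunning the same records leaves one row per entity, and a failed set keeps its previous checkpoint. Rolling back the entity writes of a failed set is a choice made in this sketch; the plan itself only requires that the checkpoint stays put.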

## 6. Known MVP limits

- Date filtering is best-effort, based on common date fields.
- No CDC stream; refresh is pull-based.
- Large enterprise-wide slices still require a separate analytical batching strategy.
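
To make the first limit concrete, best-effort date filtering could look like the sketch below. The candidate field names are hypothetical (the plan only says "common date fields"), timestamps are assumed to be naive ISO 8601 strings, and keeping records without a usable date is an assumed policy.

```python
from datetime import datetime
from typing import Any, Mapping, Optional

# Hypothetical field names, tried in order.
CANDIDATE_DATE_FIELDS = ("modified_at", "updated_at", "date")


def record_date(record: Mapping[str, Any]) -> Optional[datetime]:
    """Return the first candidate field that parses as an ISO 8601 date."""
    for name in CANDIDATE_DATE_FIELDS:
        value = record.get(name)
        if not isinstance(value, str):
            continue
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            continue
    return None


def in_window(record: Mapping[str, Any], from_date: datetime) -> bool:
    """Best-effort filter for an incremental window starting at from_date."""
    found = record_date(record)
    # Assumed policy: keep records with no usable date instead of dropping them.
    return found is None or found >= from_date
```

Keeping undated records costs some redundant work, which the upsert-based writes absorb; dropping them would risk silently missing updates.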