NODEDC_1C/docs/incremental_refresh_plan.md

48 lines
1.3 KiB
Markdown

# Incremental Refresh Plan
Date: 2026-03-23
Status: executable in MVP mode
## 1. Objective
Keep canonical store current for open periods without running full historical load each time.
## 2. Schedule recommendation
- business hours: every 15-60 minutes for critical sets
- off-hours: one consolidation run
- manual targeted run on-demand for urgent drill-down
## 3. Standard command
Example incremental window:
```powershell
python scripts/run_refresh.py --mode incremental --from-date 2026-01-01T00:00:00 --limit-per-set 200
```
Targeted catch-up:
```powershell
python scripts/run_refresh.py --mode targeted --target-id 68.02 --limit-per-set 200
```
## 4. Operational controls
1. Watch latest run status via `GET /refresh/runs`.
2. Watch store health via `GET /store/stats`.
3. Alert on consecutive `failed` runs.
4. Alert on repeated growth of `failed_entity_sets`.
## 5. Idempotency and consistency
- Entity writes are upsert-based (`source_entity`, `source_id`).
- Links for each source entity are replaced each run to avoid stale edges.
- Checkpoints update only for successfully processed entity sets.
## 6. Known MVP limits
- Date filtering is best-effort from common date fields.
- No CDC stream; refresh is pull-based.
- Large enterprise-wide slices still require separate analytical batching strategy.