# Accounting Canonical Schema (Layer 4)
Date: 2026-03-23
Status: implemented as MVP store (`canonical_layer/store.py`)
## 1. Purpose
The canonical schema is a normalized, read-only store for accounting semantics extracted from 1C.
It decouples heavy analytics from the live query bridge, so analytical workloads read from the store instead of querying 1C directly.
## 2. Physical store
The current implementation is configured with a SQLAlchemy database URL:
- default local: `sqlite:///X:/1C/NDC_1C/data/canonical_store.db`
- production target: PostgreSQL URL
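A minimal sketch of how the store URL could be resolved, assuming it is read from an environment variable with the local SQLite path as the fallback. The variable name `CANONICAL_STORE_URL` and both helper functions are hypothetical and are not taken from `canonical_layer/store.py`.

```python
import os

# Default copied from this document; the environment variable name is an assumption.
DEFAULT_STORE_URL = "sqlite:///X:/1C/NDC_1C/data/canonical_store.db"
STORE_URL_ENV = "CANONICAL_STORE_URL"  # hypothetical


def resolve_store_url(env=None):
    """Return the configured store URL, falling back to the local SQLite default."""
    env = os.environ if env is None else env
    url = env.get(STORE_URL_ENV, "").strip()
    return url or DEFAULT_STORE_URL


def backend_name(url):
    """Extract the backend part of a SQLAlchemy URL, e.g. 'sqlite' or 'postgresql'."""
    scheme = url.split("://", 1)[0]
    # SQLAlchemy URLs may carry a driver suffix such as 'postgresql+psycopg2'.
    return scheme.split("+", 1)[0]
```

Keeping the URL as the only switch between SQLite and PostgreSQL means the rest of the store code does not need to know which backend is active.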
## 3. Tables
### `canonical_entities`
Main canonical facts table.
Fields:
- `source_entity` (original 1C/OData entity set)
- `source_id` (stable source key)
- `display_name` (best effort label)
- `attributes_json` (raw source payload)
- `first_seen_at`
- `updated_at`
- `last_refresh_run_id`
Constraint:
- unique key on (`source_entity`, `source_id`)
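The unique key is what makes repeated refreshes idempotent. The sketch below illustrates this with the standard-library `sqlite3` module rather than SQLAlchemy. Column types, the surrogate `id` column, and the `upsert_entity` helper are assumptions made for illustration; only the column names and the unique key come from this document.

```python
import json
import sqlite3

DDL = """
CREATE TABLE canonical_entities (
    id INTEGER PRIMARY KEY,            -- surrogate key (assumption)
    source_entity TEXT NOT NULL,
    source_id TEXT NOT NULL,
    display_name TEXT,
    attributes_json TEXT NOT NULL,
    first_seen_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    last_refresh_run_id INTEGER,
    UNIQUE (source_entity, source_id)
)
"""

# On conflict, everything except first_seen_at is overwritten.
UPSERT = """
INSERT INTO canonical_entities
    (source_entity, source_id, display_name, attributes_json,
     first_seen_at, updated_at, last_refresh_run_id)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (source_entity, source_id) DO UPDATE SET
    display_name = excluded.display_name,
    attributes_json = excluded.attributes_json,
    updated_at = excluded.updated_at,
    last_refresh_run_id = excluded.last_refresh_run_id
"""


def upsert_entity(conn, source_entity, source_id, display_name,
                  attributes, now, run_id):
    """Insert a new entity or refresh an existing one (hypothetical helper)."""
    payload = json.dumps(attributes, ensure_ascii=False)
    conn.execute(UPSERT, (source_entity, source_id, display_name,
                          payload, now, now, run_id))
```

Writing the same (`source_entity`, `source_id`) pair twice leaves a single row whose `first_seen_at` keeps the original timestamp while `updated_at` and `last_refresh_run_id` move forward.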
### `canonical_links`
Graph edges inferred from reference-like fields.
Fields:
- `source_entity`
- `source_id`
- `relation` (currently `reference`)
- `target_entity`
- `target_id`
- `source_field`
- `updated_at`
- `last_refresh_run_id`
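One possible reading of "reference-like" is sketched below. It assumes that a field counts as a reference when its name ends in `_Key` (the suffix the 1C OData interface uses for reference properties), it is not the row's own `Ref_Key`, and its value is a non-empty GUID. The `infer_links` function and the `target_hints` parameter are hypothetical; the actual heuristic in the codebase may differ.

```python
import re

# Canonical 8-4-4-4-12 GUID text form.
GUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
EMPTY_GUID = "00000000-0000-0000-0000-000000000000"


def infer_links(source_entity, source_id, row, target_hints=None):
    """Yield link dicts for fields of `row` that look like references.

    `target_hints` (hypothetical) maps a field name to its target entity set;
    fields without a hint get target_entity=None.
    """
    target_hints = target_hints or {}
    for field, value in row.items():
        # Skip non-reference fields and the row's own identifier.
        if not field.endswith("_Key") or field == "Ref_Key":
            continue
        if not isinstance(value, str) or not GUID_RE.match(value):
            continue
        if value == EMPTY_GUID:
            continue  # an all-zero GUID means the reference is not set
        yield {
            "source_entity": source_entity,
            "source_id": source_id,
            "relation": "reference",
            "target_entity": target_hints.get(field),
            "target_id": value,
            "source_field": field,
        }
```

A field name alone does not reveal which entity set it points to, so this sketch leaves `target_entity` empty unless a hint is supplied.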
### `refresh_runs`
Operational run log for extraction/refresh jobs.
Fields:
- `id`
- `mode` (`historical`, `incremental`, `targeted`)
- `status` (`running`, `success`, `partial_success`, `failed`)
- `started_at`, `finished_at`
- `requested_entity_sets_json`
- `date_from`, `date_to`
- `limit_per_set`
- `records_read`, `entities_written`, `links_written`, `checkpoints_updated`
- `details_json`, `error_message`
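The run lifecycle can be illustrated with a small in-memory model. The rule used here for choosing between `success`, `partial_success`, and `failed` (based on how many requested entity sets failed) is an assumption; this document lists the status values but does not define when each applies. `RefreshRun` and `finish_run` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

VALID_MODES = {"historical", "incremental", "targeted"}


def _utc_now():
    return datetime.now(timezone.utc)


@dataclass
class RefreshRun:
    """In-memory mirror of a subset of the `refresh_runs` columns."""
    mode: str
    status: str = "running"
    started_at: datetime = field(default_factory=_utc_now)
    finished_at: Optional[datetime] = None
    records_read: int = 0
    entities_written: int = 0
    links_written: int = 0
    checkpoints_updated: int = 0
    error_message: Optional[str] = None

    def __post_init__(self):
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")


def finish_run(run, failed_sets, total_sets):
    """Close a run; the status rule below is an assumption."""
    if failed_sets == 0:
        run.status = "success"
    elif failed_sets < total_sets:
        run.status = "partial_success"
    else:
        run.status = "failed"
    run.finished_at = _utc_now()
    return run
```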
### `refresh_checkpoints`
Per-entity-set watermark/checkpoint state.
Fields:
- `entity_set` (PK)
- `last_success_at`
- `last_refresh_run_id`
- `last_date_from`, `last_date_to`
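As an illustration of how a checkpoint could drive an incremental run, the sketch below derives the next date window from `last_date_to`. The `next_window` function, the `historical_start` parameter, and the choice to re-read the boundary day are assumptions, not the documented behavior.

```python
from datetime import date
from typing import Optional, Tuple


def next_window(last_date_to: Optional[date], today: date,
                historical_start: date) -> Tuple[date, date]:
    """Return (date_from, date_to) for the next refresh of one entity set.

    Without a checkpoint, the window starts at `historical_start`.
    With a checkpoint, it starts at the previous `last_date_to`, so the
    boundary day is read again (assumed safe because entity writes are
    keyed on the unique pair of source_entity and source_id).
    """
    date_from = historical_start if last_date_to is None else last_date_to
    # Clamp so that a checkpoint ahead of `today` never inverts the window.
    return min(date_from, today), today
```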
## 4. Entity model mapping
The current canonical model maps source records into the following entity types:
- `Organization`
- `Counterparty`
- `Contract`
- `Account`
- `Subconto`
- `Document`
- `Posting`
- `RegisterMovement`
- `Period`
Mapping source:
- `canonical_layer/mappers.py`
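The following sketch shows one way a source entity set name could be resolved to a canonical type. It does not reproduce `canonical_layer/mappers.py`: the prefix rules, the catalog names, and the `canonical_type` function are all hypothetical, and real entity set names depend on the 1C configuration.

```python
from typing import Optional

# Hypothetical exact-name rules for catalogs.
CATALOG_RULES = {
    "Catalog_Organizations": "Organization",
    "Catalog_Counterparties": "Counterparty",
    "Catalog_Contracts": "Contract",
}

# Hypothetical prefix rules, checked in order.
PREFIX_RULES = [
    ("ChartOfAccounts_", "Account"),
    ("ChartOfCharacteristicTypes_", "Subconto"),
    ("Document_", "Document"),
    ("AccountingRegister_", "Posting"),
    ("AccumulationRegister_", "RegisterMovement"),
]


def canonical_type(source_entity: str) -> Optional[str]:
    """Return the canonical type for an entity set name, or None if unmapped."""
    if source_entity in CATALOG_RULES:
        return CATALOG_RULES[source_entity]
    for prefix, type_name in PREFIX_RULES:
        if source_entity.startswith(prefix):
            return type_name
    return None
```

Returning `None` for unmapped sets keeps unknown sources visible to the caller instead of forcing them into an ill-fitting type.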
## 5. Known limitations (MVP)
- `attributes_json` stores the source row as-is; there is no typed column model yet.
- Links are heuristic until per-configuration adapters are added.
- No dedicated partitioning strategy yet (planned for PostgreSQL stage).