# Anomaly Engine Spec (Layer 5 MVP)
Date: 2026-03-23
Status: implemented in `canonical_layer/features.py`
## 1. Engine role
Detect suspicious patterns in canonical data and refresh operations. The engine has no direct write access to 1C.
## 2. Input data
- `canonical_entities`
- `canonical_links`
- `refresh_runs` (for freshness context)
- previous successful `feature_runs` and `feature_metrics` (for drift context)
## 3. Implemented anomaly rules
### 3.1 `no_canonical_data`
Trigger:
- canonical entity count is zero.
Severity:
- `high`
### 3.2 `empty_display_share_high`
Trigger:
- the entity count for a source entity is >= 50
- and `empty_display_share` is >= 0.2
Severity:
- `medium`
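
A minimal sketch of this rule, for illustration only. The function name, the `scope` label, and the shape of `details` are assumptions; only the two thresholds, the signal type, and the severity come from this spec.

```python
from typing import Optional

MIN_ENTITY_COUNT = 50           # spec: per-source-entity count >= 50
MIN_EMPTY_DISPLAY_SHARE = 0.2   # spec: empty_display_share >= 0.2


def check_empty_display_share(source_entity: str, total: int, empty: int) -> Optional[dict]:
    """Return a signal dict when both trigger conditions hold, else None."""
    if total < MIN_ENTITY_COUNT:
        return None  # too few entities for the share to be meaningful
    share = empty / total
    if share < MIN_EMPTY_DISPLAY_SHARE:
        return None
    return {
        "signal_type": "empty_display_share_high",
        "severity": "medium",
        "scope": "source_entity",  # assumed scope label
        "scope_id": source_entity,
        "score": share,
        "details": {"entity_count": total, "empty_display_count": empty},  # assumed shape
        "is_active": True,
    }
```
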
### 3.3 `high_link_degree`
Trigger:
- entity link count exceeds dynamic threshold `max(10, mean + 3*std)`
Severity:
- `medium` or `high`, depending on how far the score exceeds the threshold.
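
The dynamic threshold can be sketched as follows. This spec does not say whether `std` is the population or the sample standard deviation, nor how the score is computed; the sketch assumes the population form and a score equal to the ratio of link count to threshold. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def link_degree_threshold(link_counts: List[int]) -> float:
    """max(10, mean + 3*std) over all entity link counts."""
    if not link_counts:
        return 10.0
    # pstdev (population std) is an assumption; the spec only says "std".
    return max(10.0, mean(link_counts) + 3 * pstdev(link_counts))


def high_degree_entities(link_counts_by_entity: Dict[str, int]) -> Dict[str, float]:
    """Map each entity above the threshold to an assumed score (count / threshold)."""
    threshold = link_degree_threshold(list(link_counts_by_entity.values()))
    return {
        entity_id: count / threshold
        for entity_id, count in link_counts_by_entity.items()
        if count > threshold
    }
```

The floor of 10 keeps sparsely linked datasets from flagging entities that have only a handful of links.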
### 3.4 `missing_refresh_baseline`
Trigger:
- no successful refresh run exists.
Severity:
- `high`
### 3.5 `stale_refresh`
Trigger:
- `refresh_age_hours` exceeds configured threshold (`ANOMALY_STALE_REFRESH_THRESHOLD_HOURS`)
Severity:
- `high`
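
This rule and `missing_refresh_baseline` can be sketched together, since both depend on the last successful refresh run. The function name and parameters are hypothetical; the threshold is passed in explicitly because this spec names the setting (`ANOMALY_STALE_REFRESH_THRESHOLD_HOURS`) but not its default.

```python
from datetime import datetime, timezone
from typing import Optional


def check_refresh_freshness(
    last_success_at: Optional[datetime],
    threshold_hours: float,
    now: Optional[datetime] = None,
) -> Optional[dict]:
    """Return a freshness signal, or None when the last refresh is recent enough."""
    now = now or datetime.now(timezone.utc)
    if last_success_at is None:
        # No successful refresh run exists at all.
        return {"signal_type": "missing_refresh_baseline", "severity": "high"}
    refresh_age_hours = (now - last_success_at).total_seconds() / 3600
    if refresh_age_hours > threshold_hours:
        return {
            "signal_type": "stale_refresh",
            "severity": "high",
            "details": {"refresh_age_hours": refresh_age_hours},  # assumed shape
        }
    return None
```
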
### 3.6 `entity_count_drift`
Trigger:
- previous successful feature run exists
- absolute drift ratio for `entity_count` is >= 0.3
- and absolute count difference >= 10
Severity:
- `high` if the drift ratio is >= 1.0, otherwise `medium`
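
A minimal sketch of the drift check. It assumes the drift ratio is `abs(current - previous) / previous` and treats a previous count of zero as infinite drift; neither detail is stated in this spec, and the function name is hypothetical.

```python
from typing import Optional

MIN_DRIFT_RATIO = 0.3   # spec: absolute drift ratio >= 0.3
MIN_ABS_DIFF = 10       # spec: absolute count difference >= 10
HIGH_DRIFT_RATIO = 1.0  # spec: high severity at drift ratio >= 1.0


def check_entity_count_drift(previous: Optional[int], current: int) -> Optional[dict]:
    """Compare entity_count against the previous successful feature run."""
    if previous is None:
        return None  # no previous successful feature run, so no baseline
    diff = abs(current - previous)
    if diff < MIN_ABS_DIFF:
        return None
    # Assumed definition of the ratio; a zero baseline counts as infinite drift.
    ratio = diff / previous if previous > 0 else float("inf")
    if ratio < MIN_DRIFT_RATIO:
        return None
    return {
        "signal_type": "entity_count_drift",
        "severity": "high" if ratio >= HIGH_DRIFT_RATIO else "medium",
        "score": ratio,
        "details": {"previous": previous, "current": current},  # assumed shape
    }
```

The absolute-difference condition keeps small datasets from raising a signal on a large ratio alone: a change from 10 to 15 entities is a 0.5 ratio but only 5 entities.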
## 4. Output contract
Each anomaly signal contains:
- `signal_type`
- `severity`
- `scope`
- `scope_id`
- `score`
- `details`
- `is_active`
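
One way to model this contract in Python is a dataclass. The field names come from the list above; the types and defaults are assumptions, and `AnomalySignal` is a hypothetical name rather than the class used in `canonical_layer/features.py`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class AnomalySignal:
    signal_type: str                 # e.g. "stale_refresh"
    severity: str                    # "medium" or "high" in the rules above
    scope: str                       # assumed: what the signal applies to
    scope_id: Optional[str]          # assumed: None for dataset-wide signals
    score: float
    details: Dict[str, Any] = field(default_factory=dict)
    is_active: bool = True


signal = AnomalySignal(
    signal_type="no_canonical_data",
    severity="high",
    scope="global",                  # hypothetical scope value
    scope_id=None,
    score=1.0,                       # hypothetical score
)
payload = asdict(signal)             # plain dict, ready for JSON serialization
```
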
## 5. Execution
- API: `POST /features/run`
- CLI: `python scripts/run_features.py`
- PowerShell: `scripts/run_features.ps1`
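
A feature run could be triggered over the API with the standard library, roughly as below. The base URL is a placeholder, and the request body and response format are not described in this spec, so the sketch only builds a bodiless `POST` request.

```python
from urllib.request import Request


def build_features_run_request(base_url: str) -> Request:
    """Build a POST request for the /features/run endpoint."""
    url = base_url.rstrip("/") + "/features/run"
    return Request(url, method="POST")


# Hypothetical host and port; replace with the actual service address.
request = build_features_run_request("http://localhost:8000")
# To send it: urllib.request.urlopen(request)
```
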