Planner Autonomy: lock in catalog-alignment in phase66 spec

dctouch 2026-05-01 16:27:39 +03:00
parent fe12967e2d
commit c69a91342f
3 changed files with 31 additions and 2 deletions

@ -125,6 +125,7 @@ The following consolidation step added catalog-level chain-template scoring:
- truth-harness now raises a warning finding for `selected_lower_rank` and `selected_outside_match_set` alignment states unless the replay spec explicitly marks `allow_catalog_alignment_divergence`.
- scenario acceptance now groups that warning under `catalog_alignment_ok`, and `final_status.md` prints the invariant alongside direct-answer, temporal, truth-gate, human-answer, meta-context, and selected-object gates.
- truth-harness specs can now assert `expected_catalog_alignment_status`, `expected_catalog_chain_top_match`, and `expected_catalog_selected_matches_top` on each step.
- `address_truth_harness_phase66_human_org_open_scope_dialog.json` now uses those fields to assert `value_flow`, `value_flow_comparison`, and `value_flow_ranking` top matches across the open-organization money dialog.
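
A minimal sketch of the divergence warning and the `catalog_alignment_ok` grouping described in the bullets above. The alignment states and the `allow_catalog_alignment_divergence` flag are taken from those bullets; the function names, the finding dictionary shape, and the `catalog_alignment_divergence` code are assumptions made for illustration and may not match the actual truth-harness implementation.

```python
from typing import Any, Optional

# Alignment states that the notes describe as divergent.
DIVERGENT_STATES = {"selected_lower_rank", "selected_outside_match_set"}

# Hypothetical finding code; the real harness may use a different identifier.
DIVERGENCE_CODE = "catalog_alignment_divergence"


def catalog_alignment_finding(
    alignment_status: str, spec: dict[str, Any]
) -> Optional[dict[str, str]]:
    """Return a warning finding for a divergent state, or None if none applies."""
    if alignment_status not in DIVERGENT_STATES:
        return None
    # A spec can opt out of the warning explicitly.
    if spec.get("allow_catalog_alignment_divergence", False):
        return None
    return {
        "severity": "warning",
        "code": DIVERGENCE_CODE,
        "alignment_status": alignment_status,
    }


def catalog_alignment_ok(findings: list[dict[str, str]]) -> bool:
    """Acceptance invariant: true when no divergence warning was raised."""
    return not any(f.get("code") == DIVERGENCE_CODE for f in findings)
```

Keeping the opt-out on the spec rather than on the harness means a deliberate divergence has to be declared next to the scenario it applies to.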
## Why This Matters
@ -275,9 +276,14 @@ Latest validation after catalog-alignment spec assertions:
- Python replay-tooling tests: passed, `7 passed`
- graphify rebuild: `5951 nodes`, `12926 edges`, `139 communities`
Latest validation after phase66 catalog-alignment spec hardening:
- Python replay-tooling tests: passed, `7 passed`
- `load_truth_harness_spec` confirmed the phase66 expected top-match chain sequence: `value_flow`, `value_flow`, `value_flow`, `value_flow_comparison`, `value_flow_comparison`, `value_flow_ranking`, `value_flow_ranking`
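
The sequence check above can be sketched as follows. This is not the real `load_truth_harness_spec`; it is a hypothetical helper that assumes the spec keeps its steps in a top-level `steps` list, which the diff does not confirm. Only the `expected_catalog_chain_top_match` field name comes from the spec itself.

```python
from typing import Any


def expected_top_match_sequence(spec: dict[str, Any]) -> list[str]:
    """Collect each step's expected top-match chain, in step order."""
    # Assumption: steps live under a top-level "steps" key.
    return [
        step["expected_catalog_chain_top_match"]
        for step in spec.get("steps", [])
        if "expected_catalog_chain_top_match" in step
    ]


# Small stand-in spec with the same shape of expectations as phase66.
example_spec = {
    "steps": [
        {"expected_catalog_chain_top_match": "value_flow"},
        {"expected_catalog_chain_top_match": "value_flow_comparison"},
        {"title": "step without a catalog expectation"},
        {"expected_catalog_chain_top_match": "value_flow_ranking"},
    ]
}

sequence = expected_top_match_sequence(example_spec)
```

Steps without the field are skipped rather than reported as `None`, so the resulting list can be compared directly against a reviewed sequence.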
## Next Step
The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by using `alignment_status`, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, `catalog_alignment_ok`, step-level expected catalog-alignment assertions, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.
The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by hardening additional planner-autonomy specs with expected catalog-chain assertions and by using `alignment_status`, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, `catalog_alignment_ok`, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.
Recommended order:

@ -88,6 +88,7 @@ It now documents a turnaround that is already operational in code, already mater
- truth-harness now emits a warning finding when selected chains fall below or outside the reviewed catalog top match, unless a spec explicitly allows that divergence;
- scenario acceptance now exposes `catalog_alignment_ok`, so planner-vs-catalog divergence is a first-class acceptance invariant instead of an ungrouped warning;
- truth-harness specs can now assert expected catalog-alignment status/top-match/top-flag per step, so AGENT packs can validate the planner brain's selected chain against the reviewed catalog route fabric;
- the phase66 open-scope money dialog spec now asserts expected catalog-chain top matches across value-flow totals, bidirectional comparison, and ranking follow-ups;
- explicit-counterparty incoming-vs-outgoing data-need graphs now select the reviewed `value_flow_comparison` chain instead of falling back to generic `value_flow`;
- live map sync: [20 - planner_autonomy_consolidation_2026-05-01.md](./20%20-%20planner_autonomy_consolidation_2026-05-01.md)
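
One way the per-step assertions mentioned in this list could be evaluated, as a sketch only. The three `expected_catalog_*` field names are the ones used in the phase66 spec; the function name and the keys of the `observed` dictionary (`alignment_status`, `chain_top_match`, `selected_matches_top`) are assumptions for illustration, since the harness's runtime result shape is not shown here.

```python
from typing import Any

# Spec field -> key in the observed step result.
# The observed keys are hypothetical; the real harness may name them differently.
EXPECTATION_FIELDS = {
    "expected_catalog_alignment_status": "alignment_status",
    "expected_catalog_chain_top_match": "chain_top_match",
    "expected_catalog_selected_matches_top": "selected_matches_top",
}


def check_catalog_alignment_expectations(
    step_spec: dict[str, Any], observed: dict[str, Any]
) -> list[str]:
    """Return one message per expectation that the observed step violates."""
    mismatches = []
    for spec_field, observed_key in EXPECTATION_FIELDS.items():
        if spec_field not in step_spec:
            continue  # each assertion is optional per step
        expected = step_spec[spec_field]
        actual = observed.get(observed_key)
        if actual != expected:
            mismatches.append(
                f"{spec_field}: expected {expected!r}, got {actual!r}"
            )
    return mismatches
```

An empty list means the step satisfied every expectation it declared; steps that declare none always pass this check.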
@ -100,7 +101,7 @@ Current honest status:
- open-world bounded-autonomy readiness: `~85%`
- Post-F semantic integrity module progress: `~99%` operationally closed, with remaining risk now treated as next-slice discovery rather than an open blocker inside the closed slice
- active inventory-stock breadth slice progress: `100%` for the declared scenario pack, not for arbitrary inventory questions
- Planner Autonomy Consolidation progress: `~91%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, explicit `alignment_status` propagation, truth-harness/acceptance-matrix surfacing, soft divergence warning, `catalog_alignment_ok` acceptance invariant, and step-level expected catalog-alignment assertions validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
- Planner Autonomy Consolidation progress: `~92%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, explicit `alignment_status` propagation, truth-harness/acceptance-matrix surfacing, soft divergence warning, `catalog_alignment_ok` acceptance invariant, step-level expected catalog-alignment assertions, and phase66 spec alignment expectations validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
- graph snapshot after latest rebuild: `5951 nodes`, `12926 edges`, `139 communities`
- current breakpoint:
- the validated hot paths are no longer structurally broken;
@ -158,6 +159,7 @@ Latest live proof now includes:
- catalog-alignment divergence warning accepted locally: Python truth-harness/acceptance tests passed `5/5`; graphify rebuilt to `5947 nodes`, `12920 edges`, `138 communities`
- catalog-alignment acceptance invariant accepted locally: Python truth-harness/acceptance tests passed `6/6`; graphify rebuilt to `5949 nodes`, `12923 edges`, `136 communities`
- catalog-alignment spec assertions accepted locally: Python truth-harness/acceptance tests passed `7/7`; graphify rebuilt to `5951 nodes`, `12926 edges`, `139 communities`
- phase66 planner-alignment spec hardening accepted locally: Python truth-harness/acceptance tests passed `7/7`; `load_truth_harness_spec` confirmed expected top matches `[value_flow, value_flow, value_flow, value_flow_comparison, value_flow_comparison, value_flow_ranking, value_flow_ranking]`
Current architectural reading:

@ -11,6 +11,9 @@
"title": "The user asks for incoming money without naming the organization yet",
"question": "Хочу быстрый денежный срез по одной организации без привязки к контрагенту. Сколько вообще входящих денег было за 2020 год?",
"allowed_reply_types": ["clarification_required", "partial_coverage"],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)уточн|нужно",
"(?i)организац"
@ -23,6 +26,9 @@
"title": "The user selects the organization and gets the 2020 incoming total",
"question": "По ООО Альтернатива Плюс.",
"allowed_reply_types": ["factual", "factual_with_explanation", "partial_coverage"],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2020",
"(?i)входящ|поступлен|получ"
@ -40,6 +46,9 @@
"title": "The user broadens the same organization slice to all available time",
"question": "Понял, тогда за все время.",
"allowed_reply_types": ["factual", "factual_with_explanation", "partial_coverage"],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)все доступное время|все время|весь период",
"(?i)входящ|поступлен|получ"
@ -58,6 +67,9 @@
"title": "The user asks which money direction is larger for the organization",
"question": "Хорошо. А что по ООО Альтернатива Плюс больше в 2020 году: входящие или исходящие деньги?",
"allowed_reply_types": ["factual", "factual_with_explanation", "partial_coverage"],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow_comparison",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2020",
"(?i)входящ|исходящ|получ|заплат|больше"
@ -70,6 +82,9 @@
"title": "The user asks the same comparison for another year",
"question": "А что по ООО Альтернатива Плюс больше уже за 2021 год: входящие или исходящие деньги?",
"allowed_reply_types": ["factual", "factual_with_explanation", "partial_coverage"],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow_comparison",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2021",
"(?i)входящ|исходящ|получ|заплат|больше"
@ -82,6 +97,9 @@
"title": "The user asks who brought the most money for the organization",
"question": "И кто больше всего принес денег этой организации в 2020 году?",
"allowed_reply_types": ["factual", "factual_with_explanation", "partial_coverage"],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow_ranking",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2020",
"(?i)кто|контрагент|клиент|принес|доход"
@ -94,6 +112,9 @@
"title": "The user asks the same ranking for another year",
"question": "А в 2021 году?",
"allowed_reply_types": ["factual", "factual_with_explanation", "partial_coverage"],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow_ranking",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2021"
],