Planner Autonomy: surface catalog-alignment in replay artifacts

dctouch 2026-05-01 15:38:06 +03:00
parent 8b5104a2c6
commit a63742f0d6
7 changed files with 89 additions and 3 deletions

@@ -121,6 +121,7 @@ The following consolidation step added catalog-level chain-template scoring:
- `catalog_chain_template_alignment` now records whether the selected chain is the top catalog match, its rank, and whether it appeared in the catalog search results; runtime loop state and debug summary expose the same verdict.
- planner reason codes now emit stable catalog-alignment telemetry for evaluated top-match, selected-equals-top, selected-lower-rank, selected-outside-match-set, and unscored selected-chain states.
- `catalog_chain_template_alignment.alignment_status` now carries the same verdict as one enum-like field, and debug summary exposes it as `mcp_discovery_catalog_chain_alignment_status`.
- `domain_truth_harness` and `scenario_acceptance_policy` now carry the alignment status, top catalog match, and selected-matches-top flag into replay artifacts instead of leaving them buried in raw debug JSON.
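The bullets above describe collapsing three facts (whether the selected chain was scored, whether it appeared in the catalog match set, and its rank) into one enum-like `alignment_status`. A minimal sketch of that idea follows. Only the value `selected_matches_top` appears in this commit's tests; the other status names, the `AlignmentStatus` class, the `derive_alignment_status` helper, and the 1-based rank are assumptions made for illustration, not the project's actual implementation.

```python
from enum import Enum
from typing import Optional


class AlignmentStatus(str, Enum):
    # "selected_matches_top" is the value used in this commit's tests;
    # the remaining names are hypothetical placeholders.
    SELECTED_MATCHES_TOP = "selected_matches_top"
    SELECTED_LOWER_RANK = "selected_lower_rank"
    SELECTED_OUTSIDE_MATCH_SET = "selected_outside_match_set"
    UNSCORED = "unscored"


def derive_alignment_status(
    scored: bool,
    in_match_set: bool,
    selected_rank: Optional[int],
) -> AlignmentStatus:
    """Collapse the alignment facts into one verdict (rank assumed 1-based)."""
    if not scored:
        return AlignmentStatus.UNSCORED
    if not in_match_set or selected_rank is None:
        return AlignmentStatus.SELECTED_OUTSIDE_MATCH_SET
    if selected_rank == 1:
        return AlignmentStatus.SELECTED_MATCHES_TOP
    return AlignmentStatus.SELECTED_LOWER_RANK
```

A single verdict field like this lets replay tooling compare one string instead of re-deriving the outcome from several booleans.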
## Why This Matters
@@ -251,9 +252,14 @@ Latest validation after explicit catalog-alignment status propagation:
- `npm.cmd run build`: passed
- graphify rebuild: `5943 nodes`, `12915 edges`, `136 communities`
Latest validation after truth-harness catalog-alignment artifact surfacing:
- Python replay-tooling tests: passed, `4 passed`
- graphify rebuild: `5946 nodes`, `12918 edges`, `136 communities`
## Next Step
The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by using `alignment_status`, alignment reason-code telemetry, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.
The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by using `alignment_status`, alignment reason-code telemetry, truth-harness artifact surfacing, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.
Recommended order:

@@ -84,6 +84,7 @@ It now documents a turnaround that is already operational in code, already mater
- planner/runtime/debug surfaces now expose `catalog_chain_template_alignment`, so semantic replay can see whether selected chains match the catalog top match, fall back to a lower-ranked template, or bypass catalog search;
- planner reason codes now also emit stable catalog-alignment telemetry, so automated replay review can filter top-match, lower-rank, outside-match, and unscored selected-chain states without hand-parsing debug JSON;
- catalog-alignment now carries a single `alignment_status` verdict through planner/runtime/debug, making replay divergence detection explicit instead of reconstructing it from booleans;
- truth-harness and scenario acceptance artifacts now preserve catalog-alignment status/top-match fields, so AGENT replay review can spot planner-vs-catalog divergence directly in `truth_review.md` and `scenario_acceptance_matrix.json`;
- explicit-counterparty incoming-vs-outgoing data-need graphs now select the reviewed `value_flow_comparison` chain instead of falling back to generic `value_flow`;
- live map sync: [20 - planner_autonomy_consolidation_2026-05-01.md](./20%20-%20planner_autonomy_consolidation_2026-05-01.md)
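Because the acceptance matrix rows now carry `mcp_discovery_catalog_chain_alignment_status`, a replay reviewer could filter for planner-vs-catalog divergence without opening raw debug JSON. The sketch below assumes the matrix is a dict with a `rows` list, as in this commit's tests. The `find_divergent_rows` helper, the `selected_lower_rank` status value, and the `unscored_example` row are hypothetical, and rows without a recorded status are skipped here as a deliberate simplification.

```python
from typing import Any, Dict, List

STATUS_KEY = "mcp_discovery_catalog_chain_alignment_status"


def find_divergent_rows(acceptance_matrix: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return rows whose recorded alignment status is not the top-match verdict."""
    divergent = []
    for row in acceptance_matrix.get("rows") or []:
        status = row.get(STATUS_KEY)
        # Rows with no recorded status are skipped, not flagged (assumption).
        if status is not None and status != "selected_matches_top":
            divergent.append(row)
    return divergent


example_matrix = {
    "rows": [
        {"capability_id": "confirmed_inventory_on_hand_as_of_date",
         STATUS_KEY: "selected_matches_top"},
        {"capability_id": "confirmed_counterparty_turnover",
         STATUS_KEY: "selected_lower_rank"},  # hypothetical status value
        {"capability_id": "unscored_example"},  # hypothetical row, no status
    ]
}
divergent_ids = [row["capability_id"] for row in find_divergent_rows(example_matrix)]
```

Whether missing statuses should instead be reported as a separate "not evaluated" bucket is a policy choice this sketch leaves open.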
@@ -96,8 +97,8 @@ Current honest status:
- open-world bounded-autonomy readiness: `~85%`
- Post-F semantic integrity module progress: `~99%` operationally closed, with remaining risk now treated as next-slice discovery rather than an open blocker inside the closed slice
- active inventory-stock breadth slice progress: `100%` for the declared scenario pack, not for arbitrary inventory questions
- Planner Autonomy Consolidation progress: `~87%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, and explicit `alignment_status` propagation validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
- graph snapshot after latest rebuild: `5943 nodes`, `12915 edges`, `136 communities`
- Planner Autonomy Consolidation progress: `~88%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, explicit `alignment_status` propagation, and truth-harness/acceptance-matrix surfacing validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
- graph snapshot after latest rebuild: `5946 nodes`, `12918 edges`, `136 communities`
- current breakpoint:
- the validated hot paths are no longer structurally broken;
- flagship continuity collapse is no longer the primary risk;
@@ -150,6 +151,7 @@ Latest live proof now includes:
- representative catalog-alignment regression guard accepted locally: planner slice passed `37/37`; full MCP-discovery slice passed `283/283` with `9` skipped; build passed; graphify rebuilt to `5942 nodes`, `12912 edges`, `140 communities`
- catalog-alignment reason-code telemetry accepted locally: planner/runtime slice passed `53/53`; full MCP-discovery suite passed `283/283` with `9` skipped; build passed; graphify rebuilt to `5943 nodes`, `12915 edges`, `136 communities`
- catalog-alignment status verdict accepted locally: planner/runtime/debug slice passed `55/55`; full MCP-discovery suite passed `283/283` with `9` skipped; build passed; graphify rebuilt to `5943 nodes`, `12915 edges`, `136 communities`
- catalog-alignment replay artifact surfacing accepted locally: Python truth-harness/acceptance tests passed `4/4`; graphify rebuilt to `5946 nodes`, `12918 edges`, `136 communities`
Current architectural reading:

@@ -1727,6 +1727,9 @@ def build_scenario_step_state(
"selected_recipe": debug.get("selected_recipe"),
"capability_id": debug.get("capability_id"),
"capability_route_mode": debug.get("capability_route_mode"),
"mcp_discovery_catalog_chain_alignment_status": debug.get("mcp_discovery_catalog_chain_alignment_status"),
"mcp_discovery_catalog_chain_top_match": debug.get("mcp_discovery_catalog_chain_top_match"),
"mcp_discovery_catalog_chain_selected_matches_top": debug.get("mcp_discovery_catalog_chain_selected_matches_top"),
"route_expectation_status": debug.get("route_expectation_status"),
"result_mode": debug.get("result_mode"),
"response_type": debug.get("response_type"),

@@ -679,6 +679,9 @@ def build_truth_review_markdown(spec: dict[str, Any], scenario_state: dict[str,
f"intent: `{step_state.get('detected_intent') or 'n/a'}`",
f"recipe: `{step_state.get('selected_recipe') or 'n/a'}`",
f"capability: `{step_state.get('capability_id') or 'n/a'}`",
f"catalog_alignment_status: `{step_state.get('mcp_discovery_catalog_chain_alignment_status') or 'n/a'}`",
f"catalog_top_match: `{step_state.get('mcp_discovery_catalog_chain_top_match') or 'n/a'}`",
f"catalog_selected_matches_top: `{step_state.get('mcp_discovery_catalog_chain_selected_matches_top')}`",
f"limited_reason_category: `{step_state.get('limited_reason_category') or 'n/a'}`",
f"filters: `{dump_json(step_state.get('extracted_filters') or {})}`",
f"direct_answer: {step_state.get('actual_direct_answer') or 'n/a'}",

@@ -198,6 +198,9 @@ def build_scenario_acceptance_matrix(
"reply_type": step_state.get("reply_type"),
"detected_intent": step_state.get("detected_intent"),
"capability_id": step_state.get("capability_id"),
"mcp_discovery_catalog_chain_alignment_status": step_state.get("mcp_discovery_catalog_chain_alignment_status"),
"mcp_discovery_catalog_chain_top_match": step_state.get("mcp_discovery_catalog_chain_top_match"),
"mcp_discovery_catalog_chain_selected_matches_top": step_state.get("mcp_discovery_catalog_chain_selected_matches_top"),
"selected_object_step": _has_selected_object_signal(step),
"meta_context_step": _has_meta_context_signal(step),
"highest_unresolved_priority": highest_priority,
@@ -330,6 +333,9 @@ def build_scenario_acceptance_matrix_markdown(acceptance_matrix: dict[str, Any])
f" review_status: `{row.get('review_status')}`",
f" criticality: `{row.get('criticality')}`",
f" semantic_tags: {', '.join(row.get('semantic_tags') or []) or 'none'}",
f" catalog_alignment_status: `{row.get('mcp_discovery_catalog_chain_alignment_status') or 'n/a'}`",
f" catalog_top_match: `{row.get('mcp_discovery_catalog_chain_top_match') or 'n/a'}`",
f" catalog_selected_matches_top: `{row.get('mcp_discovery_catalog_chain_selected_matches_top')}`",
f" highest_unresolved_priority: `{row.get('highest_unresolved_priority')}`",
f" selected_object_step: `{row.get('selected_object_step')}`",
f" meta_context_step: `{row.get('meta_context_step')}`",

@@ -0,0 +1,54 @@
from __future__ import annotations

import sys
import unittest
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent))

import domain_case_loop as dcl


class DomainCaseLoopStepStateTests(unittest.TestCase):
    def test_preserves_mcp_catalog_alignment_debug_fields(self) -> None:
        step_state = dcl.build_scenario_step_state(
            scenario_id="planner_alignment_demo",
            domain="planner_autonomy",
            step={
                "step_id": "step_01",
                "title": "Alignment visibility",
                "depends_on": [],
                "question_template": "show planner alignment",
            },
            step_index=1,
            question_resolved="show planner alignment",
            analysis_context={},
            turn_artifact={
                "assistant_message": {
                    "reply_type": "factual",
                    "text": "Confirmed answer",
                    "message_id": "msg-1",
                    "trace_id": "trace-1",
                },
                "technical_debug_payload": {
                    "detected_mode": "address_query",
                    "detected_intent": "counterparty_turnover",
                    "selected_recipe": "counterparty_turnover_by_period",
                    "capability_id": "confirmed_counterparty_turnover",
                    "mcp_discovery_catalog_chain_alignment_status": "selected_matches_top",
                    "mcp_discovery_catalog_chain_top_match": "value_flow",
                    "mcp_discovery_catalog_chain_selected_matches_top": True,
                },
                "session_summary": {},
            },
            entries=[],
        )

        self.assertEqual(step_state["mcp_discovery_catalog_chain_alignment_status"], "selected_matches_top")
        self.assertEqual(step_state["mcp_discovery_catalog_chain_top_match"], "value_flow")
        self.assertTrue(step_state["mcp_discovery_catalog_chain_selected_matches_top"])


if __name__ == "__main__":
    unittest.main()

@@ -84,6 +84,9 @@ class ScenarioAcceptancePolicyTests(unittest.TestCase):
"reply_type": "factual",
"detected_intent": "inventory_on_hand_as_of_date",
"capability_id": "confirmed_inventory_on_hand_as_of_date",
"mcp_discovery_catalog_chain_alignment_status": "selected_matches_top",
"mcp_discovery_catalog_chain_top_match": "inventory_stock_snapshot",
"mcp_discovery_catalog_chain_selected_matches_top": True,
"review_findings": [],
}
},
@@ -104,6 +107,15 @@ class ScenarioAcceptancePolicyTests(unittest.TestCase):
        self.assertTrue(pack_state["acceptance_gate_passed"])
        self.assertTrue(pack_state["critical_path_green"])
        self.assertTrue(all(pack_state["invariants"].values()))
        self.assertEqual(
            acceptance_matrix["rows"][0]["mcp_discovery_catalog_chain_alignment_status"],
            "selected_matches_top",
        )
        self.assertEqual(
            acceptance_matrix["rows"][0]["mcp_discovery_catalog_chain_top_match"],
            "inventory_stock_snapshot",
        )
        self.assertTrue(acceptance_matrix["rows"][0]["mcp_discovery_catalog_chain_selected_matches_top"])

    def test_flags_meta_context_integrity_when_meta_step_leaks_technical_answer_shape(self) -> None:
        spec = {