Planner Autonomy: add catalog-alignment invariant
parent 91529d897d
commit e58a9664e0

@@ -123,6 +123,7 @@ The following consolidation step added catalog-level chain-template scoring:
 - `catalog_chain_template_alignment.alignment_status` now carries the same verdict as one enum-like field, and debug summary exposes it as `mcp_discovery_catalog_chain_alignment_status`.
 - `domain_truth_harness` and `scenario_acceptance_policy` now carry the alignment status, top catalog match, and selected-matches-top flag into replay artifacts instead of leaving them buried in raw debug JSON.
 - truth-harness now raises a warning finding for `selected_lower_rank` and `selected_outside_match_set` alignment states unless the replay spec explicitly marks `allow_catalog_alignment_divergence`.
+- scenario acceptance now groups that warning under `catalog_alignment_ok`, and `final_status.md` prints the invariant alongside direct-answer, temporal, truth-gate, human-answer, meta-context, and selected-object gates.

 ## Why This Matters

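As a reading aid for the hunk above, here is a minimal sketch of the warning rule its last two bullets describe. The status values, the debug field name, the opt-out flag, and the finding code all come from this commit; the function name `catalog_alignment_finding` and the assumption that the opt-out flag sits on the step spec are hypothetical, and the real truth-harness may be structured differently.

```python
from typing import Any, Optional

# Alignment states the changelog treats as divergence from the catalog top match.
DIVERGENT_STATUSES = {"selected_lower_rank", "selected_outside_match_set"}


def catalog_alignment_finding(
    step_output: dict[str, Any],
    step_spec: dict[str, Any],
) -> Optional[dict[str, str]]:
    """Return a warning finding for a divergent step, or None (hypothetical helper)."""
    status = str(step_output.get("mcp_discovery_catalog_chain_alignment_status") or "").strip()
    if status not in DIVERGENT_STATUSES:
        return None
    # Assumption: the replay spec opts out per step with this flag.
    if step_spec.get("allow_catalog_alignment_divergence"):
        return None
    return {"code": "catalog_alignment_divergence", "severity": "warning"}
```

A step whose status is `selected_outside_match_set` would yield the warning finding, while the same step with `allow_catalog_alignment_divergence` set in its spec would yield none.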
@@ -263,9 +264,14 @@ Latest validation after catalog-alignment divergence warning gate:
 - Python replay-tooling tests: passed, `5 passed`
 - graphify rebuild: `5947 nodes`, `12920 edges`, `138 communities`

+Latest validation after catalog-alignment acceptance invariant:
+
+- Python replay-tooling tests: passed, `6 passed`
+- graphify rebuild: `5949 nodes`, `12923 edges`, `136 communities`
+
 ## Next Step

-The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by using `alignment_status`, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.
+The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by using `alignment_status`, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, `catalog_alignment_ok`, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.

 Recommended order:


@@ -86,6 +86,7 @@ It now documents a turnaround that is already operational in code, already mater
 - catalog-alignment now carries a single `alignment_status` verdict through planner/runtime/debug, making replay divergence detection explicit instead of reconstructing it from booleans;
 - truth-harness and scenario acceptance artifacts now preserve catalog-alignment status/top-match fields, so AGENT replay review can spot planner-vs-catalog divergence directly in `truth_review.md` and `scenario_acceptance_matrix.json`;
 - truth-harness now emits a warning finding when selected chains fall below or outside the reviewed catalog top match, unless a spec explicitly allows that divergence;
+- scenario acceptance now exposes `catalog_alignment_ok`, so planner-vs-catalog divergence is a first-class acceptance invariant instead of an ungrouped warning;
 - explicit-counterparty incoming-vs-outgoing data-need graphs now select the reviewed `value_flow_comparison` chain instead of falling back to generic `value_flow`;
 - live map sync: [20 - planner_autonomy_consolidation_2026-05-01.md](./20%20-%20planner_autonomy_consolidation_2026-05-01.md)

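The `catalog_alignment_ok` bullet above can be illustrated with a small sketch of how a per-step finding code might roll up into a single boolean invariant. It follows the shape of the policy-code hunks in this commit (a code predicate, a failure counter, and an `== 0` check), but it is not the project's implementation: `count_catalog_alignment_failures` and the list-of-findings-per-step input are hypothetical.

```python
from typing import Any


def is_catalog_alignment_code(code: str) -> bool:
    # Finding code this commit uses for planner-vs-catalog divergence.
    return code == "catalog_alignment_divergence"


def count_catalog_alignment_failures(findings_per_step: list[list[dict[str, Any]]]) -> int:
    """Count steps that carry at least one catalog-alignment finding (hypothetical)."""
    failures = 0
    for findings in findings_per_step:
        codes = [str(item.get("code") or "").strip() for item in findings]
        if any(is_catalog_alignment_code(code) for code in codes):
            failures += 1
    return failures


def catalog_alignment_ok(findings_per_step: list[list[dict[str, Any]]]) -> bool:
    # The invariant holds only when no step reported a divergence finding.
    return count_catalog_alignment_failures(findings_per_step) == 0
```

Grouping the warning under one named invariant means a reviewer can read a single flag rather than scanning every step's findings for the divergence code.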
@@ -98,8 +99,8 @@ Current honest status:
 - open-world bounded-autonomy readiness: `~85%`
 - Post-F semantic integrity module progress: `~99%` operationally closed, with remaining risk now treated as next-slice discovery rather than an open blocker inside the closed slice
 - active inventory-stock breadth slice progress: `100%` for the declared scenario pack, not for arbitrary inventory questions
-- Planner Autonomy Consolidation progress: `~89%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, explicit `alignment_status` propagation, truth-harness/acceptance-matrix surfacing, and soft divergence warning validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
-- graph snapshot after latest rebuild: `5947 nodes`, `12920 edges`, `138 communities`
+- Planner Autonomy Consolidation progress: `~90%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, explicit `alignment_status` propagation, truth-harness/acceptance-matrix surfacing, soft divergence warning, and `catalog_alignment_ok` acceptance invariant validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
+- graph snapshot after latest rebuild: `5949 nodes`, `12923 edges`, `136 communities`
 - current breakpoint:
   - the validated hot paths are no longer structurally broken;
   - flagship continuity collapse is no longer the primary risk;
@@ -154,6 +155,7 @@ Latest live proof now includes:
 - catalog-alignment status verdict accepted locally: planner/runtime/debug slice passed `55/55`; full MCP-discovery suite passed `283/283` with `9` skipped; build passed; graphify rebuilt to `5943 nodes`, `12915 edges`, `136 communities`
 - catalog-alignment replay artifact surfacing accepted locally: Python truth-harness/acceptance tests passed `4/4`; graphify rebuilt to `5946 nodes`, `12918 edges`, `136 communities`
 - catalog-alignment divergence warning accepted locally: Python truth-harness/acceptance tests passed `5/5`; graphify rebuilt to `5947 nodes`, `12920 edges`, `138 communities`
+- catalog-alignment acceptance invariant accepted locally: Python truth-harness/acceptance tests passed `6/6`; graphify rebuilt to `5949 nodes`, `12923 edges`, `136 communities`

 Current architectural reading:


@@ -144,6 +144,10 @@ def _is_meta_context_code(code: str) -> bool:
     )


+def _is_catalog_alignment_code(code: str) -> bool:
+    return code == "catalog_alignment_divergence"
+
+
 def _derive_step_invariant_failures(step: dict[str, Any], findings: list[dict[str, Any]]) -> dict[str, bool]:
     codes = [str(item.get("code") or "").strip() for item in findings]
     selected_object_step = _has_selected_object_signal(step)
@@ -155,6 +159,7 @@ def _derive_step_invariant_failures(step: dict[str, Any], findings: list[dict[st
         "truth_gate": any(_is_truth_gate_code(code) for code in codes),
         "human_answer_quality": any(_is_human_answer_quality_code(code) for code in codes),
         "meta_context_integrity": meta_context_step and any(_is_meta_context_code(code) for code in codes),
+        "catalog_alignment": any(_is_catalog_alignment_code(code) for code in codes),
     }


@@ -171,6 +176,7 @@ def build_scenario_acceptance_matrix(
         "truth_gate": 0,
         "human_answer_quality": 0,
         "meta_context_integrity": 0,
+        "catalog_alignment": 0,
     }

     for index, step in enumerate(spec.get("steps") or [], start=1):
@@ -217,6 +223,7 @@ def build_scenario_acceptance_matrix(
         "truth_gate_ok": invariant_failure_counts["truth_gate"] == 0,
         "human_answer_quality_ok": invariant_failure_counts["human_answer_quality"] == 0,
         "meta_context_integrity_ok": invariant_failure_counts["meta_context_integrity"] == 0,
+        "catalog_alignment_ok": invariant_failure_counts["catalog_alignment"] == 0,
     }
     critical_rows = [row for row in rows if row["criticality"] == "critical"]
     critical_path_green = bool(critical_rows) and all(row["review_status"] == "pass" for row in critical_rows)
@@ -323,6 +330,7 @@ def build_scenario_acceptance_matrix_markdown(acceptance_matrix: dict[str, Any])
         f"- truth_gate_ok: `{invariants.get('truth_gate_ok')}`",
         f"- human_answer_quality_ok: `{invariants.get('human_answer_quality_ok')}`",
         f"- meta_context_integrity_ok: `{invariants.get('meta_context_integrity_ok')}`",
+        f"- catalog_alignment_ok: `{invariants.get('catalog_alignment_ok')}`",
         "",
         "## Steps",
     ]
@@ -359,4 +367,5 @@ def build_truth_harness_final_status_markdown(pack_state: dict[str, Any]) -> str
         f"- truth_gate_ok: `{invariants.get('truth_gate_ok')}`\n"
         f"- human_answer_quality_ok: `{invariants.get('human_answer_quality_ok')}`\n"
         f"- meta_context_integrity_ok: `{invariants.get('meta_context_integrity_ok')}`\n"
+        f"- catalog_alignment_ok: `{invariants.get('catalog_alignment_ok')}`\n"
     )

@@ -163,6 +163,55 @@ class ScenarioAcceptancePolicyTests(unittest.TestCase):
         self.assertTrue(row["meta_context_step"])
         self.assertIn("meta_context_integrity", row["invariant_failures"])

+    def test_flags_catalog_alignment_invariant_when_planner_diverges_from_catalog_top(self) -> None:
+        spec = {
+            "scenario_id": "demo_planner_alignment",
+            "domain": "planner_autonomy",
+            "title": "Planner alignment",
+            "steps": [
+                {
+                    "step_id": "step_01",
+                    "title": "Catalog alignment",
+                    "question_template": "проверь цепочку MCP",
+                    "criticality": "critical",
+                    "semantic_tags": ["planner_alignment"],
+                }
+            ],
+        }
+        scenario_state = {
+            "session_id": "asst-align",
+            "step_outputs": {
+                "step_01": {
+                    "review_status": "warning",
+                    "reply_type": "factual",
+                    "detected_intent": "counterparty_turnover",
+                    "capability_id": "confirmed_counterparty_turnover",
+                    "mcp_discovery_catalog_chain_alignment_status": "selected_outside_match_set",
+                    "mcp_discovery_catalog_chain_top_match": "value_flow_comparison",
+                    "mcp_discovery_catalog_chain_selected_matches_top": False,
+                    "review_findings": [
+                        {"code": "catalog_alignment_divergence", "severity": "warning"},
+                    ],
+                }
+            },
+        }
+        review_summary = {
+            "review_source": "live_strict_replay",
+            "overall_status": "warning",
+            "steps_total": 1,
+            "steps_passed": 0,
+            "steps_with_warning": 1,
+            "steps_failed": 0,
+        }
+
+        acceptance_matrix = sap.build_scenario_acceptance_matrix(spec, scenario_state, review_summary)
+        pack_state = sap.derive_truth_harness_pack_state(spec, scenario_state, review_summary, acceptance_matrix)
+
+        self.assertEqual(pack_state["final_status"], "partial")
+        self.assertFalse(pack_state["invariants"]["catalog_alignment_ok"])
+        self.assertEqual(pack_state["unresolved_p1_count"], 1)
+        self.assertIn("catalog_alignment", acceptance_matrix["rows"][0]["invariant_failures"])
+


 if __name__ == "__main__":
     unittest.main()