Planner Autonomy: assemble a mixed AGENT run for catalog-alignment

dctouch 2026-05-01 16:46:36 +03:00
parent e13068cf4d
commit f6846206ad
7 changed files with 1953 additions and 115 deletions


@@ -128,6 +128,7 @@ The following consolidation step added catalog-level chain-template scoring:
- `address_truth_harness_phase66_human_org_open_scope_dialog.json` now uses those fields to assert `value_flow`, `value_flow_comparison`, and `value_flow_ranking` top matches across the open-organization money dialog.
- `address_truth_harness_phase32_planner_selected_chain_end_to_end.json` now uses the same assertions across selected-counterparty entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups.
- `agent_semantic_pack_builder` now preserves these expected catalog-alignment fields in the reusable source catalog and adds the `planner_catalog_alignment` tag, so future mixed AGENT packs can deliberately select planner-brain regression probes instead of relying on hand-picked replay filenames.
- The new `turnaround_11_planner_brain_alignment_mix` builder recipe generates `address_truth_harness_phase83_planner_brain_alignment_mix.json`, a 20-step mixed canary that crosses selected-counterparty value-flow, open-organization totals/comparison/ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety.
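The tag-driven selection described above can be sketched roughly as follows. This is an illustrative sketch only: the entry shape (`source`, `step_id`, `tags`), the helper names `select_by_tag` and `assemble_slots`, and the shortened source names are assumptions, not the actual `agent_semantic_pack_builder` recipe format, and the slot label is simplified compared with the generated `notes` values.

```python
from typing import Dict, List

# Hypothetical catalog entries; the real source-catalog schema is not shown in this commit.
CATALOG: List[Dict] = [
    {"source": "phase32", "step_id": "step_01_resolve_counterparty_alias",
     "tags": ["entity_resolution", "planner_catalog_alignment"]},
    {"source": "phase66", "step_id": "step_04_bidirectional_comparison",
     "tags": ["value_flow_comparison", "planner_catalog_alignment"]},
    {"source": "phase42", "step_id": "step_01_catalog_metadata_surface",
     "tags": ["catalog_metadata_surface"]},
]


def select_by_tag(entries: List[Dict], tag: str) -> List[Dict]:
    """Return the catalog entries that carry the given semantic tag."""
    return [entry for entry in entries if tag in entry.get("tags", [])]


def assemble_slots(entries: List[Dict]) -> List[Dict]:
    """Copy each entry into a numbered slot and record its provenance in `notes`."""
    slots = []
    for index, entry in enumerate(entries, start=1):
        step = dict(entry)
        step["notes"] = (
            f"[mixed_pack_slot=slot_{index:02d} "
            f"source={entry['source']}:{entry['step_id']}]"
        )
        slots.append(step)
    return slots


probes = select_by_tag(CATALOG, "planner_catalog_alignment")
pack_steps = assemble_slots(probes)
```

Selecting by tag rather than by replay filename keeps the mixed pack stable when source specs are renamed, which is the motivation stated in the bullet on `planner_catalog_alignment`.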
## Why This Matters
@@ -289,9 +290,16 @@ Latest validation after phase32 catalog-alignment spec hardening and AGENT sourc
- `load_truth_harness_spec` confirmed the phase32 expected top-match chain sequence: `entity_resolution`, `value_flow`, `value_flow`, `value_flow_comparison`, `document_evidence`, `movement_evidence`
- `agent_semantic_pack_builder.py inventory` regenerated `agent_semantic_source_catalog.*` with reusable `planner_catalog_alignment` coverage
Latest validation after phase83 mixed planner-brain spec generation:
- `scripts.test_agent_semantic_pack_builder`: passed, `3 passed`
- generated `address_truth_harness_phase83_planner_brain_alignment_mix.json`: `20` steps, `13` expected catalog top-match checks
- regenerated `agent_semantic_source_catalog.*`: `planner_catalog_alignment` is visible with `26` reusable entries, including phase32, phase66, and phase83 probes
- graphify rebuild: `5952 nodes`, `12927 edges`, `138 communities`
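The step and top-match counts reported above could be re-derived from a generated spec with a few lines of Python. This is a minimal sketch, not the harness's own tooling: `summarize_spec` is a hypothetical helper, and it relies only on the `steps` array and the `expected_catalog_chain_top_match` field that appear in the phase83 spec. The inline `example_spec` is a toy stand-in for the real file.

```python
from collections import Counter
from typing import Dict


def summarize_spec(spec: Dict) -> Dict:
    """Count steps and the steps that assert an expected catalog top match."""
    steps = spec.get("steps", [])
    top_matches = [
        step["expected_catalog_chain_top_match"]
        for step in steps
        if "expected_catalog_chain_top_match" in step
    ]
    return {
        "steps": len(steps),
        "top_match_checks": len(top_matches),
        "by_chain": dict(Counter(top_matches)),
    }


# Toy spec for illustration; a real run would json.load() the generated file instead.
example_spec = {
    "steps": [
        {"step_id": "a", "expected_catalog_chain_top_match": "value_flow"},
        {"step_id": "b", "expected_catalog_chain_top_match": "value_flow"},
        {"step_id": "c"},
    ]
}
summary = summarize_spec(example_spec)
```

Applied to the generated phase83 file, the same helper should reproduce the `20` steps and `13` top-match checks listed above, assuming the file matches the spec shown in this commit.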
## Next Step
The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by using the regenerated AGENT source catalog to assemble mixed planner-brain canaries, hardening additional planner-autonomy specs with expected catalog-chain assertions, and using `alignment_status`, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, `catalog_alignment_ok`, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.
The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. The first live replay candidate should be `address_truth_harness_phase83_planner_brain_alignment_mix.json`; only after it is executed, reviewed semantically, fixed/rerun if needed, and accepted should it be saved into autoruns as a legacy AGENT pack. In parallel, local-only consolidation can continue by hardening additional planner-autonomy specs with expected catalog-chain assertions and using `alignment_status`, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, `catalog_alignment_ok`, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.
Recommended order:


@@ -91,6 +91,7 @@ It now documents a turnaround that is already operational in code, already mater
- the phase66 open-scope money dialog spec now asserts expected catalog-chain top matches across value-flow totals, bidirectional comparison, and ranking follow-ups;
- the phase32 selected-counterparty chain spec now asserts expected catalog-chain top matches across entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups;
- AGENT semantic source catalog generation now preserves expected catalog-alignment fields and tags reusable steps as `planner_catalog_alignment`, so mixed pack construction can find planner-brain regression probes explicitly;
- phase83 planner-brain mixed replay spec is now generated from the AGENT source catalog and interleaves selected-counterparty catalog alignment, open-organization money flow/ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety;
- explicit-counterparty incoming-vs-outgoing data-need graphs now select the reviewed `value_flow_comparison` chain instead of falling back to generic `value_flow`;
- live map sync: [20 - planner_autonomy_consolidation_2026-05-01.md](./20%20-%20planner_autonomy_consolidation_2026-05-01.md)
@@ -103,8 +104,8 @@ Current honest status:
- open-world bounded-autonomy readiness: `~85%`
- Post-F semantic integrity module progress: `~99%` operationally closed, with remaining risk now treated as next-slice discovery rather than an open blocker inside the closed slice
- active inventory-stock breadth slice progress: `100%` for the declared scenario pack, not for arbitrary inventory questions
- Planner Autonomy Consolidation progress: `~93%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, explicit `alignment_status` propagation, truth-harness/acceptance-matrix surfacing, soft divergence warning, `catalog_alignment_ok` acceptance invariant, step-level expected catalog-alignment assertions, phase66 and phase32 spec alignment expectations, and AGENT source-catalog surfacing validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
- graph snapshot after latest rebuild: `5951 nodes`, `12926 edges`, `139 communities`
- Planner Autonomy Consolidation progress: `~94%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, explicit `alignment_status` propagation, truth-harness/acceptance-matrix surfacing, soft divergence warning, `catalog_alignment_ok` acceptance invariant, step-level expected catalog-alignment assertions, phase66 and phase32 spec alignment expectations, AGENT source-catalog surfacing, and generated phase83 mixed planner-brain replay spec validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
- graph snapshot after latest rebuild: `5952 nodes`, `12927 edges`, `138 communities`
- current breakpoint:
- the validated hot paths are no longer structurally broken;
- flagship continuity collapse is no longer the primary risk;
@@ -163,6 +164,7 @@ Latest live proof now includes:
- catalog-alignment spec assertions accepted locally: Python truth-harness/acceptance tests passed `7/7`; graphify rebuilt to `5951 nodes`, `12926 edges`, `139 communities`
- phase66 planner-alignment spec hardening accepted locally: Python truth-harness/acceptance tests passed `7/7`; `load_truth_harness_spec` confirmed expected top matches `[value_flow, value_flow, value_flow, value_flow_comparison, value_flow_comparison, value_flow_ranking, value_flow_ranking]`
- phase32 selected-counterparty planner-alignment spec hardening and AGENT source-catalog surfacing accepted locally: Python replay-tooling tests passed `9/9`; `load_truth_harness_spec` confirmed expected top matches `[entity_resolution, value_flow, value_flow, value_flow_comparison, document_evidence, movement_evidence]`; regenerated source catalog exposes `planner_catalog_alignment` as a reusable tag
- phase83 mixed planner-brain spec generation accepted locally: Python replay-tooling tests passed `10/10`; generated spec has `20` steps and `13` expected catalog top-match checks; regenerated source catalog exposes `planner_catalog_alignment` with `26` reusable entries; graphify rebuilt to `5952 nodes`, `12927 edges`, `138 communities`
Current architectural reading:


@@ -0,0 +1,587 @@
{
"schema_version": "domain_truth_harness_spec_v1",
"scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix",
"domain": "planner_autonomy_consolidation",
"title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity",
"description": "Mixed AGENT replay for Planner Autonomy Consolidation. The pack interleaves selected-counterparty catalog-alignment probes, open-organization money flow, ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety.",
"bindings": {},
"steps": [
{
"step_id": "step_01_human_smalltalk_sanity",
"title": "Human smalltalk remains living chat and does not expose discovery internals",
"question": "привет, ты на связи?",
"required_answer_patterns_any": [
"(?i)привет|на связи|готов|помочь"
],
"forbidden_answer_patterns": [
"(?i)mcp",
"(?i)runtime_",
"(?i)query_documents",
"(?i)primitive"
],
"criticality": "info",
"semantic_tags": [
"human_answer",
"mcp_discovery_gate_sanity",
"meta_smalltalk"
],
"notes": "[mixed_pack_slot=slot_01_smalltalk_sanity source=address_truth_harness_phase19_mcp_discovery_response_gate:step_01_human_smalltalk_sanity]"
},
{
"step_id": "step_01_resolve_counterparty_alias",
"title": "Entity resolution grounds the checked 1C counterparty from a loose alias",
"question": "найди в 1С контрагента СВК",
"allowed_reply_types": [
"factual",
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "entity_resolution",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)свк",
"(?i)контрагент"
],
"required_answer_patterns_any": [
"(?i)группа\\s+свк",
"(?i)каталог",
"(?i)найден",
"(?i)наиболее вероятн"
],
"forbidden_answer_patterns": [
"(?i)получили",
"(?i)заплатили",
"(?i)нетто",
"(?i)оборот",
"(?i)выручк",
"(?i)сумм(а|ы)"
],
"criticality": "critical",
"semantic_tags": [
"entity_resolution",
"alias_grounding",
"followup_anchor",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_02_counterparty_grounding source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_01_resolve_counterparty_alias]"
},
{
"step_id": "step_02_incoming_by_resolved_entity",
"title": "Incoming value-flow follow-up reuses the resolved counterparty anchor",
"question": "сколько получили по нему за 2020 год",
"allowed_reply_types": [
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2020",
"(?i)получил|входящ|поступ",
"(?i)руб"
],
"required_answer_patterns_any": [
"(?i)группа\\s+свк",
"(?i)свк"
],
"forbidden_answer_patterns": [
"(?i)не найден контрагент",
"(?i)уточните, какого контрагента",
"(?i)по какому контрагенту"
],
"criticality": "critical",
"semantic_tags": [
"entity_resolution",
"incoming_value_flow",
"followup_reuse",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_03_counterparty_incoming source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_02_incoming_by_resolved_entity]"
},
{
"step_id": "step_03_payout_switch_by_resolved_entity",
"title": "Outgoing payment follow-up keeps the same grounded counterparty and checked year",
"question": "а теперь сколько заплатили?",
"allowed_reply_types": [
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2020",
"(?i)заплатил|исходящ|списан|платеж",
"(?i)руб"
],
"required_answer_patterns_any": [
"(?i)группа\\s+свк",
"(?i)свк"
],
"forbidden_answer_patterns": [
"(?i)не найден контрагент",
"(?i)уточните, какого контрагента",
"(?i)по какому контрагенту",
"(?i)за какой год"
],
"criticality": "critical",
"semantic_tags": [
"entity_resolution",
"payout_switch",
"followup_reuse",
"date_carryover",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_04_counterparty_payout source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_03_payout_switch_by_resolved_entity]"
},
{
"step_id": "step_04_net_after_payout",
"title": "Net-flow follow-up reuses the same grounded counterparty and checked year after payout",
"question": "а какое нетто?",
"allowed_reply_types": [
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow_comparison",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2020",
"(?i)нетто|сальдо",
"(?i)руб"
],
"required_answer_patterns_any": [
"(?i)получ",
"(?i)заплат",
"(?i)группа\\s+свк",
"(?i)свк"
],
"forbidden_answer_patterns": [
"(?i)не найден контрагент",
"(?i)уточните, какого контрагента",
"(?i)по какому контрагенту"
],
"criticality": "critical",
"semantic_tags": [
"entity_resolution",
"net_value_flow",
"followup_reuse",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_05_counterparty_net source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_04_net_after_payout]"
},
{
"step_id": "step_05_documents_after_net",
"title": "Document evidence follow-up keeps the grounded counterparty after the net answer",
"question": "а по документам?",
"allowed_reply_types": [
"factual",
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "document_evidence",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)документ|счет|накладн|акт"
],
"required_answer_patterns_any": [
"(?i)группа\\s+свк",
"(?i)свк",
"(?i)2020"
],
"forbidden_answer_patterns": [
"(?i)не найден контрагент",
"(?i)уточните, какого контрагента",
"(?i)по какому контрагенту",
"(?i)сколько получили",
"(?i)сколько заплатили",
"(?i)нетто"
],
"criticality": "critical",
"semantic_tags": [
"entity_resolution",
"document_evidence",
"value_flow_pivot",
"followup_reuse",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_06_counterparty_documents source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_05_documents_after_net]"
},
{
"step_id": "step_06_movements_after_documents",
"title": "Movement evidence follow-up keeps the grounded counterparty after the document answer",
"question": "а по движениям?",
"allowed_reply_types": [
"factual",
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "movement_evidence",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)движени|операц|платеж|списан|поступ"
],
"required_answer_patterns_any": [
"(?i)группа\\s+свк",
"(?i)свк",
"(?i)2020"
],
"forbidden_answer_patterns": [
"(?i)не найден контрагент",
"(?i)уточните, какого контрагента",
"(?i)по какому контрагенту",
"(?i)сколько получили",
"(?i)сколько заплатили",
"(?i)нетто"
],
"criticality": "critical",
"semantic_tags": [
"entity_resolution",
"movement_evidence",
"document_pivot",
"followup_reuse",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_07_counterparty_movements source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_06_movements_after_documents]"
},
{
"step_id": "step_01_open_scope_incoming_total",
"title": "The user asks for incoming money without naming the organization yet",
"question": "Хочу быстрый денежный срез по одной организации без привязки к контрагенту. Сколько вообще входящих денег было за 2020 год?",
"allowed_reply_types": [
"clarification_required",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)уточн|нужно",
"(?i)организац"
],
"criticality": "critical",
"semantic_tags": [
"open_scope_total",
"organization_scope",
"human_dialog",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_08_open_org_total source=address_truth_harness_phase66_human_org_open_scope_dialog:step_01_open_scope_incoming_total]"
},
{
"step_id": "step_02_all_time_same_open_scope",
"title": "The user selects the organization and gets the 2020 incoming total",
"question": "По ООО Альтернатива Плюс.",
"allowed_reply_types": [
"factual",
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2020",
"(?i)входящ|поступлен|получ"
],
"forbidden_answer_patterns": [
"(?i)уточните .*контрагент",
"(?i)не найден контрагент",
"(?i)уточните .*организац"
],
"criticality": "critical",
"semantic_tags": [
"organization_clarification",
"open_scope_total",
"human_dialog",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_09_open_org_selection source=address_truth_harness_phase66_human_org_open_scope_dialog:step_02_all_time_same_open_scope]"
},
{
"step_id": "step_03_all_time_same_open_scope",
"title": "The user broadens the same organization slice to all available time",
"question": "Понял, тогда за все время.",
"allowed_reply_types": [
"factual",
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)все доступное время|все время|весь период",
"(?i)входящ|поступлен|получ"
],
"forbidden_answer_patterns": [
"(?i)за 2020",
"(?i)уточните .*контрагент",
"(?i)уточните .*период",
"(?i)уточните .*организац"
],
"criticality": "critical",
"semantic_tags": [
"all_time_followup",
"organization_scope",
"human_dialog",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_10_open_org_all_time source=address_truth_harness_phase66_human_org_open_scope_dialog:step_03_all_time_same_open_scope]"
},
{
"step_id": "step_04_bidirectional_comparison",
"title": "The user asks which money direction is larger for the organization",
"question": "Хорошо. А что по ООО Альтернатива Плюс больше в 2020 году: входящие или исходящие деньги?",
"allowed_reply_types": [
"factual",
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow_comparison",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2020",
"(?i)входящ|исходящ|получ|заплат|больше"
],
"criticality": "critical",
"semantic_tags": [
"value_flow_comparison",
"organization_scope",
"human_dialog",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_11_open_org_comparison source=address_truth_harness_phase66_human_org_open_scope_dialog:step_04_bidirectional_comparison]"
},
{
"step_id": "step_05_comparison_year_switch",
"title": "The user asks the same comparison for another year",
"question": "А что по ООО Альтернатива Плюс больше уже за 2021 год: входящие или исходящие деньги?",
"allowed_reply_types": [
"factual",
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow_comparison",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2021",
"(?i)входящ|исходящ|получ|заплат|больше"
],
"criticality": "critical",
"semantic_tags": [
"value_flow_comparison",
"year_switch",
"organization_scope",
"human_dialog",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_12_open_org_comparison_year_switch source=address_truth_harness_phase66_human_org_open_scope_dialog:step_05_comparison_year_switch]"
},
{
"step_id": "step_06_ranking_top_counterparty",
"title": "The user asks who brought the most money for the organization",
"question": "И кто больше всего принес денег этой организации в 2020 году?",
"allowed_reply_types": [
"factual",
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow_ranking",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2020",
"(?i)кто|контрагент|клиент|принес|доход"
],
"criticality": "critical",
"semantic_tags": [
"value_flow_ranking",
"organization_scope",
"human_dialog",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_13_open_org_ranking source=address_truth_harness_phase66_human_org_open_scope_dialog:step_06_ranking_top_counterparty]"
},
{
"step_id": "step_07_ranking_year_switch",
"title": "The user asks the same ranking for another year",
"question": "А в 2021 году?",
"allowed_reply_types": [
"factual",
"factual_with_explanation",
"partial_coverage"
],
"expected_catalog_alignment_status": "selected_matches_top",
"expected_catalog_chain_top_match": "value_flow_ranking",
"expected_catalog_selected_matches_top": true,
"required_answer_patterns_all": [
"(?i)2021"
],
"criticality": "critical",
"semantic_tags": [
"value_flow_ranking",
"year_switch",
"organization_scope",
"human_dialog",
"planner_catalog_alignment"
],
"notes": "[mixed_pack_slot=slot_14_open_org_ranking_year_switch source=address_truth_harness_phase66_human_org_open_scope_dialog:step_07_ranking_year_switch]"
},
{
"step_id": "step_01_company_activity_lifecycle",
"title": "Lifecycle answer seeds grounded organization context",
"question": "а по Альтернативе Плюс сколько лет активности в базе 1С?",
"allowed_reply_types": [
"partial_coverage",
"factual",
"factual_with_explanation"
],
"required_answer_patterns_any": [
"(?i)лет",
"(?i)активност",
"(?i)1с",
"(?i)не получил|не подтвержден|проверил доступный контур"
],
"criticality": "warning",
"semantic_tags": [
"company_activity_lifecycle",
"grounded_context_seed"
],
"notes": "[mixed_pack_slot=slot_15_broad_eval_context source=address_truth_harness_phase22_broad_business_evaluation_bridge:step_01_company_activity_lifecycle]"
},
{
"step_id": "step_02_broad_business_evaluation",
"title": "Broad business evaluation becomes grounded summary instead of stale lifecycle dump",
"question": "Как ты оценишь деятельность компании?",
"required_answer_patterns_all": [
"(?i)коротко|оценк|частичн",
"(?i)1с|подтвержд",
"(?i)денежн|долг|ндс|контрагент|операц"
],
"forbidden_answer_patterns": [
"(?i)активных заказчиков",
"(?i)последняя активность",
"(?i)^\\s*1\\."
],
"criticality": "warning",
"semantic_tags": [
"broad_business_evaluation",
"grounded_summary"
],
"notes": "[mixed_pack_slot=slot_16_broad_eval_bridge source=address_truth_harness_phase22_broad_business_evaluation_bridge:step_02_broad_business_evaluation]"
},
{
"step_id": "step_03_net_flow_after_broad_eval",
"title": "Exact net-flow follow-up still answers after the broad bridge",
"question": "какое нетто по деньгам с Группа СВК за 2020 год: сколько получили и сколько заплатили?",
"allowed_reply_types": [
"partial_coverage",
"factual_with_explanation"
],
"required_answer_patterns_all": [
"(?i)свк",
"(?i)получил|входящ|поступ",
"(?i)заплат|исходящ|списан|плат[её]ж",
"(?i)нетто|сальдо|разниц",
"(?i)2020|период",
"(?i)руб"
],
"forbidden_answer_patterns": [
"(?i)активных заказчиков",
"(?i)лет в базе",
"(?i)последняя активность"
],
"criticality": "critical",
"semantic_tags": [
"counterparty_net_cash_flow",
"broad_eval_bridge_preserved"
],
"notes": "[mixed_pack_slot=slot_17_broad_eval_return_to_net source=address_truth_harness_phase22_broad_business_evaluation_bridge:step_03_net_flow_after_broad_eval]"
},
{
"step_id": "step_01_catalog_metadata_surface",
"title": "Catalog-oriented metadata surface is surfaced honestly for counterparties",
"question": "какие справочники 1С есть по контрагентам?",
"allowed_reply_types": [
"partial_coverage",
"factual_with_explanation"
],
"required_answer_patterns_all": [
"(?i)metadata|метадан",
"(?i)справоч|catalog|directory",
"(?i)контрагент"
],
"forbidden_answer_patterns": [
"(?i)получили",
"(?i)заплатили",
"(?i)нетто",
"(?i)документные строки найдены",
"(?i)строки денежных движений найдены"
],
"criticality": "warning",
"semantic_tags": [
"catalog_metadata_surface",
"counterparty_catalog_scope"
],
"notes": "[mixed_pack_slot=slot_18_metadata_surface source=address_truth_harness_phase42_catalog_metadata_drilldown:step_01_catalog_metadata_surface]"
},
{
"step_id": "step_02_neutral_followup_catalog_drilldown",
"title": "Neutral follow-up continues into deeper catalog metadata instead of asking for a documents-vs-movements lane choice",
"question": "давай дальше",
"allowed_reply_types": [
"partial_coverage",
"factual_with_explanation"
],
"required_answer_patterns_all": [
"(?i)metadata|метадан|схем",
"(?i)справоч|catalog|directory",
"(?i)контрагент|counterpart"
],
"forbidden_answer_patterns": [
"(?i)документ",
"(?i)движени|регистр",
"(?i)уточн.*контур",
"(?i)получили",
"(?i)заплатили",
"(?i)нетто"
],
"criticality": "warning",
"semantic_tags": [
"catalog_drilldown",
"neutral_followup"
],
"notes": "[mixed_pack_slot=slot_19_metadata_drilldown source=address_truth_harness_phase42_catalog_metadata_drilldown:step_02_neutral_followup_catalog_drilldown]"
},
{
"step_id": "step_08_off_domain_living_chat_not_hijacked",
"title": "Off-domain living chat remains human and is not hijacked by discovery carryover",
"question": "а чем капибара отличается от утки?",
"required_answer_patterns_any": [
"(?i)капибар.*утк|утк.*капибар",
"(?i)млекопита|птиц|грызун"
],
"forbidden_answer_patterns": [
"(?i)свк",
"(?i)контрагент",
"(?i)mcp",
"(?i)query_documents",
"(?i)runtime_",
"(?i)primitive"
],
"criticality": "warning",
"semantic_tags": [
"off_domain_living_chat",
"stale_replay_forbidden"
],
"notes": "[mixed_pack_slot=slot_20_off_domain_guard source=address_truth_harness_phase19_mcp_discovery_response_gate:step_08_off_domain_living_chat_not_hijacked]"
}
]
}
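One plausible reading of the answer-pattern fields in the spec above is sketched below. The semantics are inferred from the field names and are an assumption, not the harness's confirmed behavior: every `required_answer_patterns_all` regex must match, at least one `required_answer_patterns_any` regex must match, and no `forbidden_answer_patterns` regex may match. `check_answer` is a hypothetical helper, and the sample step is a trimmed copy of `step_08_off_domain_living_chat_not_hijacked`.

```python
import re
from typing import Dict, List


def check_answer(step: Dict, answer: str) -> List[str]:
    """Return a list of human-readable failures; an empty list means the answer passes."""
    failures = []
    for pattern in step.get("required_answer_patterns_all", []):
        if not re.search(pattern, answer):
            failures.append(f"missing required pattern: {pattern}")
    any_patterns = step.get("required_answer_patterns_any", [])
    if any_patterns and not any(re.search(p, answer) for p in any_patterns):
        failures.append("none of required_answer_patterns_any matched")
    for pattern in step.get("forbidden_answer_patterns", []):
        if re.search(pattern, answer):
            failures.append(f"forbidden pattern matched: {pattern}")
    return failures


# Trimmed copy of the off-domain guard step from the spec above.
off_domain_step = {
    "required_answer_patterns_any": [
        r"(?i)капибар.*утк|утк.*капибар",
        r"(?i)млекопита|птиц|грызун",
    ],
    "forbidden_answer_patterns": [r"(?i)свк", r"(?i)mcp"],
}

good = check_answer(off_domain_step, "Капибара это грызун, а утка это птица.")
bad = check_answer(off_domain_step, "По СВК данных нет.")
```

Under these assumed semantics the first answer passes, while the second fails twice: it matches none of the "any" patterns and it leaks the stale counterparty name that the guard forbids.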

File diff suppressed because it is too large


@@ -1,6 +1,6 @@
# Agent semantic source catalog
- truth_harness_steps_total: `500`
- truth_harness_steps_total: `520`
- saved_session_questions_total: `229`
## Reusable truth-harness tags
@@ -11,11 +11,11 @@
- `aggregate_all_time`: `1`
- `aggregate_revenue`: `1`
- `aggregate_year`: `1`
- `alias_grounding`: `6`
- `alias_grounding`: `7`
- `all_time_after_pivot`: `2`
- `all_time_after_second_pivot`: `2`
- `all_time_after_third_pivot`: `1`
- `all_time_followup`: `9`
- `all_time_followup`: `10`
- `all_time_scope`: `4`
- `ambiguity_probe`: `1`
- `anomaly_probe`: `1`
@@ -25,18 +25,18 @@
- `bounded_autonomy`: `47`
- `bounded_retrieval`: `13`
- `bridge_inventory_to_vat`: `3`
- `broad_business_evaluation`: `2`
- `broad_eval_bridge_preserved`: `1`
- `broad_business_evaluation`: `3`
- `broad_eval_bridge_preserved`: `2`
- `broad_eval_followup_continuity`: `1`
- `broad_evaluation_bridge`: `1`
- `capability_meta`: `3`
- `capability_over_followup`: `2`
- `catalog_drilldown`: `1`
- `catalog_drilldown`: `2`
- `catalog_grounding`: `1`
- `catalog_metadata_surface`: `1`
- `catalog_metadata_surface`: `2`
- `clarification_required`: `1`
- `clarification_resume`: `2`
- `company_activity_lifecycle`: `2`
- `company_activity_lifecycle`: `3`
- `company_analytics`: `1`
- `company_authority`: `3`
- `company_authority_probe`: `1`
@@ -51,14 +51,14 @@
- `continuity_interrupt`: `1`
- `contracts_followup`: `16`
- `counterparty_carryover`: `1`
- `counterparty_catalog_scope`: `1`
- `counterparty_catalog_scope`: `2`
- `counterparty_documents`: `29`
- `counterparty_followup`: `3`
- `counterparty_grounding`: `1`
- `counterparty_item_flow`: `1`
- `counterparty_lifecycle`: `1`
- `counterparty_monthly_net_cash_flow`: `1`
- `counterparty_net_cash_flow`: `4`
- `counterparty_net_cash_flow`: `5`
- `counterparty_net_value_flow`: `1`
- `counterparty_outgoing_payments`: `1`
- `counterparty_pronoun_resolution`: `15`
@@ -76,17 +76,17 @@
- `current_turn_entity_authority`: `1`
- `customer_analytics`: `1`
- `data_scope_meta`: `2`
- `date_carryover`: `6`
- `date_carryover`: `7`
- `date_followup`: `2`
- `date_scope`: `1`
- `debt_polarity`: `1`
- `display_label_integrity`: `3`
- `display_name_integrity`: `1`
- `document_evidence`: `3`
- `document_evidence`: `4`
- `document_lane_after_clarification`: `5`
- `document_lane_continuity`: `6`
- `document_lane_execution`: `5`
- `document_pivot`: `1`
- `document_pivot`: `2`
- `document_pivot_after_movement`: `1`
- `document_pivot_after_movement_retrieval`: `2`
- `document_pivot_after_retrieval`: `2`
@@ -96,30 +96,30 @@
- `documents_followup`: `7`
- `documents_pivot`: `2`
- `entity_grounding`: `2`
- `entity_resolution`: `29`
- `entity_resolution`: `35`
- `exact_not_overwritten`: `2`
- `followup_anchor`: `6`
- `followup_reuse`: `21`
- `followup_anchor`: `7`
- `followup_reuse`: `26`
- `followup_short`: `1`
- `fourth_pivot`: `2`
- `garbage_anchor_forbidden`: `1`
- `grounded_context_seed`: `1`
- `grounded_context_seed`: `2`
- `grounded_counterparty`: `13`
- `grounded_counterparty_followup`: `12`
- `grounded_discovery_seed`: `1`
- `grounded_self_correction`: `1`
- `grounded_summary`: `1`
- `grounded_summary`: `2`
- `historical_anchor`: `1`
- `historical_date_anchor`: `3`
- `historical_inventory`: `2`
- `historical_restore`: `1`
- `human_answer`: `3`
- `human_answer`: `4`
- `human_answer_quality`: `2`
- `human_dialog`: `39`
- `human_dialog`: `46`
- `hybrid_investigation_followup`: `2`
- `hybrid_investigation_root`: `2`
- `incoming`: `8`
- `incoming_value_flow`: `8`
- `incoming_value_flow`: `9`
- `inline_organization_clarification`: `13`
- `integrity_guard`: `57`
- `inventory_aging`: `3`
@@ -142,7 +142,7 @@
- `manual_9lieoh`: `11`
- `materialization_gap`: `1`
- `mcp_discovery_bidirectional_value_flow`: `2`
- `mcp_discovery_gate_sanity`: `1`
- `mcp_discovery_gate_sanity`: `2`
- `mcp_discovery_response_gate`: `1`
- `mcp_discovery_supplier_payout`: `1`
- `mcp_discovery_value_flow`: `1`
@@ -152,12 +152,12 @@
- `meta_memory`: `6`
- `meta_return_to_business`: `1`
- `meta_scope`: `12`
- `meta_smalltalk`: `13`
- `meta_smalltalk`: `14`
- `meta_verify`: `1`
- `metadata_lane_choice_clarification`: `15`
- `metadata_surface`: `17`
- `mixed_ambiguity`: `15`
- `movement_evidence`: `3`
- `movement_evidence`: `4`
- `movement_execution`: `1`
- `movement_lane_after_clarification`: `11`
- `movement_lane_after_metadata`: `2`
@@ -170,19 +170,19 @@
- `multi_company_entry`: `2`
- `multi_hop_clarification`: `21`
- `net_switch`: `1`
- `net_value_flow`: `5`
- `neutral_followup`: `16`
- `net_value_flow`: `6`
- `neutral_followup`: `17`
- `numeric_counterparty_suffix`: `1`
- `off_domain_living_chat`: `2`
- `off_domain_living_chat`: `3`
- `open_scope`: `9`
- `open_scope_net`: `3`
- `open_scope_total`: `8`
- `open_scope_total`: `10`
- `organization_activity_age`: `5`
- `organization_authority`: `7`
- `organization_clarification`: `8`
- `organization_clarification`: `9`
- `organization_fact_boundary`: `1`
- `organization_followup_reuse`: `20`
- `organization_scope`: `28`
- `organization_scope`: `34`
- `organization_scoped`: `4`
- `organization_second_recovery`: `1`
- `outgoing`: `3`
@ -190,7 +190,7 @@
- `payables`: `1`
- `payables_snapshot`: `1`
- `payments_followup`: `20`
- `payout_switch`: `4`
- `payout_switch`: `5`
- `payout_value_flow`: `2`
- `payout_year_switch`: `3`
- `period_carryover`: `1`
@ -205,7 +205,7 @@
- `period_narrowing`: `1`
- `period_scope`: `9`
- `pivot_seed`: `8`
- `planner_catalog_alignment`: `13`
- `planner_catalog_alignment`: `26`
- `polarity_flip`: `1`
- `post_f`: `9`
- `post_f_integrity_hardening`: `6`
@ -260,7 +260,7 @@
- `stale_entity_seed`: `1`
- `stale_inventory_scope`: `1`
- `stale_lifecycle_override`: `1`
- `stale_replay_forbidden`: `2`
- `stale_replay_forbidden`: `3`
- `stale_scope_guard`: `1`
- `stale_temporal_carryover`: `1`
- `supported_route_not_hijacked_by_mcp_discovery`: `1`
@ -274,10 +274,10 @@
- `topic_reset`: `5`
- `translit_wording`: `1`
- `unsupported_current_turn_meaning_boundary`: `5`
- `value_flow_comparison`: `11`
- `value_flow_comparison`: `13`
- `value_flow_net`: `6`
- `value_flow_pivot`: `3`
- `value_flow_ranking`: `17`
- `value_flow_pivot`: `4`
- `value_flow_ranking`: `19`
- `value_flow_total`: `16`
- `vat`: `37`
- `vat_colloquial_wording`: `2`
@ -288,7 +288,7 @@
- `vat_orientation`: `2`
- `very_old_stock`: `1`
- `year_specific`: `1`
- `year_switch`: `14`
- `year_switch`: `16`
- `year_switch_after_document_pivot`: `1`
- `year_switch_after_fourth_pivot`: `2`
- `year_switch_after_pivot`: `4`
@ -728,6 +728,26 @@
- `address_truth_harness_phase82_human_mixed_integrity_status_dialog:step_17_counterparty_net_followup` | tags: grounded_counterparty, net_value_flow, human_dialog | question: А какое нетто?
- `address_truth_harness_phase82_human_mixed_integrity_status_dialog:step_18_counterparty_documents_pivot` | tags: grounded_counterparty, documents_pivot, human_dialog, counterparty_documents | question: А по документам?
- `address_truth_harness_phase82_human_mixed_integrity_status_dialog:step_19_counterparty_movements_pivot` | tags: grounded_counterparty, movements_pivot, human_dialog | question: А по движениям?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_human_smalltalk_sanity` | tags: human_answer, mcp_discovery_gate_sanity, meta_smalltalk | question: привет, ты на связи?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_resolve_counterparty_alias` | tags: entity_resolution, alias_grounding, followup_anchor, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=entity_resolution, selected_matches_top=True | question: найди в 1С контрагента СВК
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_02_incoming_by_resolved_entity` | tags: entity_resolution, incoming_value_flow, followup_reuse, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: сколько получили по нему за 2020 год
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_03_payout_switch_by_resolved_entity` | tags: entity_resolution, payout_switch, followup_reuse, date_carryover, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: а теперь сколько заплатили?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_04_net_after_payout` | tags: entity_resolution, net_value_flow, followup_reuse, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_comparison, selected_matches_top=True | question: а какое нетто?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_05_documents_after_net` | tags: entity_resolution, document_evidence, value_flow_pivot, followup_reuse, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=document_evidence, selected_matches_top=True | question: а по документам?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_06_movements_after_documents` | tags: entity_resolution, movement_evidence, document_pivot, followup_reuse, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=movement_evidence, selected_matches_top=True | question: а по движениям?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_open_scope_incoming_total` | tags: open_scope_total, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: Хочу быстрый денежный срез по одной организации без привязки к контрагенту. Сколько вообще входящих денег было за 2020 год?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_02_all_time_same_open_scope` | tags: organization_clarification, open_scope_total, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: По ООО Альтернатива Плюс.
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_03_all_time_same_open_scope` | tags: all_time_followup, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: Понял, тогда за все время.
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_04_bidirectional_comparison` | tags: value_flow_comparison, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_comparison, selected_matches_top=True | question: Хорошо. А что по ООО Альтернатива Плюс больше в 2020 году: входящие или исходящие деньги?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_05_comparison_year_switch` | tags: value_flow_comparison, year_switch, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_comparison, selected_matches_top=True | question: А что по ООО Альтернатива Плюс больше уже за 2021 год: входящие или исходящие деньги?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_06_ranking_top_counterparty` | tags: value_flow_ranking, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_ranking, selected_matches_top=True | question: И кто больше всего принес денег этой организации в 2020 году?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_07_ranking_year_switch` | tags: value_flow_ranking, year_switch, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_ranking, selected_matches_top=True | question: А в 2021 году?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_company_activity_lifecycle` | tags: company_activity_lifecycle, grounded_context_seed | question: а по Альтернативе Плюс сколько лет активности в базе 1С?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_02_broad_business_evaluation` | tags: broad_business_evaluation, grounded_summary | question: Как ты оценишь деятельность компании?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_03_net_flow_after_broad_eval` | tags: counterparty_net_cash_flow, broad_eval_bridge_preserved | question: какое нетто по деньгам с Группа СВК за 2020 год: сколько получили и сколько заплатили?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_catalog_metadata_surface` | tags: catalog_metadata_surface, counterparty_catalog_scope | question: какие справочники 1С есть по контрагентам?
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_02_neutral_followup_catalog_drilldown` | tags: catalog_drilldown, neutral_followup | question: давай дальше
- `address_truth_harness_phase83_planner_brain_alignment_mix:step_08_off_domain_living_chat_not_hijacked` | tags: off_domain_living_chat, stale_replay_forbidden | question: а чем капибара отличается от утки?
- `address_truth_harness_phase8_manual_runtime_authority_mix:step_01_smalltalk` | tags: meta_smalltalk | question: привет, как дела?
- `address_truth_harness_phase8_manual_runtime_authority_mix:step_02_data_scope_meta` | tags: meta_scope | question: по какой компании мы сейчас работаем?
- `address_truth_harness_phase8_manual_runtime_authority_mix:step_03_counterparty_documents` | tags: counterparty_documents | question: покажи все документы по чепурнову
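The per-tag counts at the top of this catalog can in principle be recomputed from the candidate lines themselves. The following is a minimal sketch, assuming each candidate line keeps the `- `id` | tags: a, b | question: ...` layout shown above; `parse_candidate_line` and `count_tags` are hypothetical helper names and are not part of `agent_semantic_pack_builder`.

```python
from collections import Counter


def parse_candidate_line(line: str) -> tuple[str, list[str]]:
    """Split one catalog bullet into (candidate_id, tags).

    Assumes fields are separated by " | " and the tag field starts with "tags:".
    """
    fields = [part.strip() for part in line.lstrip("- ").split(" | ")]
    candidate_id = fields[0].strip("`")
    tags: list[str] = []
    for field in fields[1:]:
        if field.startswith("tags:"):
            raw = field[len("tags:"):]
            tags = [tag.strip() for tag in raw.split(",") if tag.strip()]
    return candidate_id, tags


def count_tags(lines: list[str]) -> Counter:
    """Count how many candidate lines carry each tag."""
    counts: Counter = Counter()
    for line in lines:
        _, tags = parse_candidate_line(line)
        counts.update(tags)
    return counts


sample_lines = [
    "- `address_truth_harness_phase83_planner_brain_alignment_mix:step_02_neutral_followup_catalog_drilldown` | tags: catalog_drilldown, neutral_followup | question: давай дальше",
    "- `address_truth_harness_phase83_planner_brain_alignment_mix:step_08_off_domain_living_chat_not_hijacked` | tags: off_domain_living_chat, stale_replay_forbidden | question: а чем капибара отличается от утки?",
    "- `address_truth_harness_phase8_manual_runtime_authority_mix:step_01_smalltalk` | tags: meta_smalltalk | question: привет, как дела?",
]
sample_counts = count_tags(sample_lines)
```

A check like this is only a sanity aid: the real inventory may derive tags from the JSON specs rather than from the rendered Markdown.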


@ -167,7 +167,180 @@ RECIPE_LIBRARY: dict[str, dict[str, Any]] = {
"required_tags": ["meta_scope"],
},
],
}
},
"turnaround_11_planner_brain_alignment_mix": {
"scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix",
"domain": "planner_autonomy_consolidation",
"title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity",
"description": (
"Mixed AGENT replay for Planner Autonomy Consolidation. The pack interleaves selected-counterparty "
"catalog-alignment probes, open-organization money flow, ranking, broad-evaluation continuity, "
"metadata drilldown, and off-domain living-chat safety."
),
"bindings": {},
"step_plan": [
{
"slot_id": "slot_01_smalltalk_sanity",
"criticality": "info",
"preferred_candidate_ids": [
"address_truth_harness_phase19_mcp_discovery_response_gate:step_01_human_smalltalk_sanity",
],
"required_tags": ["meta_smalltalk"],
},
{
"slot_id": "slot_02_counterparty_grounding",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase32_planner_selected_chain_end_to_end:step_01_resolve_counterparty_alias",
],
"required_tags": ["planner_catalog_alignment", "entity_resolution"],
},
{
"slot_id": "slot_03_counterparty_incoming",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase32_planner_selected_chain_end_to_end:step_02_incoming_by_resolved_entity",
],
"required_tags": ["planner_catalog_alignment"],
},
{
"slot_id": "slot_04_counterparty_payout",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase32_planner_selected_chain_end_to_end:step_03_payout_switch_by_resolved_entity",
],
"required_tags": ["planner_catalog_alignment"],
},
{
"slot_id": "slot_05_counterparty_net",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase32_planner_selected_chain_end_to_end:step_04_net_after_payout",
],
"required_tags": ["planner_catalog_alignment"],
},
{
"slot_id": "slot_06_counterparty_documents",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase32_planner_selected_chain_end_to_end:step_05_documents_after_net",
],
"required_tags": ["planner_catalog_alignment", "document_evidence"],
},
{
"slot_id": "slot_07_counterparty_movements",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase32_planner_selected_chain_end_to_end:step_06_movements_after_documents",
],
"required_tags": ["planner_catalog_alignment", "movement_evidence"],
},
{
"slot_id": "slot_08_open_org_total",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase66_human_org_open_scope_dialog:step_01_open_scope_incoming_total",
],
"required_tags": ["planner_catalog_alignment"],
},
{
"slot_id": "slot_09_open_org_selection",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase66_human_org_open_scope_dialog:step_02_all_time_same_open_scope",
],
"required_tags": ["planner_catalog_alignment"],
},
{
"slot_id": "slot_10_open_org_all_time",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase66_human_org_open_scope_dialog:step_03_all_time_same_open_scope",
],
"required_tags": ["planner_catalog_alignment"],
},
{
"slot_id": "slot_11_open_org_comparison",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase66_human_org_open_scope_dialog:step_04_bidirectional_comparison",
],
"required_tags": ["planner_catalog_alignment", "value_flow_comparison"],
},
{
"slot_id": "slot_12_open_org_comparison_year_switch",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase66_human_org_open_scope_dialog:step_05_comparison_year_switch",
],
"required_tags": ["planner_catalog_alignment", "value_flow_comparison"],
},
{
"slot_id": "slot_13_open_org_ranking",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase66_human_org_open_scope_dialog:step_06_ranking_top_counterparty",
],
"required_tags": ["planner_catalog_alignment", "value_flow_ranking"],
},
{
"slot_id": "slot_14_open_org_ranking_year_switch",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase66_human_org_open_scope_dialog:step_07_ranking_year_switch",
],
"required_tags": ["planner_catalog_alignment", "value_flow_ranking"],
},
{
"slot_id": "slot_15_broad_eval_context",
"criticality": "warning",
"preferred_candidate_ids": [
"address_truth_harness_phase22_broad_business_evaluation_bridge:step_01_company_activity_lifecycle",
],
"required_tags": ["company_activity_lifecycle"],
},
{
"slot_id": "slot_16_broad_eval_bridge",
"criticality": "warning",
"preferred_candidate_ids": [
"address_truth_harness_phase22_broad_business_evaluation_bridge:step_02_broad_business_evaluation",
],
"required_tags": ["broad_business_evaluation"],
},
{
"slot_id": "slot_17_broad_eval_return_to_net",
"criticality": "critical",
"preferred_candidate_ids": [
"address_truth_harness_phase22_broad_business_evaluation_bridge:step_03_net_flow_after_broad_eval",
],
"required_tags": ["broad_eval_bridge_preserved"],
},
{
"slot_id": "slot_18_metadata_surface",
"criticality": "warning",
"preferred_candidate_ids": [
"address_truth_harness_phase42_catalog_metadata_drilldown:step_01_catalog_metadata_surface",
],
"required_tags": ["catalog_metadata_surface"],
},
{
"slot_id": "slot_19_metadata_drilldown",
"criticality": "warning",
"preferred_candidate_ids": [
"address_truth_harness_phase42_catalog_metadata_drilldown:step_02_neutral_followup_catalog_drilldown",
],
"required_tags": ["catalog_drilldown"],
},
{
"slot_id": "slot_20_off_domain_guard",
"criticality": "warning",
"preferred_candidate_ids": [
"address_truth_harness_phase19_mcp_discovery_response_gate:step_08_off_domain_living_chat_not_hijacked",
],
"required_tags": ["off_domain_living_chat"],
},
],
},
}
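Each `step_plan` entry above pairs `preferred_candidate_ids` with `required_tags`. How the builder actually resolves a slot is not shown in this diff; the sketch below is one plausible reading, under loudly stated assumptions: `resolve_slot`, the catalog entry keys `candidate_id` and `semantic_tags`, and the "preferred first, then any tag match" fallback order are all hypothetical.

```python
from typing import Any, Optional


def resolve_slot(
    slot: dict[str, Any], catalog: list[dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Pick a catalog entry for one recipe slot (hypothetical logic).

    1. Try each preferred candidate id, accepting it only if it carries
       every required tag.
    2. Otherwise fall back to the first catalog entry with all required tags.
    3. Return None when nothing qualifies.
    """
    required = set(slot.get("required_tags", []))
    by_id = {entry["candidate_id"]: entry for entry in catalog}

    def satisfies(entry: dict[str, Any]) -> bool:
        return required.issubset(entry.get("semantic_tags", []))

    for candidate_id in slot.get("preferred_candidate_ids", []):
        entry = by_id.get(candidate_id)
        if entry is not None and satisfies(entry):
            return entry
    for entry in catalog:
        if satisfies(entry):
            return entry
    return None


# Illustrative catalog entries; ids and tags are copied from the recipe above.
example_catalog = [
    {
        "candidate_id": "address_truth_harness_phase66_human_org_open_scope_dialog:step_06_ranking_top_counterparty",
        "semantic_tags": ["value_flow_ranking", "organization_scope", "planner_catalog_alignment"],
    },
    {
        "candidate_id": "address_truth_harness_phase32_planner_selected_chain_end_to_end:step_05_documents_after_net",
        "semantic_tags": ["entity_resolution", "document_evidence", "planner_catalog_alignment"],
    },
]
example_slot = {
    "slot_id": "slot_06_counterparty_documents",
    "preferred_candidate_ids": [
        "address_truth_harness_phase32_planner_selected_chain_end_to_end:step_05_documents_after_net",
    ],
    "required_tags": ["planner_catalog_alignment", "document_evidence"],
}
chosen = resolve_slot(example_slot, example_catalog)
```

Requiring the tags even on a preferred candidate means a slot fails visibly if the source spec loses its `planner_catalog_alignment` tag, rather than silently selecting an unchecked step.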


@ -49,6 +49,25 @@ class AgentSemanticPackBuilderTests(unittest.TestCase):
        self.assertIn("same_date_restore", all_tags)
        self.assertIn("settlements_receivables", all_tags)

    def test_build_recipe_spec_creates_planner_brain_alignment_pack(self) -> None:
        catalog = builder.build_source_catalog()
        spec = builder.build_recipe_spec(catalog, "turnaround_11_planner_brain_alignment_mix")

        self.assertEqual(spec["scenario_id"], "address_truth_harness_phase83_planner_brain_alignment_mix")
        self.assertEqual(len(spec["steps"]), 20)
        all_tags = {tag for step in spec["steps"] for tag in step.get("semantic_tags", [])}
        self.assertIn("planner_catalog_alignment", all_tags)
        self.assertIn("value_flow_comparison", all_tags)
        self.assertIn("value_flow_ranking", all_tags)
        self.assertIn("broad_business_evaluation", all_tags)
        self.assertIn("catalog_drilldown", all_tags)
        self.assertIn("off_domain_living_chat", all_tags)
        catalog_checked_steps = [
            step for step in spec["steps"] if step.get("expected_catalog_chain_top_match")
        ]
        self.assertEqual(len(catalog_checked_steps), 13)


if __name__ == "__main__":
    unittest.main()