diff --git a/docs/ARCH/11 - architecture_turnaround/20 - planner_autonomy_consolidation_2026-05-01.md b/docs/ARCH/11 - architecture_turnaround/20 - planner_autonomy_consolidation_2026-05-01.md
index 5b290d0..6889149 100644
--- a/docs/ARCH/11 - architecture_turnaround/20 - planner_autonomy_consolidation_2026-05-01.md
+++ b/docs/ARCH/11 - architecture_turnaround/20 - planner_autonomy_consolidation_2026-05-01.md
@@ -128,6 +128,7 @@ The following consolidation step added catalog-level chain-template scoring:
 - `address_truth_harness_phase66_human_org_open_scope_dialog.json` now uses those fields to assert `value_flow`, `value_flow_comparison`, and `value_flow_ranking` top matches across the open-organization money dialog.
 - `address_truth_harness_phase32_planner_selected_chain_end_to_end.json` now uses the same assertions across selected-counterparty entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups.
 - `agent_semantic_pack_builder` now preserves these expected catalog-alignment fields in the reusable source catalog and adds the `planner_catalog_alignment` tag, so future mixed AGENT packs can deliberately select planner-brain regression probes instead of relying on hand-picked replay filenames.
+- The new `turnaround_11_planner_brain_alignment_mix` builder recipe generates `address_truth_harness_phase83_planner_brain_alignment_mix.json`, a 20-step mixed canary that crosses selected-counterparty value-flow, open-organization totals/comparison/ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety.
 
 ## Why This Matters
 
@@ -289,9 +290,16 @@ Latest validation after phase32 catalog-alignment spec hardening and AGENT sourc
 - `load_truth_harness_spec` confirmed the phase32 expected top-match chain sequence: `entity_resolution`, `value_flow`, `value_flow`, `value_flow_comparison`, `document_evidence`, `movement_evidence`
 - `agent_semantic_pack_builder.py inventory` regenerated `agent_semantic_source_catalog.*` with reusable `planner_catalog_alignment` coverage
 
+Latest validation after phase83 mixed planner-brain spec generation:
+
+- `scripts.test_agent_semantic_pack_builder`: passed, `3 passed`
+- generated `address_truth_harness_phase83_planner_brain_alignment_mix.json`: `20` steps, `13` expected catalog top-match checks
+- regenerated `agent_semantic_source_catalog.*`: `planner_catalog_alignment` is visible with `26` reusable entries, including phase32, phase66, and phase83 probes
+- graphify rebuild: `5952 nodes`, `12927 edges`, `138 communities`
+
 ## Next Step
 
-The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by using the regenerated AGENT source catalog to assemble mixed planner-brain canaries, hardening additional planner-autonomy specs with expected catalog-chain assertions, and using `alignment_status`, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, `catalog_alignment_ok`, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.
+The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. The first live replay candidate should be `address_truth_harness_phase83_planner_brain_alignment_mix.json`; only after it is executed, reviewed semantically, fixed/rerun if needed, and accepted should it be saved into autoruns as a legacy AGENT pack. In parallel, local-only consolidation can continue by hardening additional planner-autonomy specs with expected catalog-chain assertions and using `alignment_status`, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, `catalog_alignment_ok`, and the representative guard to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.
 
 Recommended order:
 
diff --git a/docs/ARCH/11 - architecture_turnaround/README.md b/docs/ARCH/11 - architecture_turnaround/README.md
index 2e1f988..4e79c62 100644
--- a/docs/ARCH/11 - architecture_turnaround/README.md
+++ b/docs/ARCH/11 - architecture_turnaround/README.md
@@ -91,6 +91,7 @@ It now documents a turnaround that is already operational in code, already mater
   - the phase66 open-scope money dialog spec now asserts expected catalog-chain top matches across value-flow totals, bidirectional comparison, and ranking follow-ups;
   - the phase32 selected-counterparty chain spec now asserts expected catalog-chain top matches across entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups;
   - AGENT semantic source catalog generation now preserves expected catalog-alignment fields and tags reusable steps as `planner_catalog_alignment`, so mixed pack construction can find planner-brain regression probes explicitly;
+  - phase83 planner-brain mixed replay spec is now generated from the AGENT source catalog and interleaves selected-counterparty catalog alignment, open-organization money flow/ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety;
   - explicit-counterparty incoming-vs-outgoing data-need graphs now select the reviewed `value_flow_comparison` chain instead of falling back to generic `value_flow`;
   - live map sync: [20 - planner_autonomy_consolidation_2026-05-01.md](./20%20-%20planner_autonomy_consolidation_2026-05-01.md)
 
@@ -103,8 +104,8 @@ Current honest status:
 - open-world bounded-autonomy readiness: `~85%`
 - Post-F semantic integrity module progress: `~99%` operationally closed, with remaining risk now treated as next-slice discovery rather than an open blocker inside the closed slice
 - active inventory-stock breadth slice progress: `100%` for the declared scenario pack, not for arbitrary inventory questions
-- Planner Autonomy Consolidation progress: `~93%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, explicit `alignment_status` propagation, truth-harness/acceptance-matrix surfacing, soft divergence warning, `catalog_alignment_ok` acceptance invariant, step-level expected catalog-alignment assertions, phase66 and phase32 spec alignment expectations, and AGENT source-catalog surfacing validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
-- graph snapshot after latest rebuild: `5951 nodes`, `12926 edges`, `139 communities`
+- Planner Autonomy Consolidation progress: `~94%` for the declared module, with catalog-fabric, value-flow arbitration, lifecycle bounded inference, broad-evaluation bridge, inventory catalog templates, inventory runtime-boundary honesty, exact inventory recipe bridging, unambiguous metadata-surface lane inference, catalog chain-template scoring, structured chain-match contract exposure, runtime/debug propagation, subject-aware bidirectional comparison arbitration, structured catalog-alignment verdicts, representative alignment regression guard, catalog-alignment reason-code telemetry, explicit `alignment_status` propagation, truth-harness/acceptance-matrix surfacing, soft divergence warning, `catalog_alignment_ok` acceptance invariant, step-level expected catalog-alignment assertions, phase66 and phase32 spec alignment expectations, AGENT source-catalog surfacing, and generated phase83 mixed planner-brain replay spec validated locally, but live replay for the new bridge is currently blocked by missing active 1C polling and broader unfamiliar 1C asks still need replay-backed growth
+- graph snapshot after latest rebuild: `5952 nodes`, `12927 edges`, `138 communities`
 - current breakpoint:
   - the validated hot paths are no longer structurally broken;
   - flagship continuity collapse is no longer the primary risk;
@@ -163,6 +164,7 @@ Latest live proof now includes:
 - catalog-alignment spec assertions accepted locally: Python truth-harness/acceptance tests passed `7/7`; graphify rebuilt to `5951 nodes`, `12926 edges`, `139 communities`
 - phase66 planner-alignment spec hardening accepted locally: Python truth-harness/acceptance tests passed `7/7`; `load_truth_harness_spec` confirmed expected top matches `[value_flow, value_flow, value_flow, value_flow_comparison, value_flow_comparison, value_flow_ranking, value_flow_ranking]`
 - phase32 selected-counterparty planner-alignment spec hardening and AGENT source-catalog surfacing accepted locally: Python replay-tooling tests passed `9/9`; `load_truth_harness_spec` confirmed expected top matches `[entity_resolution, value_flow, value_flow, value_flow_comparison, document_evidence, movement_evidence]`; regenerated source catalog exposes `planner_catalog_alignment` as a reusable tag
+- phase83 mixed planner-brain spec generation accepted locally: Python replay-tooling tests passed `10/10`; generated spec has `20` steps and `13` expected catalog top-match checks; regenerated source catalog exposes `planner_catalog_alignment` with
`26` reusable entries; graphify rebuilt to `5952 nodes`, `12927 edges`, `138 communities` Current architectural reading: diff --git a/docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json b/docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json new file mode 100644 index 0000000..94d78cf --- /dev/null +++ b/docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json @@ -0,0 +1,587 @@ +{ + "schema_version": "domain_truth_harness_spec_v1", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "description": "Mixed AGENT replay for Planner Autonomy Consolidation. The pack interleaves selected-counterparty catalog-alignment probes, open-organization money flow, ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety.", + "bindings": {}, + "steps": [ + { + "step_id": "step_01_human_smalltalk_sanity", + "title": "Human smalltalk remains living chat and does not expose discovery internals", + "question": "привет, ты на связи?", + "required_answer_patterns_any": [ + "(?i)привет|на связи|готов|помочь" + ], + "forbidden_answer_patterns": [ + "(?i)mcp", + "(?i)runtime_", + "(?i)query_documents", + "(?i)primitive" + ], + "criticality": "info", + "semantic_tags": [ + "human_answer", + "mcp_discovery_gate_sanity", + "meta_smalltalk" + ], + "notes": "[mixed_pack_slot=slot_01_smalltalk_sanity source=address_truth_harness_phase19_mcp_discovery_response_gate:step_01_human_smalltalk_sanity]" + }, + { + "step_id": "step_01_resolve_counterparty_alias", + "title": "Entity resolution grounds the checked 1C counterparty from a loose alias", + "question": "найди в 1С контрагента СВК", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + 
"expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "entity_resolution", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)свк", + "(?i)контрагент" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)каталог", + "(?i)найден", + "(?i)наиболее вероятн" + ], + "forbidden_answer_patterns": [ + "(?i)получили", + "(?i)заплатили", + "(?i)нетто", + "(?i)оборот", + "(?i)выручк", + "(?i)сумм(а|ы)" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "alias_grounding", + "followup_anchor", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_02_counterparty_grounding source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_01_resolve_counterparty_alias]" + }, + { + "step_id": "step_02_incoming_by_resolved_entity", + "title": "Incoming value-flow follow-up reuses the resolved counterparty anchor", + "question": "сколько получили по нему за 2020 год", + "allowed_reply_types": [ + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)получил|входящ|поступ", + "(?i)руб" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)свк" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + "(?i)уточните, какого контрагента", + "(?i)по какому контрагенту" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "incoming_value_flow", + "followup_reuse", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_03_counterparty_incoming source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_02_incoming_by_resolved_entity]" + }, + { + "step_id": "step_03_payout_switch_by_resolved_entity", + "title": "Outgoing payment 
follow-up keeps the same grounded counterparty and checked year", + "question": "а теперь сколько заплатили?", + "allowed_reply_types": [ + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)заплатил|исходящ|списан|платеж", + "(?i)руб" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)свк" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + "(?i)уточните, какого контрагента", + "(?i)по какому контрагенту", + "(?i)за какой год" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "payout_switch", + "followup_reuse", + "date_carryover", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_04_counterparty_payout source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_03_payout_switch_by_resolved_entity]" + }, + { + "step_id": "step_04_net_after_payout", + "title": "Net-flow follow-up reuses the same grounded counterparty and checked year after payout", + "question": "а какое нетто?", + "allowed_reply_types": [ + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_comparison", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)нетто|сальдо", + "(?i)руб" + ], + "required_answer_patterns_any": [ + "(?i)получ", + "(?i)заплат", + "(?i)группа\\s+свк", + "(?i)свк" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + "(?i)уточните, какого контрагента", + "(?i)по какому контрагенту" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "net_value_flow", + "followup_reuse", + "planner_catalog_alignment" + ], + "notes": 
"[mixed_pack_slot=slot_05_counterparty_net source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_04_net_after_payout]" + }, + { + "step_id": "step_05_documents_after_net", + "title": "Document evidence follow-up keeps the grounded counterparty after the net answer", + "question": "а по документам?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "document_evidence", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)документ|счет|накладн|акт" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)свк", + "(?i)2020" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + "(?i)уточните, какого контрагента", + "(?i)по какому контрагенту", + "(?i)сколько получили", + "(?i)сколько заплатили", + "(?i)нетто" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "document_evidence", + "value_flow_pivot", + "followup_reuse", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_06_counterparty_documents source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_05_documents_after_net]" + }, + { + "step_id": "step_06_movements_after_documents", + "title": "Movement evidence follow-up keeps the grounded counterparty after the document answer", + "question": "а по движениям?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "movement_evidence", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)движени|операц|платеж|списан|поступ" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)свк", + "(?i)2020" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + 
"(?i)уточните, какого контрагента", + "(?i)по какому контрагенту", + "(?i)сколько получили", + "(?i)сколько заплатили", + "(?i)нетто" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "movement_evidence", + "document_pivot", + "followup_reuse", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_07_counterparty_movements source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_06_movements_after_documents]" + }, + { + "step_id": "step_01_open_scope_incoming_total", + "title": "The user asks for incoming money without naming the organization yet", + "question": "Хочу быстрый денежный срез по одной организации без привязки к контрагенту. Сколько вообще входящих денег было за 2020 год?", + "allowed_reply_types": [ + "clarification_required", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)уточн|нужно", + "(?i)организац" + ], + "criticality": "critical", + "semantic_tags": [ + "open_scope_total", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_08_open_org_total source=address_truth_harness_phase66_human_org_open_scope_dialog:step_01_open_scope_incoming_total]" + }, + { + "step_id": "step_02_all_time_same_open_scope", + "title": "The user selects the organization and gets the 2020 incoming total", + "question": "По ООО Альтернатива Плюс.", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)входящ|поступлен|получ" + ], + "forbidden_answer_patterns": [ + "(?i)уточните .*контрагент", + "(?i)не найден 
контрагент", + "(?i)уточните .*организац" + ], + "criticality": "critical", + "semantic_tags": [ + "organization_clarification", + "open_scope_total", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_09_open_org_selection source=address_truth_harness_phase66_human_org_open_scope_dialog:step_02_all_time_same_open_scope]" + }, + { + "step_id": "step_03_all_time_same_open_scope", + "title": "The user broadens the same organization slice to all available time", + "question": "Понял, тогда за все время.", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)все доступное время|все время|весь период", + "(?i)входящ|поступлен|получ" + ], + "forbidden_answer_patterns": [ + "(?i)за 2020", + "(?i)уточните .*контрагент", + "(?i)уточните .*период", + "(?i)уточните .*организац" + ], + "criticality": "critical", + "semantic_tags": [ + "all_time_followup", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_10_open_org_all_time source=address_truth_harness_phase66_human_org_open_scope_dialog:step_03_all_time_same_open_scope]" + }, + { + "step_id": "step_04_bidirectional_comparison", + "title": "The user asks which money direction is larger for the organization", + "question": "Хорошо. 
А что по ООО Альтернатива Плюс больше в 2020 году: входящие или исходящие деньги?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_comparison", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)входящ|исходящ|получ|заплат|больше" + ], + "criticality": "critical", + "semantic_tags": [ + "value_flow_comparison", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_11_open_org_comparison source=address_truth_harness_phase66_human_org_open_scope_dialog:step_04_bidirectional_comparison]" + }, + { + "step_id": "step_05_comparison_year_switch", + "title": "The user asks the same comparison for another year", + "question": "А что по ООО Альтернатива Плюс больше уже за 2021 год: входящие или исходящие деньги?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_comparison", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2021", + "(?i)входящ|исходящ|получ|заплат|больше" + ], + "criticality": "critical", + "semantic_tags": [ + "value_flow_comparison", + "year_switch", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_12_open_org_comparison_year_switch source=address_truth_harness_phase66_human_org_open_scope_dialog:step_05_comparison_year_switch]" + }, + { + "step_id": "step_06_ranking_top_counterparty", + "title": "The user asks who brought the most money for the organization", + "question": "И кто больше всего принес денег этой организации в 2020 году?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" 
+ ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_ranking", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)кто|контрагент|клиент|принес|доход" + ], + "criticality": "critical", + "semantic_tags": [ + "value_flow_ranking", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_13_open_org_ranking source=address_truth_harness_phase66_human_org_open_scope_dialog:step_06_ranking_top_counterparty]" + }, + { + "step_id": "step_07_ranking_year_switch", + "title": "The user asks the same ranking for another year", + "question": "А в 2021 году?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_ranking", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2021" + ], + "criticality": "critical", + "semantic_tags": [ + "value_flow_ranking", + "year_switch", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_14_open_org_ranking_year_switch source=address_truth_harness_phase66_human_org_open_scope_dialog:step_07_ranking_year_switch]" + }, + { + "step_id": "step_01_company_activity_lifecycle", + "title": "Lifecycle answer seeds grounded organization context", + "question": "а по Альтернативе Плюс сколько лет активности в базе 1С?", + "allowed_reply_types": [ + "partial_coverage", + "factual", + "factual_with_explanation" + ], + "required_answer_patterns_any": [ + "(?i)лет", + "(?i)активност", + "(?i)1с", + "(?i)не получил|не подтвержден|проверил доступный контур" + ], + "criticality": "warning", + "semantic_tags": [ + "company_activity_lifecycle", + "grounded_context_seed" + ], + "notes": "[mixed_pack_slot=slot_15_broad_eval_context 
source=address_truth_harness_phase22_broad_business_evaluation_bridge:step_01_company_activity_lifecycle]" + }, + { + "step_id": "step_02_broad_business_evaluation", + "title": "Broad business evaluation becomes grounded summary instead of stale lifecycle dump", + "question": "Как ты оценишь деятельность компании?", + "required_answer_patterns_all": [ + "(?i)коротко|оценк|частичн", + "(?i)1с|подтвержд", + "(?i)денежн|долг|ндс|контрагент|операц" + ], + "forbidden_answer_patterns": [ + "(?i)активных заказчиков", + "(?i)последняя активность", + "(?i)^\\s*1\\." + ], + "criticality": "warning", + "semantic_tags": [ + "broad_business_evaluation", + "grounded_summary" + ], + "notes": "[mixed_pack_slot=slot_16_broad_eval_bridge source=address_truth_harness_phase22_broad_business_evaluation_bridge:step_02_broad_business_evaluation]" + }, + { + "step_id": "step_03_net_flow_after_broad_eval", + "title": "Exact net-flow follow-up still answers after the broad bridge", + "question": "какое нетто по деньгам с Группа СВК за 2020 год: сколько получили и сколько заплатили?", + "allowed_reply_types": [ + "partial_coverage", + "factual_with_explanation" + ], + "required_answer_patterns_all": [ + "(?i)свк", + "(?i)получил|входящ|поступ", + "(?i)заплат|исходящ|списан|плат[её]ж", + "(?i)нетто|сальдо|разниц", + "(?i)2020|период", + "(?i)руб" + ], + "forbidden_answer_patterns": [ + "(?i)активных заказчиков", + "(?i)лет в базе", + "(?i)последняя активность" + ], + "criticality": "critical", + "semantic_tags": [ + "counterparty_net_cash_flow", + "broad_eval_bridge_preserved" + ], + "notes": "[mixed_pack_slot=slot_17_broad_eval_return_to_net source=address_truth_harness_phase22_broad_business_evaluation_bridge:step_03_net_flow_after_broad_eval]" + }, + { + "step_id": "step_01_catalog_metadata_surface", + "title": "Catalog-oriented metadata surface is surfaced honestly for counterparties", + "question": "какие справочники 1С есть по контрагентам?", + "allowed_reply_types": [ + 
"partial_coverage", + "factual_with_explanation" + ], + "required_answer_patterns_all": [ + "(?i)metadata|метадан", + "(?i)справоч|catalog|directory", + "(?i)контрагент" + ], + "forbidden_answer_patterns": [ + "(?i)получили", + "(?i)заплатили", + "(?i)нетто", + "(?i)документные строки найдены", + "(?i)строки денежных движений найдены" + ], + "criticality": "warning", + "semantic_tags": [ + "catalog_metadata_surface", + "counterparty_catalog_scope" + ], + "notes": "[mixed_pack_slot=slot_18_metadata_surface source=address_truth_harness_phase42_catalog_metadata_drilldown:step_01_catalog_metadata_surface]" + }, + { + "step_id": "step_02_neutral_followup_catalog_drilldown", + "title": "Neutral follow-up continues into deeper catalog metadata instead of asking for a documents-vs-movements lane choice", + "question": "давай дальше", + "allowed_reply_types": [ + "partial_coverage", + "factual_with_explanation" + ], + "required_answer_patterns_all": [ + "(?i)metadata|метадан|схем", + "(?i)справоч|catalog|directory", + "(?i)контрагент|counterpart" + ], + "forbidden_answer_patterns": [ + "(?i)документ", + "(?i)движени|регистр", + "(?i)уточн.*контур", + "(?i)получили", + "(?i)заплатили", + "(?i)нетто" + ], + "criticality": "warning", + "semantic_tags": [ + "catalog_drilldown", + "neutral_followup" + ], + "notes": "[mixed_pack_slot=slot_19_metadata_drilldown source=address_truth_harness_phase42_catalog_metadata_drilldown:step_02_neutral_followup_catalog_drilldown]" + }, + { + "step_id": "step_08_off_domain_living_chat_not_hijacked", + "title": "Off-domain living chat remains human and is not hijacked by discovery carryover", + "question": "а чем капибара отличается от утки?", + "required_answer_patterns_any": [ + "(?i)капибар.*утк|утк.*капибар", + "(?i)млекопита|птиц|грызун" + ], + "forbidden_answer_patterns": [ + "(?i)свк", + "(?i)контрагент", + "(?i)mcp", + "(?i)query_documents", + "(?i)runtime_", + "(?i)primitive" + ], + "criticality": "warning", + "semantic_tags": [ + 
"off_domain_living_chat", + "stale_replay_forbidden" + ], + "notes": "[mixed_pack_slot=slot_20_off_domain_guard source=address_truth_harness_phase19_mcp_discovery_response_gate:step_08_off_domain_living_chat_not_hijacked]" + } + ] +} diff --git a/docs/orchestration/agent_semantic_source_catalog.json b/docs/orchestration/agent_semantic_source_catalog.json index f3147d5..a988857 100644 --- a/docs/orchestration/agent_semantic_source_catalog.json +++ b/docs/orchestration/agent_semantic_source_catalog.json @@ -1,8 +1,8 @@ { "schema_version": "agent_semantic_source_catalog_v1", - "generated_at": "2026-05-01T13:36:23+00:00", + "generated_at": "2026-05-01T13:43:18+00:00", "summary": { - "truth_harness_steps_total": 500, + "truth_harness_steps_total": 520, "saved_session_questions_total": 229, "reusable_truth_harness_tags": { "account_60": 2, @@ -12,11 +12,11 @@ "aggregate_all_time": 1, "aggregate_revenue": 1, "aggregate_year": 1, - "alias_grounding": 6, + "alias_grounding": 7, "all_time_after_pivot": 2, "all_time_after_second_pivot": 2, "all_time_after_third_pivot": 1, - "all_time_followup": 9, + "all_time_followup": 10, "all_time_scope": 4, "ambiguity_probe": 1, "anomaly_probe": 1, @@ -26,18 +26,18 @@ "bounded_autonomy": 47, "bounded_retrieval": 13, "bridge_inventory_to_vat": 3, - "broad_business_evaluation": 2, - "broad_eval_bridge_preserved": 1, + "broad_business_evaluation": 3, + "broad_eval_bridge_preserved": 2, "broad_eval_followup_continuity": 1, "broad_evaluation_bridge": 1, "capability_meta": 3, "capability_over_followup": 2, - "catalog_drilldown": 1, + "catalog_drilldown": 2, "catalog_grounding": 1, - "catalog_metadata_surface": 1, + "catalog_metadata_surface": 2, "clarification_required": 1, "clarification_resume": 2, - "company_activity_lifecycle": 2, + "company_activity_lifecycle": 3, "company_analytics": 1, "company_authority": 3, "company_authority_probe": 1, @@ -52,14 +52,14 @@ "continuity_interrupt": 1, "contracts_followup": 16, "counterparty_carryover": 
1, - "counterparty_catalog_scope": 1, + "counterparty_catalog_scope": 2, "counterparty_documents": 29, "counterparty_followup": 3, "counterparty_grounding": 1, "counterparty_item_flow": 1, "counterparty_lifecycle": 1, "counterparty_monthly_net_cash_flow": 1, - "counterparty_net_cash_flow": 4, + "counterparty_net_cash_flow": 5, "counterparty_net_value_flow": 1, "counterparty_outgoing_payments": 1, "counterparty_pronoun_resolution": 15, @@ -77,17 +77,17 @@ "current_turn_entity_authority": 1, "customer_analytics": 1, "data_scope_meta": 2, - "date_carryover": 6, + "date_carryover": 7, "date_followup": 2, "date_scope": 1, "debt_polarity": 1, "display_label_integrity": 3, "display_name_integrity": 1, - "document_evidence": 3, + "document_evidence": 4, "document_lane_after_clarification": 5, "document_lane_continuity": 6, "document_lane_execution": 5, - "document_pivot": 1, + "document_pivot": 2, "document_pivot_after_movement": 1, "document_pivot_after_movement_retrieval": 2, "document_pivot_after_retrieval": 2, @@ -97,30 +97,30 @@ "documents_followup": 7, "documents_pivot": 2, "entity_grounding": 2, - "entity_resolution": 29, + "entity_resolution": 35, "exact_not_overwritten": 2, - "followup_anchor": 6, - "followup_reuse": 21, + "followup_anchor": 7, + "followup_reuse": 26, "followup_short": 1, "fourth_pivot": 2, "garbage_anchor_forbidden": 1, - "grounded_context_seed": 1, + "grounded_context_seed": 2, "grounded_counterparty": 13, "grounded_counterparty_followup": 12, "grounded_discovery_seed": 1, "grounded_self_correction": 1, - "grounded_summary": 1, + "grounded_summary": 2, "historical_anchor": 1, "historical_date_anchor": 3, "historical_inventory": 2, "historical_restore": 1, - "human_answer": 3, + "human_answer": 4, "human_answer_quality": 2, - "human_dialog": 39, + "human_dialog": 46, "hybrid_investigation_followup": 2, "hybrid_investigation_root": 2, "incoming": 8, - "incoming_value_flow": 8, + "incoming_value_flow": 9, "inline_organization_clarification": 13, 
"integrity_guard": 57, "inventory_aging": 3, @@ -143,7 +143,7 @@ "manual_9lieoh": 11, "materialization_gap": 1, "mcp_discovery_bidirectional_value_flow": 2, - "mcp_discovery_gate_sanity": 1, + "mcp_discovery_gate_sanity": 2, "mcp_discovery_response_gate": 1, "mcp_discovery_supplier_payout": 1, "mcp_discovery_value_flow": 1, @@ -153,12 +153,12 @@ "meta_memory": 6, "meta_return_to_business": 1, "meta_scope": 12, - "meta_smalltalk": 13, + "meta_smalltalk": 14, "meta_verify": 1, "metadata_lane_choice_clarification": 15, "metadata_surface": 17, "mixed_ambiguity": 15, - "movement_evidence": 3, + "movement_evidence": 4, "movement_execution": 1, "movement_lane_after_clarification": 11, "movement_lane_after_metadata": 2, @@ -171,19 +171,19 @@ "multi_company_entry": 2, "multi_hop_clarification": 21, "net_switch": 1, - "net_value_flow": 5, - "neutral_followup": 16, + "net_value_flow": 6, + "neutral_followup": 17, "numeric_counterparty_suffix": 1, - "off_domain_living_chat": 2, + "off_domain_living_chat": 3, "open_scope": 9, "open_scope_net": 3, - "open_scope_total": 8, + "open_scope_total": 10, "organization_activity_age": 5, "organization_authority": 7, - "organization_clarification": 8, + "organization_clarification": 9, "organization_fact_boundary": 1, "organization_followup_reuse": 20, - "organization_scope": 28, + "organization_scope": 34, "organization_scoped": 4, "organization_second_recovery": 1, "outgoing": 3, @@ -191,7 +191,7 @@ "payables": 1, "payables_snapshot": 1, "payments_followup": 20, - "payout_switch": 4, + "payout_switch": 5, "payout_value_flow": 2, "payout_year_switch": 3, "period_carryover": 1, @@ -206,7 +206,7 @@ "period_narrowing": 1, "period_scope": 9, "pivot_seed": 8, - "planner_catalog_alignment": 13, + "planner_catalog_alignment": 26, "polarity_flip": 1, "post_f": 9, "post_f_integrity_hardening": 6, @@ -261,7 +261,7 @@ "stale_entity_seed": 1, "stale_inventory_scope": 1, "stale_lifecycle_override": 1, - "stale_replay_forbidden": 2, + 
"stale_replay_forbidden": 3, "stale_scope_guard": 1, "stale_temporal_carryover": 1, "supported_route_not_hijacked_by_mcp_discovery": 1, @@ -275,10 +275,10 @@ "topic_reset": 5, "translit_wording": 1, "unsupported_current_turn_meaning_boundary": 5, - "value_flow_comparison": 11, + "value_flow_comparison": 13, "value_flow_net": 6, - "value_flow_pivot": 3, - "value_flow_ranking": 17, + "value_flow_pivot": 4, + "value_flow_ranking": 19, "value_flow_total": 16, "vat": 37, "vat_colloquial_wording": 2, @@ -289,7 +289,7 @@ "vat_orientation": 2, "very_old_stock": 1, "year_specific": 1, - "year_switch": 14, + "year_switch": 16, "year_switch_after_document_pivot": 1, "year_switch_after_fourth_pivot": 2, "year_switch_after_pivot": 4, @@ -308,11 +308,11 @@ "aggregate_all_time": 1, "aggregate_revenue": 1, "aggregate_year": 1, - "alias_grounding": 6, + "alias_grounding": 7, "all_time_after_pivot": 2, "all_time_after_second_pivot": 2, "all_time_after_third_pivot": 1, - "all_time_followup": 9, + "all_time_followup": 10, "all_time_scope": 4, "ambiguity_probe": 1, "anomaly_probe": 1, @@ -322,18 +322,18 @@ "bounded_autonomy": 47, "bounded_retrieval": 13, "bridge_inventory_to_vat": 3, - "broad_business_evaluation": 2, - "broad_eval_bridge_preserved": 1, + "broad_business_evaluation": 3, + "broad_eval_bridge_preserved": 2, "broad_eval_followup_continuity": 1, "broad_evaluation_bridge": 1, "capability_meta": 3, "capability_over_followup": 2, - "catalog_drilldown": 1, + "catalog_drilldown": 2, "catalog_grounding": 1, - "catalog_metadata_surface": 1, + "catalog_metadata_surface": 2, "clarification_required": 1, "clarification_resume": 2, - "company_activity_lifecycle": 2, + "company_activity_lifecycle": 3, "company_analytics": 1, "company_authority": 3, "company_authority_probe": 1, @@ -348,14 +348,14 @@ "continuity_interrupt": 1, "contracts_followup": 16, "counterparty_carryover": 1, - "counterparty_catalog_scope": 1, + "counterparty_catalog_scope": 2, "counterparty_documents": 35, 
"counterparty_followup": 3, "counterparty_grounding": 1, "counterparty_item_flow": 1, "counterparty_lifecycle": 1, "counterparty_monthly_net_cash_flow": 1, - "counterparty_net_cash_flow": 4, + "counterparty_net_cash_flow": 5, "counterparty_net_value_flow": 1, "counterparty_outgoing_payments": 1, "counterparty_pronoun_resolution": 15, @@ -373,17 +373,17 @@ "current_turn_entity_authority": 1, "customer_analytics": 1, "data_scope_meta": 2, - "date_carryover": 6, + "date_carryover": 7, "date_followup": 2, "date_scope": 1, "debt_polarity": 1, "display_label_integrity": 3, "display_name_integrity": 1, - "document_evidence": 3, + "document_evidence": 4, "document_lane_after_clarification": 5, "document_lane_continuity": 6, "document_lane_execution": 5, - "document_pivot": 1, + "document_pivot": 2, "document_pivot_after_movement": 1, "document_pivot_after_movement_retrieval": 2, "document_pivot_after_retrieval": 2, @@ -393,30 +393,30 @@ "documents_followup": 7, "documents_pivot": 2, "entity_grounding": 2, - "entity_resolution": 29, + "entity_resolution": 35, "exact_not_overwritten": 2, - "followup_anchor": 6, - "followup_reuse": 21, + "followup_anchor": 7, + "followup_reuse": 26, "followup_short": 1, "fourth_pivot": 2, "garbage_anchor_forbidden": 1, - "grounded_context_seed": 1, + "grounded_context_seed": 2, "grounded_counterparty": 13, "grounded_counterparty_followup": 12, "grounded_discovery_seed": 1, "grounded_self_correction": 1, - "grounded_summary": 1, + "grounded_summary": 2, "historical_anchor": 1, "historical_date_anchor": 3, "historical_inventory": 2, "historical_restore": 1, - "human_answer": 3, + "human_answer": 4, "human_answer_quality": 2, - "human_dialog": 39, + "human_dialog": 46, "hybrid_investigation_followup": 2, "hybrid_investigation_root": 2, "incoming": 8, - "incoming_value_flow": 8, + "incoming_value_flow": 9, "inline_organization_clarification": 13, "integrity_guard": 57, "inventory_aging": 3, @@ -439,7 +439,7 @@ "manual_9lieoh": 11, 
"materialization_gap": 1, "mcp_discovery_bidirectional_value_flow": 2, - "mcp_discovery_gate_sanity": 1, + "mcp_discovery_gate_sanity": 2, "mcp_discovery_response_gate": 1, "mcp_discovery_supplier_payout": 1, "mcp_discovery_value_flow": 1, @@ -449,12 +449,12 @@ "meta_memory": 11, "meta_return_to_business": 1, "meta_scope": 20, - "meta_smalltalk": 19, + "meta_smalltalk": 20, "meta_verify": 1, "metadata_lane_choice_clarification": 15, "metadata_surface": 17, "mixed_ambiguity": 15, - "movement_evidence": 3, + "movement_evidence": 4, "movement_execution": 1, "movement_lane_after_clarification": 11, "movement_lane_after_metadata": 2, @@ -467,19 +467,19 @@ "multi_company_entry": 2, "multi_hop_clarification": 21, "net_switch": 1, - "net_value_flow": 5, - "neutral_followup": 16, + "net_value_flow": 6, + "neutral_followup": 17, "numeric_counterparty_suffix": 1, - "off_domain_living_chat": 2, + "off_domain_living_chat": 3, "open_scope": 9, "open_scope_net": 3, - "open_scope_total": 8, + "open_scope_total": 10, "organization_activity_age": 5, "organization_authority": 7, - "organization_clarification": 8, + "organization_clarification": 9, "organization_fact_boundary": 1, "organization_followup_reuse": 20, - "organization_scope": 28, + "organization_scope": 34, "organization_scoped": 4, "organization_second_recovery": 1, "outgoing": 3, @@ -487,7 +487,7 @@ "payables": 1, "payables_snapshot": 1, "payments_followup": 20, - "payout_switch": 4, + "payout_switch": 5, "payout_value_flow": 2, "payout_year_switch": 3, "period_carryover": 1, @@ -502,7 +502,7 @@ "period_narrowing": 1, "period_scope": 9, "pivot_seed": 8, - "planner_catalog_alignment": 13, + "planner_catalog_alignment": 26, "polarity_flip": 1, "post_f": 9, "post_f_integrity_hardening": 6, @@ -557,7 +557,7 @@ "stale_entity_seed": 1, "stale_inventory_scope": 1, "stale_lifecycle_override": 1, - "stale_replay_forbidden": 2, + "stale_replay_forbidden": 3, "stale_scope_guard": 1, "stale_temporal_carryover": 1, 
"supported_route_not_hijacked_by_mcp_discovery": 1, @@ -571,10 +571,10 @@ "topic_reset": 5, "translit_wording": 1, "unsupported_current_turn_meaning_boundary": 5, - "value_flow_comparison": 11, + "value_flow_comparison": 13, "value_flow_net": 6, - "value_flow_pivot": 3, - "value_flow_ranking": 17, + "value_flow_pivot": 4, + "value_flow_ranking": 19, "value_flow_total": 16, "vat": 53, "vat_colloquial_wording": 2, @@ -585,7 +585,7 @@ "vat_orientation": 2, "very_old_stock": 1, "year_specific": 1, - "year_switch": 14, + "year_switch": 16, "year_switch_after_document_pivot": 1, "year_switch_after_fourth_pivot": 2, "year_switch_after_pivot": 4, @@ -20049,6 +20049,1035 @@ ] } }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_01_human_smalltalk_sanity", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_01_human_smalltalk_sanity", + "title": "Human smalltalk remains living chat and does not expose discovery internals", + "question": "привет, ты на связи?", + "criticality": "info", + "expected_intents": [], + "expected_catalog_alignment_status": null, + "expected_catalog_chain_top_match": null, + "expected_catalog_selected_matches_top": null, + "semantic_tags": [ + "human_answer", + "mcp_discovery_gate_sanity", + "meta_smalltalk" + ], + "step_payload": { + "step_id": "step_01_human_smalltalk_sanity", + "title": "Human smalltalk remains living chat and does not expose discovery internals", + "question": "привет, ты на связи?", + "required_answer_patterns_any": [ + "(?i)привет|на связи|готов|помочь" + ], + "forbidden_answer_patterns": [ + "(?i)mcp", + "(?i)runtime_", + 
"(?i)query_documents", + "(?i)primitive" + ], + "criticality": "info", + "semantic_tags": [ + "human_answer", + "mcp_discovery_gate_sanity", + "meta_smalltalk" + ], + "notes": "[mixed_pack_slot=slot_01_smalltalk_sanity source=address_truth_harness_phase19_mcp_discovery_response_gate:step_01_human_smalltalk_sanity]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_01_resolve_counterparty_alias", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_01_resolve_counterparty_alias", + "title": "Entity resolution grounds the checked 1C counterparty from a loose alias", + "question": "найди в 1С контрагента СВК", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "entity_resolution", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "entity_resolution", + "alias_grounding", + "followup_anchor", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_01_resolve_counterparty_alias", + "title": "Entity resolution grounds the checked 1C counterparty from a loose alias", + "question": "найди в 1С контрагента СВК", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "entity_resolution", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)свк", + "(?i)контрагент" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)каталог", + 
"(?i)найден", + "(?i)наиболее вероятн" + ], + "forbidden_answer_patterns": [ + "(?i)получили", + "(?i)заплатили", + "(?i)нетто", + "(?i)оборот", + "(?i)выручк", + "(?i)сумм(а|ы)" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "alias_grounding", + "followup_anchor", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_02_counterparty_grounding source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_01_resolve_counterparty_alias]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_02_incoming_by_resolved_entity", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_02_incoming_by_resolved_entity", + "title": "Incoming value-flow follow-up reuses the resolved counterparty anchor", + "question": "сколько получили по нему за 2020 год", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "entity_resolution", + "incoming_value_flow", + "followup_reuse", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_02_incoming_by_resolved_entity", + "title": "Incoming value-flow follow-up reuses the resolved counterparty anchor", + "question": "сколько получили по нему за 2020 год", + "allowed_reply_types": [ + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + 
"expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)получил|входящ|поступ", + "(?i)руб" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)свк" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + "(?i)уточните, какого контрагента", + "(?i)по какому контрагенту" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "incoming_value_flow", + "followup_reuse", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_03_counterparty_incoming source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_02_incoming_by_resolved_entity]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_03_payout_switch_by_resolved_entity", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_03_payout_switch_by_resolved_entity", + "title": "Outgoing payment follow-up keeps the same grounded counterparty and checked year", + "question": "а теперь сколько заплатили?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "entity_resolution", + "payout_switch", + "followup_reuse", + "date_carryover", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_03_payout_switch_by_resolved_entity", + "title": "Outgoing payment follow-up keeps the same grounded counterparty and checked year", + "question": "а теперь сколько заплатили?", + 
"allowed_reply_types": [ + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)заплатил|исходящ|списан|платеж", + "(?i)руб" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)свк" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + "(?i)уточните, какого контрагента", + "(?i)по какому контрагенту", + "(?i)за какой год" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "payout_switch", + "followup_reuse", + "date_carryover", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_04_counterparty_payout source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_03_payout_switch_by_resolved_entity]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_04_net_after_payout", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_04_net_after_payout", + "title": "Net-flow follow-up reuses the same grounded counterparty and checked year after payout", + "question": "а какое нетто?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_comparison", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "entity_resolution", + "net_value_flow", + "followup_reuse", + "planner_catalog_alignment" + ], + "step_payload": { + 
"step_id": "step_04_net_after_payout", + "title": "Net-flow follow-up reuses the same grounded counterparty and checked year after payout", + "question": "а какое нетто?", + "allowed_reply_types": [ + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_comparison", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)нетто|сальдо", + "(?i)руб" + ], + "required_answer_patterns_any": [ + "(?i)получ", + "(?i)заплат", + "(?i)группа\\s+свк", + "(?i)свк" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + "(?i)уточните, какого контрагента", + "(?i)по какому контрагенту" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "net_value_flow", + "followup_reuse", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_05_counterparty_net source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_04_net_after_payout]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_05_documents_after_net", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_05_documents_after_net", + "title": "Document evidence follow-up keeps the grounded counterparty after the net answer", + "question": "а по документам?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "document_evidence", + "expected_catalog_selected_matches_top": true, + 
"semantic_tags": [ + "entity_resolution", + "document_evidence", + "value_flow_pivot", + "followup_reuse", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_05_documents_after_net", + "title": "Document evidence follow-up keeps the grounded counterparty after the net answer", + "question": "а по документам?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "document_evidence", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)документ|счет|накладн|акт" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)свк", + "(?i)2020" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + "(?i)уточните, какого контрагента", + "(?i)по какому контрагенту", + "(?i)сколько получили", + "(?i)сколько заплатили", + "(?i)нетто" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "document_evidence", + "value_flow_pivot", + "followup_reuse", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_06_counterparty_documents source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_05_documents_after_net]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_06_movements_after_documents", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_06_movements_after_documents", + "title": "Movement evidence follow-up keeps the grounded counterparty after the document answer", + "question": 
"а по движениям?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "movement_evidence", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "entity_resolution", + "movement_evidence", + "document_pivot", + "followup_reuse", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_06_movements_after_documents", + "title": "Movement evidence follow-up keeps the grounded counterparty after the document answer", + "question": "а по движениям?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "movement_evidence", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)движени|операц|платеж|списан|поступ" + ], + "required_answer_patterns_any": [ + "(?i)группа\\s+свк", + "(?i)свк", + "(?i)2020" + ], + "forbidden_answer_patterns": [ + "(?i)не найден контрагент", + "(?i)уточните, какого контрагента", + "(?i)по какому контрагенту", + "(?i)сколько получили", + "(?i)сколько заплатили", + "(?i)нетто" + ], + "criticality": "critical", + "semantic_tags": [ + "entity_resolution", + "movement_evidence", + "document_pivot", + "followup_reuse", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_07_counterparty_movements source=address_truth_harness_phase32_planner_selected_chain_end_to_end:step_06_movements_after_documents]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_01_open_scope_incoming_total", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": 
"address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_01_open_scope_incoming_total", + "title": "The user asks for incoming money without naming the organization yet", + "question": "Хочу быстрый денежный срез по одной организации без привязки к контрагенту. Сколько вообще входящих денег было за 2020 год?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "open_scope_total", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_01_open_scope_incoming_total", + "title": "The user asks for incoming money without naming the organization yet", + "question": "Хочу быстрый денежный срез по одной организации без привязки к контрагенту. Сколько вообще входящих денег было за 2020 год?", + "allowed_reply_types": [ + "clarification_required", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)уточн|нужно", + "(?i)организац" + ], + "criticality": "critical", + "semantic_tags": [ + "open_scope_total", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_08_open_org_total source=address_truth_harness_phase66_human_org_open_scope_dialog:step_01_open_scope_incoming_total]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_02_all_time_same_open_scope", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for 
catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_02_all_time_same_open_scope", + "title": "The user selects the organization and gets the 2020 incoming total", + "question": "По ООО Альтернатива Плюс.", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "organization_clarification", + "open_scope_total", + "human_dialog", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_02_all_time_same_open_scope", + "title": "The user selects the organization and gets the 2020 incoming total", + "question": "По ООО Альтернатива Плюс.", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)входящ|поступлен|получ" + ], + "forbidden_answer_patterns": [ + "(?i)уточните .*контрагент", + "(?i)не найден контрагент", + "(?i)уточните .*организац" + ], + "criticality": "critical", + "semantic_tags": [ + "organization_clarification", + "open_scope_total", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_09_open_org_selection source=address_truth_harness_phase66_human_org_open_scope_dialog:step_02_all_time_same_open_scope]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_03_all_time_same_open_scope", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 
mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_03_all_time_same_open_scope", + "title": "The user broadens the same organization slice to all available time", + "question": "Понял, тогда за все время.", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "all_time_followup", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_03_all_time_same_open_scope", + "title": "The user broadens the same organization slice to all available time", + "question": "Понял, тогда за все время.", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)все доступное время|все время|весь период", + "(?i)входящ|поступлен|получ" + ], + "forbidden_answer_patterns": [ + "(?i)за 2020", + "(?i)уточните .*контрагент", + "(?i)уточните .*период", + "(?i)уточните .*организац" + ], + "criticality": "critical", + "semantic_tags": [ + "all_time_followup", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_10_open_org_all_time source=address_truth_harness_phase66_human_org_open_scope_dialog:step_03_all_time_same_open_scope]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_04_bidirectional_comparison", + "source_type": "truth_harness_step", + "source_file": 
"docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_04_bidirectional_comparison", + "title": "The user asks which money direction is larger for the organization", + "question": "Хорошо. А что по ООО Альтернатива Плюс больше в 2020 году: входящие или исходящие деньги?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_comparison", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "value_flow_comparison", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_04_bidirectional_comparison", + "title": "The user asks which money direction is larger for the organization", + "question": "Хорошо. 
А что по ООО Альтернатива Плюс больше в 2020 году: входящие или исходящие деньги?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_comparison", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)входящ|исходящ|получ|заплат|больше" + ], + "criticality": "critical", + "semantic_tags": [ + "value_flow_comparison", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_11_open_org_comparison source=address_truth_harness_phase66_human_org_open_scope_dialog:step_04_bidirectional_comparison]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_05_comparison_year_switch", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_05_comparison_year_switch", + "title": "The user asks the same comparison for another year", + "question": "А что по ООО Альтернатива Плюс больше уже за 2021 год: входящие или исходящие деньги?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_comparison", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "value_flow_comparison", + "year_switch", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_05_comparison_year_switch", + "title": "The user asks the same comparison for 
another year", + "question": "А что по ООО Альтернатива Плюс больше уже за 2021 год: входящие или исходящие деньги?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_comparison", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2021", + "(?i)входящ|исходящ|получ|заплат|больше" + ], + "criticality": "critical", + "semantic_tags": [ + "value_flow_comparison", + "year_switch", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_12_open_org_comparison_year_switch source=address_truth_harness_phase66_human_org_open_scope_dialog:step_05_comparison_year_switch]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_06_ranking_top_counterparty", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_06_ranking_top_counterparty", + "title": "The user asks who brought the most money for the organization", + "question": "И кто больше всего принес денег этой организации в 2020 году?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_ranking", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "value_flow_ranking", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_06_ranking_top_counterparty", + "title": "The user 
asks who brought the most money for the organization", + "question": "И кто больше всего принес денег этой организации в 2020 году?", + "allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_ranking", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2020", + "(?i)кто|контрагент|клиент|принес|доход" + ], + "criticality": "critical", + "semantic_tags": [ + "value_flow_ranking", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_13_open_org_ranking source=address_truth_harness_phase66_human_org_open_scope_dialog:step_06_ranking_top_counterparty]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_07_ranking_year_switch", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_07_ranking_year_switch", + "title": "The user asks the same ranking for another year", + "question": "А в 2021 году?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_ranking", + "expected_catalog_selected_matches_top": true, + "semantic_tags": [ + "value_flow_ranking", + "year_switch", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "step_payload": { + "step_id": "step_07_ranking_year_switch", + "title": "The user asks the same ranking for another year", + "question": "А в 2021 году?", + 
"allowed_reply_types": [ + "factual", + "factual_with_explanation", + "partial_coverage" + ], + "expected_catalog_alignment_status": "selected_matches_top", + "expected_catalog_chain_top_match": "value_flow_ranking", + "expected_catalog_selected_matches_top": true, + "required_answer_patterns_all": [ + "(?i)2021" + ], + "criticality": "critical", + "semantic_tags": [ + "value_flow_ranking", + "year_switch", + "organization_scope", + "human_dialog", + "planner_catalog_alignment" + ], + "notes": "[mixed_pack_slot=slot_14_open_org_ranking_year_switch source=address_truth_harness_phase66_human_org_open_scope_dialog:step_07_ranking_year_switch]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_01_company_activity_lifecycle", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_01_company_activity_lifecycle", + "title": "Lifecycle answer seeds grounded organization context", + "question": "а по Альтернативе Плюс сколько лет активности в базе 1С?", + "criticality": "warning", + "expected_intents": [], + "expected_catalog_alignment_status": null, + "expected_catalog_chain_top_match": null, + "expected_catalog_selected_matches_top": null, + "semantic_tags": [ + "company_activity_lifecycle", + "grounded_context_seed" + ], + "step_payload": { + "step_id": "step_01_company_activity_lifecycle", + "title": "Lifecycle answer seeds grounded organization context", + "question": "а по Альтернативе Плюс сколько лет активности в базе 1С?", + "allowed_reply_types": [ + "partial_coverage", + "factual", + "factual_with_explanation" + ], + "required_answer_patterns_any": [ + 
"(?i)лет", + "(?i)активност", + "(?i)1с", + "(?i)не получил|не подтвержден|проверил доступный контур" + ], + "criticality": "warning", + "semantic_tags": [ + "company_activity_lifecycle", + "grounded_context_seed" + ], + "notes": "[mixed_pack_slot=slot_15_broad_eval_context source=address_truth_harness_phase22_broad_business_evaluation_bridge:step_01_company_activity_lifecycle]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_02_broad_business_evaluation", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_02_broad_business_evaluation", + "title": "Broad business evaluation becomes grounded summary instead of stale lifecycle dump", + "question": "Как ты оценишь деятельность компании?", + "criticality": "warning", + "expected_intents": [], + "expected_catalog_alignment_status": null, + "expected_catalog_chain_top_match": null, + "expected_catalog_selected_matches_top": null, + "semantic_tags": [ + "broad_business_evaluation", + "grounded_summary" + ], + "step_payload": { + "step_id": "step_02_broad_business_evaluation", + "title": "Broad business evaluation becomes grounded summary instead of stale lifecycle dump", + "question": "Как ты оценишь деятельность компании?", + "required_answer_patterns_all": [ + "(?i)коротко|оценк|частичн", + "(?i)1с|подтвержд", + "(?i)денежн|долг|ндс|контрагент|операц" + ], + "forbidden_answer_patterns": [ + "(?i)активных заказчиков", + "(?i)последняя активность", + "(?i)^\\s*1\\." 
+ ], + "criticality": "warning", + "semantic_tags": [ + "broad_business_evaluation", + "grounded_summary" + ], + "notes": "[mixed_pack_slot=slot_16_broad_eval_bridge source=address_truth_harness_phase22_broad_business_evaluation_bridge:step_02_broad_business_evaluation]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_03_net_flow_after_broad_eval", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_03_net_flow_after_broad_eval", + "title": "Exact net-flow follow-up still answers after the broad bridge", + "question": "какое нетто по деньгам с Группа СВК за 2020 год: сколько получили и сколько заплатили?", + "criticality": "critical", + "expected_intents": [], + "expected_catalog_alignment_status": null, + "expected_catalog_chain_top_match": null, + "expected_catalog_selected_matches_top": null, + "semantic_tags": [ + "counterparty_net_cash_flow", + "broad_eval_bridge_preserved" + ], + "step_payload": { + "step_id": "step_03_net_flow_after_broad_eval", + "title": "Exact net-flow follow-up still answers after the broad bridge", + "question": "какое нетто по деньгам с Группа СВК за 2020 год: сколько получили и сколько заплатили?", + "allowed_reply_types": [ + "partial_coverage", + "factual_with_explanation" + ], + "required_answer_patterns_all": [ + "(?i)свк", + "(?i)получил|входящ|поступ", + "(?i)заплат|исходящ|списан|плат[её]ж", + "(?i)нетто|сальдо|разниц", + "(?i)2020|период", + "(?i)руб" + ], + "forbidden_answer_patterns": [ + "(?i)активных заказчиков", + "(?i)лет в базе", + "(?i)последняя активность" + ], + "criticality": "critical", + 
"semantic_tags": [ + "counterparty_net_cash_flow", + "broad_eval_bridge_preserved" + ], + "notes": "[mixed_pack_slot=slot_17_broad_eval_return_to_net source=address_truth_harness_phase22_broad_business_evaluation_bridge:step_03_net_flow_after_broad_eval]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_01_catalog_metadata_surface", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_01_catalog_metadata_surface", + "title": "Catalog-oriented metadata surface is surfaced honestly for counterparties", + "question": "какие справочники 1С есть по контрагентам?", + "criticality": "warning", + "expected_intents": [], + "expected_catalog_alignment_status": null, + "expected_catalog_chain_top_match": null, + "expected_catalog_selected_matches_top": null, + "semantic_tags": [ + "catalog_metadata_surface", + "counterparty_catalog_scope" + ], + "step_payload": { + "step_id": "step_01_catalog_metadata_surface", + "title": "Catalog-oriented metadata surface is surfaced honestly for counterparties", + "question": "какие справочники 1С есть по контрагентам?", + "allowed_reply_types": [ + "partial_coverage", + "factual_with_explanation" + ], + "required_answer_patterns_all": [ + "(?i)metadata|метадан", + "(?i)справоч|catalog|directory", + "(?i)контрагент" + ], + "forbidden_answer_patterns": [ + "(?i)получили", + "(?i)заплатили", + "(?i)нетто", + "(?i)документные строки найдены", + "(?i)строки денежных движений найдены" + ], + "criticality": "warning", + "semantic_tags": [ + "catalog_metadata_surface", + "counterparty_catalog_scope" + ], + "notes": 
"[mixed_pack_slot=slot_18_metadata_surface source=address_truth_harness_phase42_catalog_metadata_drilldown:step_01_catalog_metadata_surface]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_02_neutral_followup_catalog_drilldown", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_02_neutral_followup_catalog_drilldown", + "title": "Neutral follow-up continues into deeper catalog metadata instead of asking for a documents-vs-movements lane choice", + "question": "давай дальше", + "criticality": "warning", + "expected_intents": [], + "expected_catalog_alignment_status": null, + "expected_catalog_chain_top_match": null, + "expected_catalog_selected_matches_top": null, + "semantic_tags": [ + "catalog_drilldown", + "neutral_followup" + ], + "step_payload": { + "step_id": "step_02_neutral_followup_catalog_drilldown", + "title": "Neutral follow-up continues into deeper catalog metadata instead of asking for a documents-vs-movements lane choice", + "question": "давай дальше", + "allowed_reply_types": [ + "partial_coverage", + "factual_with_explanation" + ], + "required_answer_patterns_all": [ + "(?i)metadata|метадан|схем", + "(?i)справоч|catalog|directory", + "(?i)контрагент|counterpart" + ], + "forbidden_answer_patterns": [ + "(?i)документ", + "(?i)движени|регистр", + "(?i)уточн.*контур", + "(?i)получили", + "(?i)заплатили", + "(?i)нетто" + ], + "criticality": "warning", + "semantic_tags": [ + "catalog_drilldown", + "neutral_followup" + ], + "notes": "[mixed_pack_slot=slot_19_metadata_drilldown 
source=address_truth_harness_phase42_catalog_metadata_drilldown:step_02_neutral_followup_catalog_drilldown]" + } + }, + { + "entry_id": "address_truth_harness_phase83_planner_brain_alignment_mix:step_08_off_domain_living_chat_not_hijacked", + "source_type": "truth_harness_step", + "source_file": "docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json", + "source_title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity", + "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix", + "domain": "planner_autonomy_consolidation", + "reusable_in_agent_pack": true, + "step_id": "step_08_off_domain_living_chat_not_hijacked", + "title": "Off-domain living chat remains human and is not hijacked by discovery carryover", + "question": "а чем капибара отличается от утки?", + "criticality": "warning", + "expected_intents": [], + "expected_catalog_alignment_status": null, + "expected_catalog_chain_top_match": null, + "expected_catalog_selected_matches_top": null, + "semantic_tags": [ + "off_domain_living_chat", + "stale_replay_forbidden" + ], + "step_payload": { + "step_id": "step_08_off_domain_living_chat_not_hijacked", + "title": "Off-domain living chat remains human and is not hijacked by discovery carryover", + "question": "а чем капибара отличается от утки?", + "required_answer_patterns_any": [ + "(?i)капибар.*утк|утк.*капибар", + "(?i)млекопита|птиц|грызун" + ], + "forbidden_answer_patterns": [ + "(?i)свк", + "(?i)контрагент", + "(?i)mcp", + "(?i)query_documents", + "(?i)runtime_", + "(?i)primitive" + ], + "criticality": "warning", + "semantic_tags": [ + "off_domain_living_chat", + "stale_replay_forbidden" + ], + "notes": "[mixed_pack_slot=slot_20_off_domain_guard source=address_truth_harness_phase19_mcp_discovery_response_gate:step_08_off_domain_living_chat_not_hijacked]" + } + }, { "entry_id": "address_truth_harness_phase8_manual_runtime_authority_mix:step_01_smalltalk", "source_type": 
"truth_harness_step", diff --git a/docs/orchestration/agent_semantic_source_catalog.md b/docs/orchestration/agent_semantic_source_catalog.md index b62a75a..cef3f82 100644 --- a/docs/orchestration/agent_semantic_source_catalog.md +++ b/docs/orchestration/agent_semantic_source_catalog.md @@ -1,6 +1,6 @@ # Agent semantic source catalog -- truth_harness_steps_total: `500` +- truth_harness_steps_total: `520` - saved_session_questions_total: `229` ## Reusable truth-harness tags @@ -11,11 +11,11 @@ - `aggregate_all_time`: `1` - `aggregate_revenue`: `1` - `aggregate_year`: `1` -- `alias_grounding`: `6` +- `alias_grounding`: `7` - `all_time_after_pivot`: `2` - `all_time_after_second_pivot`: `2` - `all_time_after_third_pivot`: `1` -- `all_time_followup`: `9` +- `all_time_followup`: `10` - `all_time_scope`: `4` - `ambiguity_probe`: `1` - `anomaly_probe`: `1` @@ -25,18 +25,18 @@ - `bounded_autonomy`: `47` - `bounded_retrieval`: `13` - `bridge_inventory_to_vat`: `3` -- `broad_business_evaluation`: `2` -- `broad_eval_bridge_preserved`: `1` +- `broad_business_evaluation`: `3` +- `broad_eval_bridge_preserved`: `2` - `broad_eval_followup_continuity`: `1` - `broad_evaluation_bridge`: `1` - `capability_meta`: `3` - `capability_over_followup`: `2` -- `catalog_drilldown`: `1` +- `catalog_drilldown`: `2` - `catalog_grounding`: `1` -- `catalog_metadata_surface`: `1` +- `catalog_metadata_surface`: `2` - `clarification_required`: `1` - `clarification_resume`: `2` -- `company_activity_lifecycle`: `2` +- `company_activity_lifecycle`: `3` - `company_analytics`: `1` - `company_authority`: `3` - `company_authority_probe`: `1` @@ -51,14 +51,14 @@ - `continuity_interrupt`: `1` - `contracts_followup`: `16` - `counterparty_carryover`: `1` -- `counterparty_catalog_scope`: `1` +- `counterparty_catalog_scope`: `2` - `counterparty_documents`: `29` - `counterparty_followup`: `3` - `counterparty_grounding`: `1` - `counterparty_item_flow`: `1` - `counterparty_lifecycle`: `1` - 
`counterparty_monthly_net_cash_flow`: `1` -- `counterparty_net_cash_flow`: `4` +- `counterparty_net_cash_flow`: `5` - `counterparty_net_value_flow`: `1` - `counterparty_outgoing_payments`: `1` - `counterparty_pronoun_resolution`: `15` @@ -76,17 +76,17 @@ - `current_turn_entity_authority`: `1` - `customer_analytics`: `1` - `data_scope_meta`: `2` -- `date_carryover`: `6` +- `date_carryover`: `7` - `date_followup`: `2` - `date_scope`: `1` - `debt_polarity`: `1` - `display_label_integrity`: `3` - `display_name_integrity`: `1` -- `document_evidence`: `3` +- `document_evidence`: `4` - `document_lane_after_clarification`: `5` - `document_lane_continuity`: `6` - `document_lane_execution`: `5` -- `document_pivot`: `1` +- `document_pivot`: `2` - `document_pivot_after_movement`: `1` - `document_pivot_after_movement_retrieval`: `2` - `document_pivot_after_retrieval`: `2` @@ -96,30 +96,30 @@ - `documents_followup`: `7` - `documents_pivot`: `2` - `entity_grounding`: `2` -- `entity_resolution`: `29` +- `entity_resolution`: `35` - `exact_not_overwritten`: `2` -- `followup_anchor`: `6` -- `followup_reuse`: `21` +- `followup_anchor`: `7` +- `followup_reuse`: `26` - `followup_short`: `1` - `fourth_pivot`: `2` - `garbage_anchor_forbidden`: `1` -- `grounded_context_seed`: `1` +- `grounded_context_seed`: `2` - `grounded_counterparty`: `13` - `grounded_counterparty_followup`: `12` - `grounded_discovery_seed`: `1` - `grounded_self_correction`: `1` -- `grounded_summary`: `1` +- `grounded_summary`: `2` - `historical_anchor`: `1` - `historical_date_anchor`: `3` - `historical_inventory`: `2` - `historical_restore`: `1` -- `human_answer`: `3` +- `human_answer`: `4` - `human_answer_quality`: `2` -- `human_dialog`: `39` +- `human_dialog`: `46` - `hybrid_investigation_followup`: `2` - `hybrid_investigation_root`: `2` - `incoming`: `8` -- `incoming_value_flow`: `8` +- `incoming_value_flow`: `9` - `inline_organization_clarification`: `13` - `integrity_guard`: `57` - `inventory_aging`: `3` @@ -142,7 
+142,7 @@ - `manual_9lieoh`: `11` - `materialization_gap`: `1` - `mcp_discovery_bidirectional_value_flow`: `2` -- `mcp_discovery_gate_sanity`: `1` +- `mcp_discovery_gate_sanity`: `2` - `mcp_discovery_response_gate`: `1` - `mcp_discovery_supplier_payout`: `1` - `mcp_discovery_value_flow`: `1` @@ -152,12 +152,12 @@ - `meta_memory`: `6` - `meta_return_to_business`: `1` - `meta_scope`: `12` -- `meta_smalltalk`: `13` +- `meta_smalltalk`: `14` - `meta_verify`: `1` - `metadata_lane_choice_clarification`: `15` - `metadata_surface`: `17` - `mixed_ambiguity`: `15` -- `movement_evidence`: `3` +- `movement_evidence`: `4` - `movement_execution`: `1` - `movement_lane_after_clarification`: `11` - `movement_lane_after_metadata`: `2` @@ -170,19 +170,19 @@ - `multi_company_entry`: `2` - `multi_hop_clarification`: `21` - `net_switch`: `1` -- `net_value_flow`: `5` -- `neutral_followup`: `16` +- `net_value_flow`: `6` +- `neutral_followup`: `17` - `numeric_counterparty_suffix`: `1` -- `off_domain_living_chat`: `2` +- `off_domain_living_chat`: `3` - `open_scope`: `9` - `open_scope_net`: `3` -- `open_scope_total`: `8` +- `open_scope_total`: `10` - `organization_activity_age`: `5` - `organization_authority`: `7` -- `organization_clarification`: `8` +- `organization_clarification`: `9` - `organization_fact_boundary`: `1` - `organization_followup_reuse`: `20` -- `organization_scope`: `28` +- `organization_scope`: `34` - `organization_scoped`: `4` - `organization_second_recovery`: `1` - `outgoing`: `3` @@ -190,7 +190,7 @@ - `payables`: `1` - `payables_snapshot`: `1` - `payments_followup`: `20` -- `payout_switch`: `4` +- `payout_switch`: `5` - `payout_value_flow`: `2` - `payout_year_switch`: `3` - `period_carryover`: `1` @@ -205,7 +205,7 @@ - `period_narrowing`: `1` - `period_scope`: `9` - `pivot_seed`: `8` -- `planner_catalog_alignment`: `13` +- `planner_catalog_alignment`: `26` - `polarity_flip`: `1` - `post_f`: `9` - `post_f_integrity_hardening`: `6` @@ -260,7 +260,7 @@ - 
`stale_entity_seed`: `1` - `stale_inventory_scope`: `1` - `stale_lifecycle_override`: `1` -- `stale_replay_forbidden`: `2` +- `stale_replay_forbidden`: `3` - `stale_scope_guard`: `1` - `stale_temporal_carryover`: `1` - `supported_route_not_hijacked_by_mcp_discovery`: `1` @@ -274,10 +274,10 @@ - `topic_reset`: `5` - `translit_wording`: `1` - `unsupported_current_turn_meaning_boundary`: `5` -- `value_flow_comparison`: `11` +- `value_flow_comparison`: `13` - `value_flow_net`: `6` -- `value_flow_pivot`: `3` -- `value_flow_ranking`: `17` +- `value_flow_pivot`: `4` +- `value_flow_ranking`: `19` - `value_flow_total`: `16` - `vat`: `37` - `vat_colloquial_wording`: `2` @@ -288,7 +288,7 @@ - `vat_orientation`: `2` - `very_old_stock`: `1` - `year_specific`: `1` -- `year_switch`: `14` +- `year_switch`: `16` - `year_switch_after_document_pivot`: `1` - `year_switch_after_fourth_pivot`: `2` - `year_switch_after_pivot`: `4` @@ -728,6 +728,26 @@ - `address_truth_harness_phase82_human_mixed_integrity_status_dialog:step_17_counterparty_net_followup` | tags: grounded_counterparty, net_value_flow, human_dialog | question: А какое нетто? - `address_truth_harness_phase82_human_mixed_integrity_status_dialog:step_18_counterparty_documents_pivot` | tags: grounded_counterparty, documents_pivot, human_dialog, counterparty_documents | question: А по документам? - `address_truth_harness_phase82_human_mixed_integrity_status_dialog:step_19_counterparty_movements_pivot` | tags: grounded_counterparty, movements_pivot, human_dialog | question: А по движениям? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_human_smalltalk_sanity` | tags: human_answer, mcp_discovery_gate_sanity, meta_smalltalk | question: привет, ты на связи? 
+- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_resolve_counterparty_alias` | tags: entity_resolution, alias_grounding, followup_anchor, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=entity_resolution, selected_matches_top=True | question: найди в 1С контрагента СВК +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_02_incoming_by_resolved_entity` | tags: entity_resolution, incoming_value_flow, followup_reuse, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: сколько получили по нему за 2020 год +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_03_payout_switch_by_resolved_entity` | tags: entity_resolution, payout_switch, followup_reuse, date_carryover, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: а теперь сколько заплатили? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_04_net_after_payout` | tags: entity_resolution, net_value_flow, followup_reuse, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_comparison, selected_matches_top=True | question: а какое нетто? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_05_documents_after_net` | tags: entity_resolution, document_evidence, value_flow_pivot, followup_reuse, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=document_evidence, selected_matches_top=True | question: а по документам? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_06_movements_after_documents` | tags: entity_resolution, movement_evidence, document_pivot, followup_reuse, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=movement_evidence, selected_matches_top=True | question: а по движениям? 
+- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_open_scope_incoming_total` | tags: open_scope_total, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: Хочу быстрый денежный срез по одной организации без привязки к контрагенту. Сколько вообще входящих денег было за 2020 год? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_02_all_time_same_open_scope` | tags: organization_clarification, open_scope_total, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: По ООО Альтернатива Плюс. +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_03_all_time_same_open_scope` | tags: all_time_followup, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow, selected_matches_top=True | question: Понял, тогда за все время. +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_04_bidirectional_comparison` | tags: value_flow_comparison, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_comparison, selected_matches_top=True | question: Хорошо. А что по ООО Альтернатива Плюс больше в 2020 году: входящие или исходящие деньги? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_05_comparison_year_switch` | tags: value_flow_comparison, year_switch, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_comparison, selected_matches_top=True | question: А что по ООО Альтернатива Плюс больше уже за 2021 год: входящие или исходящие деньги? 
+- `address_truth_harness_phase83_planner_brain_alignment_mix:step_06_ranking_top_counterparty` | tags: value_flow_ranking, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_ranking, selected_matches_top=True | question: И кто больше всего принес денег этой организации в 2020 году? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_07_ranking_year_switch` | tags: value_flow_ranking, year_switch, organization_scope, human_dialog, planner_catalog_alignment | catalog_alignment: status=selected_matches_top, top=value_flow_ranking, selected_matches_top=True | question: А в 2021 году? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_company_activity_lifecycle` | tags: company_activity_lifecycle, grounded_context_seed | question: а по Альтернативе Плюс сколько лет активности в базе 1С? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_02_broad_business_evaluation` | tags: broad_business_evaluation, grounded_summary | question: Как ты оценишь деятельность компании? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_03_net_flow_after_broad_eval` | tags: counterparty_net_cash_flow, broad_eval_bridge_preserved | question: какое нетто по деньгам с Группа СВК за 2020 год: сколько получили и сколько заплатили? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_01_catalog_metadata_surface` | tags: catalog_metadata_surface, counterparty_catalog_scope | question: какие справочники 1С есть по контрагентам? +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_02_neutral_followup_catalog_drilldown` | tags: catalog_drilldown, neutral_followup | question: давай дальше +- `address_truth_harness_phase83_planner_brain_alignment_mix:step_08_off_domain_living_chat_not_hijacked` | tags: off_domain_living_chat, stale_replay_forbidden | question: а чем капибара отличается от утки? 
 - `address_truth_harness_phase8_manual_runtime_authority_mix:step_01_smalltalk` | tags: meta_smalltalk | question: привет, как дела?
 - `address_truth_harness_phase8_manual_runtime_authority_mix:step_02_data_scope_meta` | tags: meta_scope | question: по какой компании мы сейчас работаем?
 - `address_truth_harness_phase8_manual_runtime_authority_mix:step_03_counterparty_documents` | tags: counterparty_documents | question: покажи все документы по чепурнову
diff --git a/scripts/agent_semantic_pack_builder.py b/scripts/agent_semantic_pack_builder.py
index c88ab7e..fd94dfb 100644
--- a/scripts/agent_semantic_pack_builder.py
+++ b/scripts/agent_semantic_pack_builder.py
@@ -167,7 +167,180 @@ RECIPE_LIBRARY: dict[str, dict[str, Any]] = {
                 "required_tags": ["meta_scope"],
             },
         ],
-    }
+    },
+    "turnaround_11_planner_brain_alignment_mix": {
+        "scenario_id": "address_truth_harness_phase83_planner_brain_alignment_mix",
+        "domain": "planner_autonomy_consolidation",
+        "title": "Phase 83 mixed planner-brain replay for catalog alignment, pivots, and legacy continuity",
+        "description": (
+            "Mixed AGENT replay for Planner Autonomy Consolidation. The pack interleaves selected-counterparty "
+            "catalog-alignment probes, open-organization money flow, ranking, broad-evaluation continuity, "
+            "metadata drilldown, and off-domain living-chat safety."
+        ),
+        "bindings": {},
+        "step_plan": [
+            {
+                "slot_id": "slot_01_smalltalk_sanity",
+                "criticality": "info",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase19_mcp_discovery_response_gate:step_01_human_smalltalk_sanity",
+                ],
+                "required_tags": ["meta_smalltalk"],
+            },
+            {
+                "slot_id": "slot_02_counterparty_grounding",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase32_planner_selected_chain_end_to_end:step_01_resolve_counterparty_alias",
+                ],
+                "required_tags": ["planner_catalog_alignment", "entity_resolution"],
+            },
+            {
+                "slot_id": "slot_03_counterparty_incoming",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase32_planner_selected_chain_end_to_end:step_02_incoming_by_resolved_entity",
+                ],
+                "required_tags": ["planner_catalog_alignment"],
+            },
+            {
+                "slot_id": "slot_04_counterparty_payout",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase32_planner_selected_chain_end_to_end:step_03_payout_switch_by_resolved_entity",
+                ],
+                "required_tags": ["planner_catalog_alignment"],
+            },
+            {
+                "slot_id": "slot_05_counterparty_net",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase32_planner_selected_chain_end_to_end:step_04_net_after_payout",
+                ],
+                "required_tags": ["planner_catalog_alignment"],
+            },
+            {
+                "slot_id": "slot_06_counterparty_documents",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase32_planner_selected_chain_end_to_end:step_05_documents_after_net",
+                ],
+                "required_tags": ["planner_catalog_alignment", "document_evidence"],
+            },
+            {
+                "slot_id": "slot_07_counterparty_movements",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase32_planner_selected_chain_end_to_end:step_06_movements_after_documents",
+                ],
+                "required_tags": ["planner_catalog_alignment", "movement_evidence"],
+            },
+            {
+                "slot_id": "slot_08_open_org_total",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase66_human_org_open_scope_dialog:step_01_open_scope_incoming_total",
+                ],
+                "required_tags": ["planner_catalog_alignment"],
+            },
+            {
+                "slot_id": "slot_09_open_org_selection",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase66_human_org_open_scope_dialog:step_02_all_time_same_open_scope",
+                ],
+                "required_tags": ["planner_catalog_alignment"],
+            },
+            {
+                "slot_id": "slot_10_open_org_all_time",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase66_human_org_open_scope_dialog:step_03_all_time_same_open_scope",
+                ],
+                "required_tags": ["planner_catalog_alignment"],
+            },
+            {
+                "slot_id": "slot_11_open_org_comparison",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase66_human_org_open_scope_dialog:step_04_bidirectional_comparison",
+                ],
+                "required_tags": ["planner_catalog_alignment", "value_flow_comparison"],
+            },
+            {
+                "slot_id": "slot_12_open_org_comparison_year_switch",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase66_human_org_open_scope_dialog:step_05_comparison_year_switch",
+                ],
+                "required_tags": ["planner_catalog_alignment", "value_flow_comparison"],
+            },
+            {
+                "slot_id": "slot_13_open_org_ranking",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase66_human_org_open_scope_dialog:step_06_ranking_top_counterparty",
+                ],
+                "required_tags": ["planner_catalog_alignment", "value_flow_ranking"],
+            },
+            {
+                "slot_id": "slot_14_open_org_ranking_year_switch",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase66_human_org_open_scope_dialog:step_07_ranking_year_switch",
+                ],
+                "required_tags": ["planner_catalog_alignment", "value_flow_ranking"],
+            },
+            {
+                "slot_id": "slot_15_broad_eval_context",
+                "criticality": "warning",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase22_broad_business_evaluation_bridge:step_01_company_activity_lifecycle",
+                ],
+                "required_tags": ["company_activity_lifecycle"],
+            },
+            {
+                "slot_id": "slot_16_broad_eval_bridge",
+                "criticality": "warning",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase22_broad_business_evaluation_bridge:step_02_broad_business_evaluation",
+                ],
+                "required_tags": ["broad_business_evaluation"],
+            },
+            {
+                "slot_id": "slot_17_broad_eval_return_to_net",
+                "criticality": "critical",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase22_broad_business_evaluation_bridge:step_03_net_flow_after_broad_eval",
+                ],
+                "required_tags": ["broad_eval_bridge_preserved"],
+            },
+            {
+                "slot_id": "slot_18_metadata_surface",
+                "criticality": "warning",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase42_catalog_metadata_drilldown:step_01_catalog_metadata_surface",
+                ],
+                "required_tags": ["catalog_metadata_surface"],
+            },
+            {
+                "slot_id": "slot_19_metadata_drilldown",
+                "criticality": "warning",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase42_catalog_metadata_drilldown:step_02_neutral_followup_catalog_drilldown",
+                ],
+                "required_tags": ["catalog_drilldown"],
+            },
+            {
+                "slot_id": "slot_20_off_domain_guard",
+                "criticality": "warning",
+                "preferred_candidate_ids": [
+                    "address_truth_harness_phase19_mcp_discovery_response_gate:step_08_off_domain_living_chat_not_hijacked",
+                ],
+                "required_tags": ["off_domain_living_chat"],
+            },
+        ],
+    },
 }
 
 
diff --git a/scripts/test_agent_semantic_pack_builder.py b/scripts/test_agent_semantic_pack_builder.py
index 0cd07d5..1c6bc78 100644
--- a/scripts/test_agent_semantic_pack_builder.py
+++ b/scripts/test_agent_semantic_pack_builder.py
@@ -49,6 +49,25 @@ class AgentSemanticPackBuilderTests(unittest.TestCase):
         self.assertIn("same_date_restore", all_tags)
         self.assertIn("settlements_receivables", all_tags)
 
+    def test_build_recipe_spec_creates_planner_brain_alignment_pack(self) -> None:
+        catalog = builder.build_source_catalog()
+        spec = builder.build_recipe_spec(catalog, "turnaround_11_planner_brain_alignment_mix")
+
+        self.assertEqual(spec["scenario_id"], "address_truth_harness_phase83_planner_brain_alignment_mix")
+        self.assertEqual(len(spec["steps"]), 20)
+        all_tags = {tag for step in spec["steps"] for tag in step.get("semantic_tags", [])}
+        self.assertIn("planner_catalog_alignment", all_tags)
+        self.assertIn("value_flow_comparison", all_tags)
+        self.assertIn("value_flow_ranking", all_tags)
+        self.assertIn("broad_business_evaluation", all_tags)
+        self.assertIn("catalog_drilldown", all_tags)
+        self.assertIn("off_domain_living_chat", all_tags)
+
+        catalog_checked_steps = [
+            step for step in spec["steps"] if step.get("expected_catalog_chain_top_match")
+        ]
+        self.assertEqual(len(catalog_checked_steps), 13)
+
 
 if __name__ == "__main__":
     unittest.main()
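
Review note: the recipe added in this diff is a hand-written 20-slot `step_plan`, so a cheap structural check could catch numbering or copy-paste slips before the builder runs. The following is only a sketch under stated assumptions. `validate_step_plan` and `OBSERVED_CRITICALITIES` are hypothetical names that do not exist in `agent_semantic_pack_builder.py`, and the allowed criticality values are simply the three that appear in this recipe, not a confirmed schema.

```python
from collections import Counter
from typing import Any

# Assumption: only the criticality values seen in this recipe are treated as valid.
OBSERVED_CRITICALITIES = {"info", "warning", "critical"}


def validate_step_plan(step_plan: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems found in a recipe step plan (hypothetical helper)."""
    problems: list[str] = []

    # Every slot_id should appear exactly once.
    slot_counts = Counter(slot.get("slot_id") for slot in step_plan)
    for slot_id, count in slot_counts.items():
        if count > 1:
            problems.append(f"duplicate slot_id: {slot_id}")

    for index, slot in enumerate(step_plan, start=1):
        slot_id = slot.get("slot_id", f"<slot #{index}>")
        # Slot ids in the recipe follow the pattern slot_NN_<name>, numbered in order.
        if not str(slot_id).startswith(f"slot_{index:02d}_"):
            problems.append(f"{slot_id}: expected prefix slot_{index:02d}_")
        if slot.get("criticality") not in OBSERVED_CRITICALITIES:
            problems.append(f"{slot_id}: unexpected criticality {slot.get('criticality')!r}")
        if not slot.get("preferred_candidate_ids"):
            problems.append(f"{slot_id}: no preferred_candidate_ids")
        if not slot.get("required_tags"):
            problems.append(f"{slot_id}: no required_tags")
    return problems
```

If something like this were adopted, it could be called on `RECIPE_LIBRARY["turnaround_11_planner_brain_alignment_mix"]["step_plan"]` from the existing unittest module, asserting that the returned list is empty. It deliberately does not check whether the candidate ids exist in the source catalog, since that lookup belongs to the builder itself.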