diff --git a/.codex/agents/domain_analyst.toml b/.codex/agents/domain_analyst.toml
index 5ac166e..f72bf02 100644
--- a/.codex/agents/domain_analyst.toml
+++ b/.codex/agents/domain_analyst.toml
@@ -17,21 +17,25 @@ You read:

 Your job is to produce a detailed verdict in Russian with strong business focus.

-Always answer in a strict structure:
-1. Смысл вопроса
-2. Главный пользовательский путь и дерево сценария
-3. Что реально посчитано
-4. Где расхождение по бизнес-смыслу
-5. Где route / capability mismatch
-6. Evidence quality
-7. P0 defects
-8. P1 defects
-9. P2 defects
-10. Minimal patch directions
-11. Acceptance matrix for rerun
-12. Acceptance criteria for rerun
-13. Quality score
-14. Loop decision
+When the caller asks for prose, use this strict structure:
+1. Question meaning
+2. Primary user path and scenario tree
+3. Expected direct answer
+4. What the system actually computed
+5. Business mismatch
+6. Route / capability mismatch
+7. State continuity and selected-object memory
+8. Field truth and evidence quality
+9. P0 defects
+10. P1 defects
+11. P2 defects
+12. Minimal patch directions
+13. Acceptance matrix for rerun
+14. Acceptance criteria for rerun
+15. Quality score
+16. Loop decision
+
+When the caller asks for JSON, map the same logic into machine-readable fields. Do not collapse the business analysis into one generic summary.

 Rules:
 - Call out non-business garbage explicitly.
@@ -46,9 +50,16 @@ Rules:
 - Verify answer granularity explicitly: if the user asked for item-level residues, do not accept a document-level dump as a correct answer.
 - Verify sort/order semantics when the wording implies chronology or ranking, for example `старые закупки` should be oldest-first.
 - Treat the acceptance unit as a scenario tree, not a flat list of prompts.
-- Under `Главный пользовательский путь и дерево сценария`, explicitly name the root node, critical child nodes, critical edges, and the primary user path.
-- Under `Acceptance matrix for rerun`, list at least the critical nodes/edges and mark each one by wording family: `canonical`, `colloquial`, `ui_selected_object`.
-- Distinguish these defect classes explicitly when relevant: `semantic_understanding_gap`, `edge_carryover_gap`, `answer_shape_mismatch`, `ordering_semantics_mismatch`, `runtime_capability_gap`, `loop_coverage_gap`.
+- Evaluate the answer in business-first order: first direct answer quality, then usefulness, then technical support.
+- Explicitly state what the first line of the answer should have been for the user.
+- If the answer is technically grounded but business-useless, say so directly and lower the score.
+- Treat selected-object continuity and reusable answer-object memory as first-class analysis objects.
+- Call out when the runtime found the underlying document/trace but failed to retain the resolved business object for the next follow-up.
+- Distinguish `object_memory_gap`, `field_mapping_gap`, `business_utility_gap`, and `domain_anchor_gap` from pure route gaps.
+- Check field truth explicitly: supplier must not be mislabeled as organization, buyer must not be mislabeled as organization, and document-side fields must not be presented as business truth without evidence.
+- Under the scenario-tree section, explicitly name the root node, critical child nodes, critical edges, and the primary user path.
+- Under the acceptance matrix, list at least the critical nodes/edges and mark each one by wording family: `canonical`, `colloquial`, `ui_selected_object`.
+- Distinguish these defect classes explicitly when relevant: `semantic_understanding_gap`, `edge_carryover_gap`, `object_memory_gap`, `field_mapping_gap`, `answer_shape_mismatch`, `ordering_semantics_mismatch`, `runtime_capability_gap`, `business_utility_gap`, `loop_coverage_gap`, `domain_anchor_gap`.
 - If the root node works but the primary user path is broken at the first selected-object drilldown, treat that as a real failure of domain hardening.
 - If the runtime nearly supports the path but the loop never validated the realistic wording family, call it `loop_coverage_gap`, not product success.

@@ -56,6 +67,6 @@ Quality score:
 - Output one integer score from 0 to 100.
 - Score >= 80 means the case can be accepted only if there is no unresolved P0.
 - Score >= 80 also requires the primary user path and its critical edges to be green across canonical, colloquial, and UI-selected-object coverage where applicable.
-- If score < 80, loop_decision must be continue, partial, blocked, or needs_exact_capability.
+- Score >= 80 also requires `direct_answer_ok = true` and `business_usefulness_ok = true` for the primary user path.
 """
 nickname_candidates = ["Lens", "Vector", "Delta"]
diff --git a/.codex/agents/orchestrator.toml b/.codex/agents/orchestrator.toml
index 8d6da59..080eeb1 100644
--- a/.codex/agents/orchestrator.toml
+++ b/.codex/agents/orchestrator.toml
@@ -1,5 +1,5 @@
 name = "orchestrator"
-description = "Coordinates a repo-native domain-case or scenario loop for NDC_1C: baseline or scenario capture, analyst verdict, minimal domain patch, rerun, and 80-point acceptance gate."
+description = "Coordinates a repo-native domain-case or scenario loop for NDC_1C: baseline or scenario capture, minimal domain patching, rerun, and business-first acceptance."
 model = "gpt-5.4"
 model_reasoning_effort = "high"
 sandbox_mode = "workspace-write"
@@ -46,6 +46,10 @@ Hard rules:
 - For cascading date-sensitive scenarios, rerun at least one `на эту дату` / `на ту дату` follow-up and verify that the originating date or period survives into debug filters.
 - If the business question asks for residues/items/contracts but the answer switched to raw documents or movements, treat that as a real defect, not as acceptable detail.
 - If the wording implies chronology or ranking such as `старые закупки`, verify oldest-first ordering explicitly.
+- Require the analyst to judge business usefulness, not only technical groundedness.
+- Require the analyst to judge whether the direct answer appears in the first line when the user asked a direct lookup question.
+- Treat selected-object continuity, pronoun resolution, and reusable resolved-object state as mandatory audit targets for follow-up-heavy domains.
+- Distinguish runtime capability gaps from state-layer continuity gaps and from business-presentation gaps before choosing coder tasks.
 - If the root node works but the first critical selected-object or drilldown edge is still broken, do not treat the scenario as hardened.
 - Require an explicit `scenario_acceptance_matrix.md` artifact for follow-up-heavy domains and packs.
 - Use the matrix to drive coder tasks: patch the narrowest broken edge or wording family first, not the whole domain at once.
@@ -57,6 +61,7 @@ Acceptance gate:
 - accepted requires no business-critical regression in rerun
 - accepted requires green critical edges on the primary user path
 - accepted requires green coverage for canonical + colloquial + UI-selected-object variants on critical branches when those branches exist in the product UX
+- accepted requires `direct_answer_ok = true` and `business_usefulness_ok = true` on the primary user path

 Required artifacts per cycle:
 - case_brief.md
diff --git a/.codex/skills/domain-case-loop/SKILL.md b/.codex/skills/domain-case-loop/SKILL.md
index 97016d4..f4f72de 100644
--- a/.codex/skills/domain-case-loop/SKILL.md
+++ b/.codex/skills/domain-case-loop/SKILL.md
@@ -28,6 +28,7 @@ This skill packages the standard workflow for iterating on one concrete domain c

 Read `references/repo_runtime_map.md` before the first real cycle.
 For follow-up-heavy domains, also read `references/scenario_tree_acceptance_canon.md` before scenario mode, pack mode, or autonomous pack-loop mode.
+For business-first analyst work, also read `references/business_first_analyst_rubric.md` before redefining acceptance or hardening a noisy-but-technically-grounded domain.
 If `docs/orchestration/active_domain_contract.json` exists, treat it as the single mutable source of truth for the current domain and prefer it over older scattered pool/pack prose docs.

 Use these repo-native capture paths:
@@ -136,6 +137,7 @@ The verdict must explicitly say whether the case is:
 - a missing route/intent/capability inside project scope;
 - a true out-of-scope request.
 - a `runtime_capability_gap`, `semantic_understanding_gap`, `edge_carryover_gap`, `answer_shape_mismatch`, `ordering_semantics_mismatch`, or `loop_coverage_gap`.
+- an `object_memory_gap`, `field_mapping_gap`, `business_utility_gap`, or `domain_anchor_gap` when that is the real blocker.

 ### Step 4 - Domain patch

@@ -208,6 +210,9 @@ Accepted requires:
 - Treat answer-shape mismatch as a scoring defect: if the user asked for items / residues / contracts, do not accept an answer that switched to raw documents, movements, or another lower-level object without saying so explicitly.
 - Treat ordering semantics as part of correctness when the wording implies ranking or chronology, for example `старые закупки` => oldest-first rather than newest-first.
 - Treat primary user-path failures as more important than supporting-path polish: if the user cannot go from root list -> selected object -> first drilldown, the scenario is not accepted.
+- Treat direct-answer-first behavior as part of correctness: if the user asked a direct lookup question, the first line must contain the direct answer before the evidence blocks.
+- Treat business usefulness as part of correctness: factual-but-business-useless output is not acceptance-quality output.
+- Treat stable follow-up object memory as part of correctness: when the prior turn already resolved the relevant item/object, the next turn must not re-ask for it.
 ## Domain-specific framing

diff --git a/.codex/skills/domain-case-loop/references/business_first_analyst_rubric.md b/.codex/skills/domain-case-loop/references/business_first_analyst_rubric.md
new file mode 100644
index 0000000..86f330f
--- /dev/null
+++ b/.codex/skills/domain-case-loop/references/business_first_analyst_rubric.md
@@ -0,0 +1,128 @@
+# Business-first analyst rubric
+
+Use this rubric when evaluating one domain case, one multi-step scenario, or one full domain pack.
+
+The analyst must not stop at route/debug correctness. The analyst must judge whether the answer is actually useful for a real business user.
+
+## Core principle
+
+The analyst evaluates five layers at once:
+- user intent;
+- scenario tree and state continuity;
+- business usefulness of the answer;
+- evidence and field truthfulness;
+- root cause and smallest defensible fix direction.
+
+## Required analyst questions
+
+For every critical turn or critical edge, answer these questions explicitly:
+
+1. What did the user really ask?
+- State the business meaning in one short sentence.
+- Name the minimum direct answer the user expected.
+
+2. What should the first line of the answer have been?
+- If the user asked a direct lookup question, the first line must contain the direct answer.
+- Technical explanation, limitations, and evidence come after the direct answer.
+
+3. What object and scope had to survive from previous turns?
+- selected item / selected contract / selected counterparty;
+- originating date or period;
+- warehouse or organization scope when still relevant;
+- reusable resolved bundle, for example provenance trace or sale trace.
+
+4. Did the answer stay on the same business object?
+- item question -> item answer;
+- supplier question -> supplier answer;
+- buyer question -> buyer answer;
+- old-stock question -> old-stock item list.
+
+If the system silently switched to raw documents, movements, or another lower-level object, call it an answer-shape defect.
+
+5. Are the surfaced fields truthful and correctly labeled?
+- do not confuse supplier with organization;
+- do not confuse buyer with organization;
+- do not present a document-side technical field as a business truth unless that mapping is proven.
+
+## Business usefulness rules
+
+An answer is not accepted as business-useful when any of these are true:
+- the direct answer is not placed first;
+- the answer opens with technical hedging instead of the user-facing result;
+- a weaker question is answered than the one the user asked;
+- the answer requires the user to reconstruct the conclusion from low-level evidence;
+- the answer uses ambiguous field labels for business-critical entities.
+
+## State continuity rules
+
+Follow-up continuity is a first-class acceptance object.
+
+The analyst must verify:
+- selected object continuity;
+- date/period continuity;
+- reusable evidence continuity;
+- pronoun resolution continuity.
+
+Important pronoun examples:
+- `эту позицию`
+- `этот товар`
+- `его`
+- `по нему`
+- `по этой позиции`
+
+If the previous turn already resolved a concrete object, the next turn must reuse it instead of asking for the anchor again.
+
+## Reusable answer-object cache
+
+For follow-up-heavy domains, the analyst should explicitly look for evidence that the product behaves as if it had a reusable resolved object bundle.
+
+Examples:
+- `current_item`
+- `current_as_of_date`
+- `current_provenance_trace`
+- `current_sale_trace`
+- `first_purchase_date`
+- `supplier_if_known`
+- `source_document_if_known`
+
+If the runtime recomputes everything from scratch and loses the already resolved object, call that out as a state-layer defect.
+
+## Root-cause layers
+
+Use one or more of these root-cause layers explicitly:
+- `semantic_understanding_gap`
+- `runtime_capability_gap`
+- `edge_carryover_gap`
+- `object_memory_gap`
+- `field_mapping_gap`
+- `answer_shape_mismatch`
+- `ordering_semantics_mismatch`
+- `business_utility_gap`
+- `loop_coverage_gap`
+- `domain_anchor_gap`
+
+## Minimum machine-readable verdict fields
+
+The analyst verdict should expose at least:
+- `user_intent_summary`
+- `expected_direct_answer`
+- `actual_direct_answer`
+- `direct_answer_ok`
+- `business_usefulness_ok`
+- `business_utility_score`
+- `direct_answer_priority_score`
+- `state_continuity_score`
+- `answer_shape_score`
+- `evidence_clarity_score`
+- `root_cause_layers`
+- `broken_edge_ids`
+- `violated_invariants`
+
+## Inventory-specific reminders
+
+For inventory follow-up chains, verify all of these:
+- the selected item remains the current focus object after the user clicks a result;
+- provenance questions answer supplier/date/document first, not only raw movement rows;
+- `когда купили` can reuse the already resolved provenance bundle;
+- supplier and organization are not mixed up in the surfaced answer;
+- `на эту дату` keeps the original stock date unless the user explicitly changed it.
diff --git a/.codex/skills/domain-case-loop/references/case_brief_template.md b/.codex/skills/domain-case-loop/references/case_brief_template.md
index 18b3abd..6d0e899 100644
--- a/.codex/skills/domain-case-loop/references/case_brief_template.md
+++ b/.codex/skills/domain-case-loop/references/case_brief_template.md
@@ -9,6 +9,10 @@
 ## Expected business meaning
 - ...

+## Expected direct answer
+- first line should say:
+- minimum acceptable business answer:
+
 ## Expected capability
 - ...

@@ -31,6 +35,13 @@
 - warehouse if relevant
 - organization if relevant
 - expected answer shape
+- direct-answer-first when the user asked a direct lookup question
+- reusable resolved-object continuity when the user asks a follow-up about the same selected object
+
+## Field truth constraints
+- do not confuse supplier with organization
+- do not confuse buyer with organization
+- do not surface technical document-side fields as business truth without proof

 ## Contour status
 - in_contour / outside_current_contour / unknown
@@ -53,3 +64,5 @@
 - root node works
 - critical edges on the primary user path work
 - colloquial and UI-generated follow-up variants work
+- direct answer is placed first where expected
+- output is business-useful, not only technically grounded
diff --git a/.codex/skills/domain-case-loop/references/scenario_tree_acceptance_canon.md b/.codex/skills/domain-case-loop/references/scenario_tree_acceptance_canon.md
index 8b44cb6..5365a2d 100644
--- a/.codex/skills/domain-case-loop/references/scenario_tree_acceptance_canon.md
+++ b/.codex/skills/domain-case-loop/references/scenario_tree_acceptance_canon.md
@@ -12,6 +12,16 @@ The unit of acceptance is a **scenario tree**:

 If the root works but a critical child transition breaks, the domain is **not** hardened.

+## Business-first framing
+
+Every accepted node or edge must be both technically grounded and business-useful.
+
+This means:
+- the direct answer is surfaced first when the user asked a direct lookup question;
+- the answer stays on the requested business object;
+- evidence and caveats support the answer instead of replacing it;
+- field labels are truthful for business entities such as supplier, buyer, organization, warehouse, and document.
+
 ## Model the domain as a tree

 For each scenario, define:
@@ -36,7 +46,8 @@ A node is considered covered only if all of these are true:
 - the expected intent / capability is selected;
 - the answer shape matches the requested business object;
 - the answer begins with a direct user-facing answer when such an answer is expected;
-- the answer is evidence-backed rather than heuristic-masked.
+- the answer is evidence-backed rather than heuristic-masked;
+- the surfaced business fields are truthful and not mislabeled.

 Examples:
 - asking for supplier provenance must answer with the supplier first, not only with raw documents;
@@ -53,9 +64,23 @@ Typical invariants:
 - warehouse survives if the follow-up still targets the same stock slice
 - organization survives if the previous slice was organization-bound
 - route family remains in the same business contour unless the user clearly changed intent
+- reusable resolved-object state survives when the previous turn already answered a closely related lookup
+- pronoun references can reuse the active focus object when the wording supports it

 If an edge loses a required invariant, that is a real regression even if the target node works in isolation.

+## Resolved answer-object continuity
+
+For follow-up-heavy domains, the analyst should treat resolved business objects as reusable state, not as disposable one-turn artifacts.
+
+Examples:
+- selected inventory item
+- resolved supplier provenance bundle
+- resolved buyer bundle
+- resolved purchase document bundle
+
+If turn N already resolved such an object and turn N+1 asks a natural follow-up about the same object, the system should reuse that state instead of demanding the same anchor again.
+
 ## Mandatory paraphrase families

 Every critical node or edge must be validated in a small paraphrase family instead of one curated wording only.
@@ -65,11 +90,6 @@ Minimum family:
 - `colloquial`
 - `ui_selected_object`

-Examples:
-- canonical: `От какого поставщика куплен товар X`
-- colloquial: `Кто поставил этот товар`
-- ui_selected_object: `По выбранному объекту "X": кто это поставил нам`
-
 If canonical works but colloquial or UI-generated follow-up fails, the node/edge is not accepted.

 ## Acceptance matrix
@@ -86,6 +106,8 @@ Minimum matrix columns:
 - expected capability / recipe
 - required carryover invariants
 - expected answer shape
+- expected direct answer
+- business usefulness expectation
 - actual outcome
 - status (`pass`, `partial`, `fail`)
 - defect class
@@ -95,17 +117,25 @@ Use these classes explicitly:
 - `semantic_understanding_gap`
 - `edge_carryover_gap`
+- `object_memory_gap`
+- `field_mapping_gap`
 - `answer_shape_mismatch`
 - `ordering_semantics_mismatch`
 - `runtime_capability_gap`
+- `business_utility_gap`
+- `domain_anchor_gap`
 - `loop_coverage_gap`

 Definitions:
 - `semantic_understanding_gap`: the system did not understand the real user meaning
 - `edge_carryover_gap`: the follow-up lost date / object / scope across steps
+- `object_memory_gap`: the system resolved the object once but failed to retain it for the next follow-up
+- `field_mapping_gap`: the answer surfaced the wrong business field or mislabeled a field
 - `answer_shape_mismatch`: the business object in the answer does not match the requested object
 - `ordering_semantics_mismatch`: ranking / chronology semantics are wrong
 - `runtime_capability_gap`: the product contour truly lacks the route / intent / capability / extractor / recipe
+- `business_utility_gap`: the answer may be grounded but is still not useful as a user-facing result
+- `domain_anchor_gap`: the scenario uses a weak or wrong observed anchor, so the tree is semantically mis-specified
 - `loop_coverage_gap`: the runtime could support the path or nearly support it, but the analyst/orchestrator never treated that path as mandatory acceptance coverage

 ## Analyst responsibilities

@@ -116,6 +146,9 @@ The analyst must:
 - call out broken edges explicitly;
 - verify colloquial and UI-generated variants as first-class coverage;
 - verify direct-answer-first behavior where the user asked a direct lookup question;
+- verify business usefulness explicitly, not only technical validity;
+- verify field truthfulness for surfaced supplier / buyer / organization labels;
+- verify selected-object continuity and reusable object memory;
 - verify answer granularity and ordering semantics;
 - lower the score when any critical edge or paraphrase family is broken.

@@ -136,7 +169,9 @@ Do not accept a domain when:
 - selected-object follow-up is broken;
 - `на эту дату` / `на ту дату` loses the originating date;
 - the answer shape is wrong for the business question;
-- chronology / ranking semantics are inverted.
+- chronology / ranking semantics are inverted;
+- the direct answer is not surfaced first on direct lookup questions;
+- the answer is technically grounded but still business-useless.

 Accepted requires:
 - score >= 80
@@ -144,3 +179,5 @@ Accepted requires:
 - critical path edges pass
 - canonical + colloquial + UI-selected-object variants pass for critical branches
 - no silent heuristic masking
+- `direct_answer_ok = true`
+- `business_usefulness_ok = true`
diff --git a/.codex/skills/domain-case-loop/references/verdict_template.md b/.codex/skills/domain-case-loop/references/verdict_template.md
index 0edd860..f830e7b 100644
--- a/.codex/skills/domain-case-loop/references/verdict_template.md
+++ b/.codex/skills/domain-case-loop/references/verdict_template.md
@@ -1,55 +1,74 @@
 # Verdict

-## 1. Смысл вопроса
+## 1. Question meaning
 ...

-## 2. Главный пользовательский путь и дерево сценария
+## 2. Primary user path and scenario tree
 - root:
 - critical child nodes:
 - critical edges:
 - primary user path:

-## 3. Что реально посчитано
+## 3. Expected direct answer
+- what the first line should say:
+- minimum acceptable business answer:
+
+## 4. What the system actually computed
 ...

-## 4. Где расхождение по бизнес-смыслу
+## 5. Business mismatch
+- did the answer solve the user's real question:
+- did the direct answer appear first:
+- is the answer usable for an operator/accountant/manager:
+
+## 6. Route / capability mismatch
 ...

-## 5. Где route / capability mismatch
-...
+## 7. State continuity and selected-object memory
+- selected object continuity:
+- date/period continuity:
+- reusable answer-object continuity:
+- pronoun resolution continuity:

-## 6. Evidence quality
-- exact / partial / heuristic / technical insufficiency
-- why
+## 8. Field truth and evidence quality
+- supplier vs organization:
+- buyer vs organization:
+- exact / partial / heuristic / technical insufficiency:
+- why:

-## 7. P0 defects
+## 9. P0 defects
 - ...

-## 8. P1 defects
+## 10. P1 defects
 - ...

-## 9. P2 defects
+## 11. P2 defects
 - ...

-## 10. Minimal patch directions
+## 12. Minimal patch directions
 - ...

-## 11. Acceptance matrix for rerun
+## 13. Acceptance matrix for rerun
 - Node / edge coverage:
 - Canonical wording:
 - Colloquial wording:
 - UI-generated selected-object wording:
 - Carryover invariants:
 - Expected answer shape:
+- Expected direct answer:
+- Business usefulness:
 - Defect class:

-## 12. Acceptance criteria for rerun
+## 14. Acceptance criteria for rerun
 - ...
 - Include colloquial/slang variants and UI-generated selected-object follow-up variants when they are part of the business flow.
 - Require the primary user path to pass end-to-end, not only the root node.
+- Require direct-answer-first behavior on direct lookup questions.
+- Require business-useful output rather than technically-grounded-but-noisy output.
+- Require selected-object continuity and reusable answer-object continuity on follow-up chains.

-## 13. Quality score
+## 15. Quality score
 - integer from 0 to 100

-## 14. Loop decision
+## 16. Loop decision
 - accepted / continue / partial / blocked / needs_exact_capability
diff --git a/AGENTS.md b/AGENTS.md
index fc477d1..15156f6 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -26,6 +26,7 @@ Rules:
 - Do not accept a domain when only the root snapshot works but selected-object or drilldown follow-up edges still fail.
 - For critical branches, validate at least canonical wording, colloquial wording, and UI-generated selected-object wording when that UX exists.
 - Treat temporal carryover, selected-object carryover, answer-shape match, and ordering semantics as first-class acceptance invariants rather than optional polish.
+- Treat direct-answer-first behavior, business usefulness, selected-object memory, and field truthfulness as first-class analyst criteria rather than optional presentation polish.
 - If a case falls outside the current routed contour because the route/intent/capability is not wired yet, treat it as domain enablement work for this project, not as automatic out-of-scope rejection.
 - For new unmarked domains, `needs_exact_capability` means "bootstrap or extend the contour" rather than "close the case as unsupported".
 - A case can be marked `accepted` only when analyst verdict is at least `80/100`, no unresolved `P0` remains, and the rerun does not mask heuristic output as confirmed.
diff --git a/docs/orchestration/active_domain_contract.json b/docs/orchestration/active_domain_contract.json
index 9f4979a..6d7e3c4 100644
--- a/docs/orchestration/active_domain_contract.json
+++ b/docs/orchestration/active_domain_contract.json
@@ -780,6 +780,12 @@
 "required_paraphrase_families": ["canonical", "ui_selected_object"],
 "required_carryover_invariants": ["selected_object", "date_scope", "answer_shape"]
 },
+ "bindings": {
+ "target_date_historical": "2020-03-31",
+ "focus_item_historical": "Шкаф картотечный 1000*400*2100",
+ "observed_supplier_candidate": "Гамма-мебель, ООО",
+ "observed_customer_candidate": "Департамент капитального ремонта города Москвы"
+ },
 "steps": [
 {
 "step_id": "step_01_account_41_historical",
@@ -790,7 +796,7 @@
 "title": "Historical account 41 anchor",
 "question": "Какие товары числятся на 41 счете на дату {{bindings.target_date_historical}}",
 "analysis_context": {
- "as_of_date": "2019-03-31",
+ "as_of_date": "2020-03-31",
 "source": "binding_target_date_historical"
 },
 "expected_capability": "confirmed_inventory_on_hand_as_of_date",
@@ -823,13 +829,29 @@
 "node_role": "supporting_child",
 "paraphrase_family": "canonical",
 "title": "Supplier to buyer overlap",
- "question": "Какие товары были куплены у поставщика {{bindings.observed_supplier_candidate}} и позже проданы покупателю {{bindings.observed_customer_candidate}}",
+ "question": "Есть ли документально подтвержденная цепочка: поставщик {{bindings.observed_supplier_candidate}} -> товар {{bindings.focus_item_historical}} -> покупатель {{bindings.observed_customer_candidate}}",
 "depends_on": ["step_01_account_41_historical", "step_02_selected_item_buyer"]
 }
 ]
 }
 ]
 },
+ "agent_audit_expectations": {
+ "direct_answer_first": true,
+ "business_utility_required": true,
+ "state_continuity_required": true,
+ "selected_object_memory_required": true,
+ "field_truth_checks": [
+ "supplier_vs_organization",
+ "buyer_vs_organization"
+ ],
+ "reusable_answer_object_expectations": [
+ "current_item",
+ "current_as_of_date",
+ "current_provenance_trace",
+ "current_sale_trace"
+ ]
+ },
 "acceptance_contract": {
 "acceptance_unit": "scenario_tree",
 "do_not_accept_if": [
diff --git a/docs/orchestration/schemas/domain_loop_analyst_verdict.schema.json b/docs/orchestration/schemas/domain_loop_analyst_verdict.schema.json
index d49373e..5790869 100644
--- a/docs/orchestration/schemas/domain_loop_analyst_verdict.schema.json
+++ b/docs/orchestration/schemas/domain_loop_analyst_verdict.schema.json
@@ -5,13 +5,26 @@
 "additionalProperties": false,
 "required": [
 "summary",
+ "user_intent_summary",
+ "expected_direct_answer",
+ "actual_direct_answer",
 "quality_score",
+ "direct_answer_ok",
+ "business_usefulness_ok",
+ "business_utility_score",
+ "direct_answer_priority_score",
+ "state_continuity_score",
+ "answer_shape_score",
+ "evidence_clarity_score",
 "loop_decision",
 "requires_user_decision",
 "user_decision_type",
 "user_decision_prompt",
 "unresolved_p0_count",
 "regression_detected",
+ "root_cause_layers",
+ "broken_edge_ids",
+ "violated_invariants",
 "priority_targets",
 "acceptance_criteria",
 "notes"
@@ -20,11 +33,51 @@
 "summary": {
 "type": "string"
 },
+ "user_intent_summary": {
+ "type": "string"
+ },
+ "expected_direct_answer": {
+ "type": "string"
+ },
+ "actual_direct_answer": {
+ "type": ["string", "null"]
+ },
 "quality_score": {
 "type": "integer",
 "minimum": 0,
 "maximum": 100
 },
+ "direct_answer_ok": {
+ "type": "boolean"
+ },
+ "business_usefulness_ok": {
+ "type": "boolean"
+ },
+ "business_utility_score": {
+ "type": "integer",
+ "minimum": 0,
+ "maximum": 100
+ },
+ "direct_answer_priority_score": {
+ "type": "integer",
+ "minimum": 0,
+ "maximum": 100
+ },
+ "state_continuity_score": {
+ "type": "integer",
+ "minimum": 0,
+ "maximum": 100
+ },
+ "answer_shape_score": {
+ "type": "integer",
+ "minimum": 0,
+ "maximum": 100
+ },
+ "evidence_clarity_score": {
+ "type": "integer",
+ "minimum": 0,
+ "maximum": 100
+ },
 "loop_decision": {
 "type": "string",
 "enum": ["accepted", "continue", "partial", "blocked", "needs_exact_capability"]
@@ -35,7 +88,17 @@
 },
 "user_decision_type": {
 "type": "string",
- "enum": ["none", "architecture_fork", "important_business_question", "scope_tradeoff", "data_truth_gap", "missing_required_observation", "risky_workaround", "risky_complexity", "other"],
+ "enum": [
+ "none",
+ "architecture_fork",
+ "important_business_question",
+ "scope_tradeoff",
+ "data_truth_gap",
+ "missing_required_observation",
+ "risky_workaround",
+ "risky_complexity",
+ "other"
+ ],
 "description": "Explain why the loop needs user input. Use none when requires_user_decision is false."
 },
 "user_decision_prompt": {
@@ -49,6 +112,37 @@
 "regression_detected": {
 "type": "boolean"
 },
+ "root_cause_layers": {
+ "type": "array",
+ "items": {
+ "type": "string",
+ "enum": [
+ "semantic_understanding_gap",
+ "runtime_capability_gap",
+ "edge_carryover_gap",
+ "object_memory_gap",
+ "field_mapping_gap",
+ "answer_shape_mismatch",
+ "ordering_semantics_mismatch",
+ "business_utility_gap",
+ "loop_coverage_gap",
+ "domain_anchor_gap",
+ "other"
+ ]
+ }
+ },
+ "broken_edge_ids": {
+ "type": "array",
+ "items": {
+ "type": "string"
+ }
+ },
+ "violated_invariants": {
+ "type": "array",
+ "items": {
+ "type": "string"
+ }
+ },
 "priority_targets": {
 "type": "array",
 "items": {
@@ -68,7 +162,23 @@
 },
 "problem_type": {
 "type": "string",
- "enum": ["route_gap", "capability_gap", "evidence_gap", "presentation_gap", "regression", "other"]
+ "enum": [
+ "route_gap",
+ "capability_gap",
+ "evidence_gap",
+ "presentation_gap",
+ "semantic_understanding_gap",
+ "edge_carryover_gap",
+ "object_memory_gap",
+ "field_mapping_gap",
+ "answer_shape_mismatch",
+ "ordering_semantics_mismatch",
+ "business_utility_gap",
+ "loop_coverage_gap",
+ "domain_anchor_gap",
+ "regression",
+ "other"
+ ]
 },
 "fix_goal": {
 "type": "string"
diff --git a/llm_normalizer/backend/dist/services/addressIntentResolver.js b/llm_normalizer/backend/dist/services/addressIntentResolver.js
index 51c0a47..bf6706f 100644
--- a/llm_normalizer/backend/dist/services/addressIntentResolver.js
+++ b/llm_normalizer/backend/dist/services/addressIntentResolver.js
@@ -1340,6 +1340,17 @@ function hasInventoryPurchaseDocumentsSignal(text) {
 function hasInventorySaleTraceSignal(text) {
 return /(?:продаж|покупател|buyer|sale trace|purchase[\s-]?to[\s-]?sale|purchase -> warehouse -> sale|закупка.*продаж)/iu.test(text);
 }
+function hasSelectedObjectInventoryCue(text) {
+ return /(?:по\s+выбранному\s+объекту|selected\s+object)/iu.test(text);
+}
+function hasSelectedObjectInventoryProvenanceSignal(text) {
+ return (hasSelectedObjectInventoryCue(text) &&
+ /(?:кто\s+(?:(?:это|этот\s+товар|эту\s+позицию)\s+)?(?:нам\s+)?поставил|кто\s+(?:нам\s+)?поставил\s+(?:это|этот\s+товар|эту\s+позицию)|от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|supplier|vendor|поставщик)/iu.test(text));
+}
+function hasSelectedObjectInventoryPurchaseDocumentsSignal(text) {
+ return (hasSelectedObjectInventoryCue(text) &&
+ /(?:по\s+каким\s+документам\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|по\s+каким\s+документам\s+(?:был\s+)?куплен|какими\s+документами\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|какими\s+документами\s+(?:был\s+)?куплен|purchase\s+documents|documents\s+of\s+purchase|through\s+which\s+documents)/iu.test(text));
+}
 function hasInventoryProvenanceSignalV2(text) {
 const hasItemCue = /(?:товар|номенклатур|sku|item|product|остат(?:ок|ки)|склад)/iu.test(text);
 const hasSupplierCue = /(?:от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|кто\s+(?:нам\s+)?поставил|кем\s+поставлен|поставщик|supplier|vendor)/iu.test(text);
@@ -1541,6 +1552,13 @@ function resolveAddressIntent(userMessage) {
 reasons: ["inventory_aging_signal_detected"]
 };
 }
+ if (hasSelectedObjectInventoryProvenanceSignal(text)) {
+ return {
+ intent: "inventory_purchase_provenance_for_item",
+ confidence: "medium",
+ reasons: ["inventory_selected_object_provenance_signal_detected"]
+ };
+ }
 if (hasInventoryProvenanceSignalV2(text)) {
 return {
 intent: "inventory_purchase_provenance_for_item",
@@ -1555,6 +1573,13 @@ function resolveAddressIntent(userMessage) {
 reasons: ["inventory_purchase_date_signal_detected"]
 };
 }
+ if (hasSelectedObjectInventoryPurchaseDocumentsSignal(text)) {
+ return {
+ intent: "inventory_purchase_documents_for_item",
+ confidence: "medium",
+ reasons: ["inventory_selected_object_purchase_documents_signal_detected"]
+ };
+ }
 if (hasInventoryPurchaseDocumentsSignalV2(text)) {
 return {
 intent: "inventory_purchase_documents_for_item",
diff --git a/llm_normalizer/backend/dist/services/addressQueryService.js b/llm_normalizer/backend/dist/services/addressQueryService.js
index 1121b02..431529d 100644
--- a/llm_normalizer/backend/dist/services/addressQueryService.js
+++ b/llm_normalizer/backend/dist/services/addressQueryService.js
@@ -2879,6 +2879,19 @@ class AddressQueryService {
 const broadenedFactual = (0, composeStage_1.composeFactualReply)(intent.intent, broadenedFilteredRows, composeOptionsFromFilters(autoBroadenedFilters));
 const broadenedLimitations = [...filters.warnings, "period_window_auto_broadened_to_available_data"];
 const broadenedReasons = [...baseReasons, "period_window_auto_broadened_to_available_data"];
+ const broadenedResultSemantics = mergeAddressResultSemantics(deriveAddressResultSemantics({
+ intent: intent.intent,
+ selectedRecipe: broadenedSelection.selected_recipe.recipe_id,
+ filters: filters.extracted_filters,
+ responseType: broadenedFactual.responseType,
+ rowsMatched: broadenedFilteredRows.length
+ }), broadenedFactual.semantics);
+ const broadenedRouteExpectationAudit = buildRouteExpectationAudit({
+ intent: routeExpectationIntent,
+ selectedRecipe: broadenedSelection.selected_recipe.recipe_id,
+ requestedResultMode,
+ resultMode: broadenedResultSemantics.result_mode
+ });
 return {
 handled: true,
 reply_text: injectNoticeAfterLeadLine(broadenedFactual.text, broadenedPrefix),
@@ -2921,13 +2934,20 @@ class AddressQueryService {
 runtime_readiness: "LIVE_QUERYABLE_WITH_LIMITS",
 limited_reason_category: null,
 response_type: broadenedFactual.responseType,
- ...mergeAddressResultSemantics(deriveAddressResultSemantics({
- intent: intent.intent,
- selectedRecipe: broadenedSelection.selected_recipe.recipe_id,
- filters: filters.extracted_filters,
- responseType: broadenedFactual.responseType,
- rowsMatched: broadenedFilteredRows.length
- }), broadenedFactual.semantics),
+ capability_id: capabilityAudit.capabilityId,
+ capability_layer: capabilityAudit.layer,
+ capability_route_mode: capabilityAudit.routeMode,
+ capability_route_enabled: capabilityAudit.enabled,
+ capability_route_reason: capabilityAudit.reason,
+ shadow_route_intent: shadowRouteAudit.intent,
+ shadow_route_selected_recipe: shadowRouteAudit.selectedRecipe,
+ shadow_route_status: shadowRouteAudit.status,
+ route_expectation_status: broadenedRouteExpectationAudit.status,
+ route_expectation_reason: broadenedRouteExpectationAudit.reason,
+ route_expectation_expected_selected_recipes: broadenedRouteExpectationAudit.expectedSelectedRecipes,
+ route_expectation_expected_requested_result_modes: broadenedRouteExpectationAudit.expectedRequestedResultModes,
+ route_expectation_expected_result_modes: broadenedRouteExpectationAudit.expectedResultModes,
+ ...broadenedResultSemantics,
 limitations: broadenedLimitations,
 reasons: withConfirmedBalanceFallbackReason(broadenedReasons, requestedResultMode, broadenedFactual.semantics)
 }
diff --git a/llm_normalizer/backend/dist/services/address_runtime/decomposeStage.js b/llm_normalizer/backend/dist/services/address_runtime/decomposeStage.js
index a05298c..a349a0a 100644
--- a/llm_normalizer/backend/dist/services/address_runtime/decomposeStage.js
+++ b/llm_normalizer/backend/dist/services/address_runtime/decomposeStage.js
@@ -244,6 +244,24 @@ function
mapCounterpartyIntentToContractIntent(intent) { } return null; } +function isInventoryIntent(intent) { + return (intent === "inventory_on_hand_as_of_date" || + intent === "inventory_purchase_provenance_for_item" || + intent === "inventory_purchase_documents_for_item" || + intent === "inventory_supplier_stock_overlap_as_of_date" || + intent === "inventory_sale_trace_for_item" || + intent === "inventory_purchase_to_sale_chain" || + intent === "inventory_aging_by_purchase_date"); +} +function hasSelectedObjectInventorySignal(text) { + return /(?:по\s+выбранному\s+объекту|for\s+selected\s+object)/iu.test(String(text ?? "")); +} +function hasInventorySupplierFollowupCue(text) { + return /(?:кто\s+(?:(?:это|этот\s+товар|эту\s+позицию)\s+)?(?:нам\s+)?поставил|кто\s+(?:нам\s+)?поставил\s+(?:это|этот\s+товар|эту\s+позицию)|от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|supplier|vendor|поставщик)/iu.test(String(text ?? "")); +} +function hasInventoryPurchaseDocumentsFollowupCue(text) { + return /(?:по\s+каким\s+документам\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|по\s+каким\s+документам\s+(?:был\s+)?куплен|какими\s+документами\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|какими\s+документами\s+(?:был\s+)?куплен|purchase\s+documents|documents\s+of\s+purchase|through\s+which\s+documents)/iu.test(String(text ?? "")); +} function hasAddressFollowupContextSignal(text) { const normalized = String(text ?? 
"").trim(); if (!normalized) { @@ -612,6 +630,32 @@ function deriveIntentWithFollowupContext(detectedIntent, userMessage, followupCo reasons: [...detectedIntent.reasons, "intent_adjusted_to_balance_followup_context"] }; } + const previousIsInventoryFamily = isInventoryIntent(previousIntent); + const inventorySelectedObjectFollowup = hasSelectedObjectInventorySignal(normalizedMessage) || (previousIsInventoryFamily && hasFollowupSignal); + if (inventorySelectedObjectFollowup && hasInventorySupplierFollowupCue(normalizedMessage)) { + if (detectedIntent.intent === "unknown" || + detectedIntent.intent === "inventory_on_hand_as_of_date" || + detectedIntent.intent === previousIntent) { + return { + intent: "inventory_purchase_provenance_for_item", + confidence: "low", + reasons: [...detectedIntent.reasons, "intent_adjusted_to_inventory_followup_context"] + }; + } + } + if (inventorySelectedObjectFollowup && hasInventoryPurchaseDocumentsFollowupCue(normalizedMessage)) { + if (detectedIntent.intent === "unknown" || + detectedIntent.intent === "list_documents_by_counterparty" || + detectedIntent.intent === "list_documents_by_contract" || + detectedIntent.intent === "inventory_on_hand_as_of_date" || + detectedIntent.intent === previousIntent) { + return { + intent: "inventory_purchase_documents_for_item", + confidence: "low", + reasons: [...detectedIntent.reasons, "intent_adjusted_to_inventory_followup_context"] + }; + } + } if (hasPreviousContract) { if (detectedIntent.intent === "list_contracts_by_counterparty") { if (hasBankSignal(normalizedMessage)) { diff --git a/llm_normalizer/backend/src/services/addressIntentResolver.ts b/llm_normalizer/backend/src/services/addressIntentResolver.ts index 973e9e6..4d7895c 100644 --- a/llm_normalizer/backend/src/services/addressIntentResolver.ts +++ b/llm_normalizer/backend/src/services/addressIntentResolver.ts @@ -1603,6 +1603,28 @@ function hasInventorySaleTraceSignal(text: string): boolean { ); } +function 
hasSelectedObjectInventoryCue(text: string): boolean { + return /(?:по\s+выбранному\s+объекту|selected\s+object)/iu.test(text); +} + +function hasSelectedObjectInventoryProvenanceSignal(text: string): boolean { + return ( + hasSelectedObjectInventoryCue(text) && + /(?:кто\s+(?:(?:это|этот\s+товар|эту\s+позицию)\s+)?(?:нам\s+)?поставил|кто\s+(?:нам\s+)?поставил\s+(?:это|этот\s+товар|эту\s+позицию)|от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|supplier|vendor|поставщик)/iu.test( + text + ) + ); +} + +function hasSelectedObjectInventoryPurchaseDocumentsSignal(text: string): boolean { + return ( + hasSelectedObjectInventoryCue(text) && + /(?:по\s+каким\s+документам\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|по\s+каким\s+документам\s+(?:был\s+)?куплен|какими\s+документами\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|какими\s+документами\s+(?:был\s+)?куплен|purchase\s+documents|documents\s+of\s+purchase|through\s+which\s+documents)/iu.test( + text + ) + ); +} + function hasInventoryProvenanceSignalV2(text: string): boolean { const hasItemCue = /(?:товар|номенклатур|sku|item|product|остат(?:ок|ки)|склад)/iu.test(text); const hasSupplierCue = @@ -1871,6 +1893,14 @@ export function resolveAddressIntent(userMessage: string): AddressIntentResoluti }; } + if (hasSelectedObjectInventoryProvenanceSignal(text)) { + return { + intent: "inventory_purchase_provenance_for_item", + confidence: "medium", + reasons: ["inventory_selected_object_provenance_signal_detected"] + }; + } + if (hasInventoryProvenanceSignalV2(text)) { return { intent: "inventory_purchase_provenance_for_item", @@ -1887,6 +1917,14 @@ export function resolveAddressIntent(userMessage: string): AddressIntentResoluti }; } + if (hasSelectedObjectInventoryPurchaseDocumentsSignal(text)) { + return { + intent: "inventory_purchase_documents_for_item", + confidence: "medium", + reasons: ["inventory_selected_object_purchase_documents_signal_detected"] + }; + } + if 
(hasInventoryPurchaseDocumentsSignalV2(text)) { return { intent: "inventory_purchase_documents_for_item", diff --git a/llm_normalizer/backend/src/services/addressQueryService.ts b/llm_normalizer/backend/src/services/addressQueryService.ts index 93859f6..6dacff4 100644 --- a/llm_normalizer/backend/src/services/addressQueryService.ts +++ b/llm_normalizer/backend/src/services/addressQueryService.ts @@ -3498,6 +3498,22 @@ export class AddressQueryService { ); const broadenedLimitations = [...filters.warnings, "period_window_auto_broadened_to_available_data"]; const broadenedReasons = [...baseReasons, "period_window_auto_broadened_to_available_data"]; + const broadenedResultSemantics = mergeAddressResultSemantics( + deriveAddressResultSemantics({ + intent: intent.intent, + selectedRecipe: broadenedSelection.selected_recipe.recipe_id, + filters: filters.extracted_filters, + responseType: broadenedFactual.responseType, + rowsMatched: broadenedFilteredRows.length + }), + broadenedFactual.semantics + ); + const broadenedRouteExpectationAudit = buildRouteExpectationAudit({ + intent: routeExpectationIntent, + selectedRecipe: broadenedSelection.selected_recipe.recipe_id, + requestedResultMode, + resultMode: broadenedResultSemantics.result_mode + }); return { handled: true, reply_text: injectNoticeAfterLeadLine(broadenedFactual.text, broadenedPrefix), @@ -3540,16 +3556,21 @@ export class AddressQueryService { runtime_readiness: "LIVE_QUERYABLE_WITH_LIMITS", limited_reason_category: null, response_type: broadenedFactual.responseType, - ...mergeAddressResultSemantics( - deriveAddressResultSemantics({ - intent: intent.intent, - selectedRecipe: broadenedSelection.selected_recipe.recipe_id, - filters: filters.extracted_filters, - responseType: broadenedFactual.responseType, - rowsMatched: broadenedFilteredRows.length - }), - broadenedFactual.semantics - ), + capability_id: capabilityAudit.capabilityId, + capability_layer: capabilityAudit.layer, + capability_route_mode: 
capabilityAudit.routeMode, + capability_route_enabled: capabilityAudit.enabled, + capability_route_reason: capabilityAudit.reason, + shadow_route_intent: shadowRouteAudit.intent, + shadow_route_selected_recipe: shadowRouteAudit.selectedRecipe, + shadow_route_status: shadowRouteAudit.status, + route_expectation_status: broadenedRouteExpectationAudit.status, + route_expectation_reason: broadenedRouteExpectationAudit.reason, + route_expectation_expected_selected_recipes: broadenedRouteExpectationAudit.expectedSelectedRecipes, + route_expectation_expected_requested_result_modes: + broadenedRouteExpectationAudit.expectedRequestedResultModes, + route_expectation_expected_result_modes: broadenedRouteExpectationAudit.expectedResultModes, + ...broadenedResultSemantics, limitations: broadenedLimitations, reasons: withConfirmedBalanceFallbackReason( broadenedReasons, diff --git a/llm_normalizer/backend/src/services/address_runtime/decomposeStage.ts b/llm_normalizer/backend/src/services/address_runtime/decomposeStage.ts index 8ab5227..3736920 100644 --- a/llm_normalizer/backend/src/services/address_runtime/decomposeStage.ts +++ b/llm_normalizer/backend/src/services/address_runtime/decomposeStage.ts @@ -306,6 +306,34 @@ function mapCounterpartyIntentToContractIntent(intent: AddressIntent): AddressIn return null; } +function isInventoryIntent(intent: AddressIntent | undefined): boolean { + return ( + intent === "inventory_on_hand_as_of_date" || + intent === "inventory_purchase_provenance_for_item" || + intent === "inventory_purchase_documents_for_item" || + intent === "inventory_supplier_stock_overlap_as_of_date" || + intent === "inventory_sale_trace_for_item" || + intent === "inventory_purchase_to_sale_chain" || + intent === "inventory_aging_by_purchase_date" + ); +} + +function hasSelectedObjectInventorySignal(text: string): boolean { + return /(?:по\s+выбранному\s+объекту|for\s+selected\s+object)/iu.test(String(text ?? 
"")); +} + +function hasInventorySupplierFollowupCue(text: string): boolean { + return /(?:кто\s+(?:(?:это|этот\s+товар|эту\s+позицию)\s+)?(?:нам\s+)?поставил|кто\s+(?:нам\s+)?поставил\s+(?:это|этот\s+товар|эту\s+позицию)|от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|supplier|vendor|поставщик)/iu.test( + String(text ?? "") + ); +} + +function hasInventoryPurchaseDocumentsFollowupCue(text: string): boolean { + return /(?:по\s+каким\s+документам\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|по\s+каким\s+документам\s+(?:был\s+)?куплен|какими\s+документами\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|какими\s+документами\s+(?:был\s+)?куплен|purchase\s+documents|documents\s+of\s+purchase|through\s+which\s+documents)/iu.test( + String(text ?? "") + ); +} + export function hasAddressFollowupContextSignal(text: string): boolean { const normalized = String(text ?? "").trim(); if (!normalized) { @@ -752,6 +780,39 @@ function deriveIntentWithFollowupContext( }; } + const previousIsInventoryFamily = isInventoryIntent(previousIntent); + const inventorySelectedObjectFollowup = + hasSelectedObjectInventorySignal(normalizedMessage) || (previousIsInventoryFamily && hasFollowupSignal); + if (inventorySelectedObjectFollowup && hasInventorySupplierFollowupCue(normalizedMessage)) { + if ( + detectedIntent.intent === "unknown" || + detectedIntent.intent === "inventory_on_hand_as_of_date" || + detectedIntent.intent === previousIntent + ) { + return { + intent: "inventory_purchase_provenance_for_item", + confidence: "low", + reasons: [...detectedIntent.reasons, "intent_adjusted_to_inventory_followup_context"] + }; + } + } + + if (inventorySelectedObjectFollowup && hasInventoryPurchaseDocumentsFollowupCue(normalizedMessage)) { + if ( + detectedIntent.intent === "unknown" || + detectedIntent.intent === "list_documents_by_counterparty" || + detectedIntent.intent === "list_documents_by_contract" || + detectedIntent.intent === "inventory_on_hand_as_of_date" || + 
detectedIntent.intent === previousIntent + ) { + return { + intent: "inventory_purchase_documents_for_item", + confidence: "low", + reasons: [...detectedIntent.reasons, "intent_adjusted_to_inventory_followup_context"] + }; + } + } + if (hasPreviousContract) { if (detectedIntent.intent === "list_contracts_by_counterparty") { if (hasBankSignal(normalizedMessage)) { diff --git a/llm_normalizer/backend/tests/addressInventorySelectedObjectFollowup.test.ts b/llm_normalizer/backend/tests/addressInventorySelectedObjectFollowup.test.ts index 2202a11..432ed42 100644 --- a/llm_normalizer/backend/tests/addressInventorySelectedObjectFollowup.test.ts +++ b/llm_normalizer/backend/tests/addressInventorySelectedObjectFollowup.test.ts @@ -103,6 +103,8 @@ describe("inventory selected-object follow-up", () => { expect(result?.debug.extracted_filters?.as_of_date).toBe("2021-03-31"); expect(result?.debug.extracted_filters?.period_from).toBe("2021-03-01"); expect(result?.debug.extracted_filters?.period_to).toBe("2021-03-31"); + expect(result?.debug.capability_id).toBe("inventory_inventory_purchase_provenance_for_item"); + expect(result?.debug.capability_route_mode).toBe("exact"); expect(result?.debug.reasons).toContain("period_window_auto_broadened_to_available_data"); expect(result?.debug.limitations).toContain("period_window_auto_broadened_to_available_data"); const replyLines = String(result?.reply_text ?? 
"").split("\n"); @@ -111,4 +113,97 @@ describe("inventory selected-object follow-up", () => { expect(replyLines[1]).toContain("По окну 2021-03-01..2021-03-31 строк не найдено"); expect(executeAddressMcpQueryMock).toHaveBeenCalledTimes(2); }); + + it("handles selected-object supplier slang 'кто это поставил нам' as provenance follow-up", async () => { + executeAddressMcpQueryMock.mockResolvedValueOnce({ + fetched_rows: 1, + matched_rows: 1, + raw_rows: [ + { + Period: "2019-02-11T00:00:00Z", + Registrator: "Поступление товаров и услуг 00000000077 от 11.02.2019 0:00:00", + AccountDt: "41.01", + AccountKt: "60.01", + Amount: 3724.17, + SubcontoDt1: "Столешница 600*3050*26 дуб ниагара", + SubcontoDt3: "Основной склад", + SubcontoKt1: "Торговый дом \\Союз МСК\\", + SubcontoKt2: "Договор поставки № 12 от 01.02.2019", + Organization: "ООО \\Альтернатива Плюс\\" + } + ], + rows: [], + error: null + }); + + const service = new AddressQueryService(); + const result = await service.tryHandle('По выбранному объекту "Столешница 600*3050*26 дуб ниагара": кто это поставил нам', { + followupContext: { + previous_intent: "inventory_on_hand_as_of_date", + previous_filters: { + as_of_date: "2019-03-31", + period_from: "2019-03-01", + period_to: "2019-03-31", + warehouse: "Основной склад", + organization: "ООО \\Альтернатива Плюс\\" + }, + previous_anchor_type: "unknown", + previous_anchor_value: null + } + }); + + expect(result?.handled).toBe(true); + expect(result?.response_type).toBe("FACTUAL_SUMMARY"); + expect(result?.debug.detected_intent).toBe("inventory_purchase_provenance_for_item"); + expect(result?.debug.extracted_filters?.item).toBe("Столешница 600*3050*26 дуб ниагара"); + expect(result?.debug.extracted_filters?.as_of_date).toBe("2019-03-31"); + expect(String(result?.reply_text ?? 
"")).toContain("Торговый дом \\Союз МСК\\"); + }); + + it("handles selected-object purchase-doc slang 'по каким документам это купили' as exact purchase-doc follow-up", async () => { + executeAddressMcpQueryMock.mockResolvedValueOnce({ + fetched_rows: 1, + matched_rows: 1, + raw_rows: [ + { + Period: "2019-02-11T00:00:00Z", + Registrator: "Поступление товаров и услуг 00000000077 от 11.02.2019 0:00:00", + AccountDt: "41.01", + AccountKt: "60.01", + Amount: 3724.17, + SubcontoDt1: "Столешница 600*3050*26 дуб ниагара", + SubcontoDt3: "Основной склад", + SubcontoKt1: "Торговый дом \\Союз МСК\\", + SubcontoKt2: "Договор поставки № 12 от 01.02.2019", + Organization: "ООО \\Альтернатива Плюс\\" + } + ], + rows: [], + error: null + }); + + const service = new AddressQueryService(); + const result = await service.tryHandle('По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам это купили', { + followupContext: { + previous_intent: "inventory_purchase_provenance_for_item", + previous_filters: { + as_of_date: "2019-03-31", + period_from: "2019-03-01", + period_to: "2019-03-31", + item: "Столешница 600*3050*26 дуб ниагара", + warehouse: "Основной склад" + }, + previous_anchor_type: "unknown", + previous_anchor_value: null + } + }); + + expect(result?.handled).toBe(true); + expect(result?.response_type).toBe("FACTUAL_LIST"); + expect(result?.debug.detected_intent).toBe("inventory_purchase_documents_for_item"); + expect(result?.debug.selected_recipe).toBe("address_inventory_purchase_documents_for_item_v1"); + expect(result?.debug.extracted_filters?.item).toBe("Столешница 600*3050*26 дуб ниагара"); + expect(result?.debug.extracted_filters?.as_of_date).toBe("2019-03-31"); + expect(String(result?.reply_text ?? 
"")).toContain("Поступление товаров и услуг 00000000077"); + }); }); diff --git a/llm_normalizer/backend/tests/addressQueryRuntimeM23.test.ts b/llm_normalizer/backend/tests/addressQueryRuntimeM23.test.ts index f40d01f..8ee8c0e 100644 --- a/llm_normalizer/backend/tests/addressQueryRuntimeM23.test.ts +++ b/llm_normalizer/backend/tests/addressQueryRuntimeM23.test.ts @@ -173,6 +173,14 @@ describe("address query shape classifier", () => { expect(filters.item).toBe("Кромка с клеем 33 альмандин 137 м"); }); + it("extracts item anchor from selected-object purchase-doc follow-up without explicit word товар", () => { + const filters = extractAddressFilters( + 'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам это купили', + "inventory_purchase_documents_for_item" + ).extracted_filters; + expect(filters.item).toBe("Столешница 600*3050*26 дуб ниагара"); + }); + it("keeps colloquial selected-object supplier follow-up in inventory provenance intent", () => { const mode = detectAddressQuestionMode( 'По выбранному объекту "Кромка с клеем 33 альмандин 137 м": кто поставил этот товар' @@ -184,6 +192,28 @@ describe("address query shape classifier", () => { expect(result.intent).toBe("inventory_purchase_provenance_for_item"); }); + it("keeps selected-object supplier slang with 'кто это поставил нам' in inventory provenance intent", () => { + const mode = detectAddressQuestionMode( + 'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": кто это поставил нам' + ); + const result = resolveAddressIntent( + 'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": кто это поставил нам' + ); + expect(mode.mode).toBe("address_query"); + expect(result.intent).toBe("inventory_purchase_provenance_for_item"); + }); + + it("keeps selected-object purchase-doc slang with 'по каким документам это купили' in purchase-doc intent", () => { + const mode = detectAddressQuestionMode( + 'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам 
это купили' + ); + const result = resolveAddressIntent( + 'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам это купили' + ); + expect(mode.mode).toBe("address_query"); + expect(result.intent).toBe("inventory_purchase_documents_for_item"); + }); + it("keeps full supplier anchor with comma suffix for stock-overlap questions", () => { const filters = extractAddressFilters( "Какие товары от поставщика Гамма-мебель, ООО сейчас еще лежат на складе Основной склад?", @@ -3874,6 +3904,49 @@ describe("address query limited taxonomy and stage diagnostics", { timeout: 1500 }); describe("address decompose stage follow-up carryover", () => { + it("promotes selected-object supplier slang follow-up into inventory provenance with inherited date context", () => { + const result = runAddressDecomposeStage('По выбранному объекту "Столешница 600*3050*26 дуб ниагара": кто это поставил нам', { + previous_intent: "inventory_on_hand_as_of_date", + previous_filters: { + as_of_date: "2019-03-31", + period_from: "2019-03-01", + period_to: "2019-03-31", + warehouse: "Основной склад" + }, + previous_anchor_type: "unknown", + previous_anchor_value: null + }); + expect(result).not.toBeNull(); + expect(result?.intent.intent).toBe("inventory_purchase_provenance_for_item"); + expect(result?.filters.extracted_filters.as_of_date).toBe("2019-03-31"); + expect( + result?.baseReasons?.includes("intent_adjusted_to_inventory_followup_context") || + result?.intent.reasons.includes("inventory_selected_object_provenance_signal_detected") + ).toBe(true); + }); + + it("promotes selected-object purchase-doc slang follow-up into inventory purchase documents with inherited date context", () => { + const result = runAddressDecomposeStage('По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам это купили', { + previous_intent: "inventory_purchase_provenance_for_item", + previous_filters: { + as_of_date: "2019-03-31", + period_from: "2019-03-01", + period_to: 
"2019-03-31", + item: "Столешница 600*3050*26 дуб ниагара" + }, + previous_anchor_type: "unknown", + previous_anchor_value: null + }); + expect(result).not.toBeNull(); + expect(result?.intent.intent).toBe("inventory_purchase_documents_for_item"); + expect(result?.filters.extracted_filters.item).toBe("Столешница 600*3050*26 дуб ниагара"); + expect(result?.filters.extracted_filters.as_of_date).toBe("2019-03-31"); + expect( + result?.baseReasons?.includes("intent_adjusted_to_inventory_followup_context") || + result?.intent.reasons.includes("inventory_selected_object_purchase_documents_signal_detected") + ).toBe(true); + }); + it("keeps slang all-customers-all-time wording in address lane via resolved intent fallback", () => { const result = runAddressDecomposeStage("выведи всех заков за все время", null); expect(result).not.toBeNull(); diff --git a/scripts/domain_case_loop.py b/scripts/domain_case_loop.py index a2e75f5..053e368 100644 --- a/scripts/domain_case_loop.py +++ b/scripts/domain_case_loop.py @@ -2120,6 +2120,7 @@ def build_analyst_loop_prompt( - `.codex/agents/domain_analyst.toml` - `.codex/skills/domain-case-loop/SKILL.md` - `.codex/skills/domain-case-loop/references/verdict_template.md` + - `.codex/skills/domain-case-loop/references/business_first_analyst_rubric.md` Current loop context: - loop_dir: `{loop_dir}` @@ -2135,11 +2136,13 @@ def build_analyst_loop_prompt( Goal: - evaluate current domain-pack correctness for business meaning, route/capability quality, evidence quality, and absence of silent heuristic masking; + - evaluate business usefulness, direct-answer-first behavior, state continuity, and field truthfulness, not only technical groundedness; - determine whether the gate `quality_score >= {target_score}` is reached; - if not, provide the smallest high-value fix targets for the coder. 
Rules: - `accepted` is allowed only if quality_score >= {target_score}, unresolved_p0_count = 0, and regression_detected = false; + - `accepted` also requires `direct_answer_ok = true` and `business_usefulness_ok = true`; - `partial` means the pack is usable but exactness, routing, or coverage is still insufficient; - `needs_exact_capability` means the primary blocker is a missing exact route or capability, but the loop should still continue autonomously unless a user decision is required; - `continue` means there is a clear next patch cycle; @@ -2152,6 +2155,10 @@ def build_analyst_loop_prompt( - if `requires_user_decision = true`, fill `user_decision_type` and `user_decision_prompt`; - if the pack is below {target_score} but there is still safe autonomous implementation work, keep `requires_user_decision = false`; - do not request user input merely because the score is still below {target_score}; request it only when the loop would otherwise guess, overfit, or risk architecture drift. + - return machine-readable fields for: `user_intent_summary`, `expected_direct_answer`, `actual_direct_answer`, `direct_answer_ok`, `business_usefulness_ok`, `business_utility_score`, `direct_answer_priority_score`, `state_continuity_score`, `answer_shape_score`, `evidence_clarity_score`, `root_cause_layers`, `broken_edge_ids`, `violated_invariants`; + - if the product found the evidence but failed to retain the selected object, provenance bundle, or another reusable resolved object across turns, classify that as `object_memory_gap` or `edge_carryover_gap`, not as a generic route problem; + - if the surfaced business field looks mislabeled, for example supplier vs organization, classify that as `field_mapping_gap`; + - if the answer is technically grounded but still weak for a manager/accountant/operator, classify that as `business_utility_gap`. Use this UTF-8 evidence bundle as the source of truth for artifact contents. 
Do not treat shell rendering artifacts as file corruption if the embedded bundle is readable. @@ -2196,6 +2203,9 @@ def build_coder_loop_prompt( - do not present heuristic answers as confirmed; - do not touch unrelated files; - preserve already successful baseline flows. + - use `root_cause_layers`, `broken_edge_ids`, `violated_invariants`, and business-utility scores from the analyst verdict to choose the smallest fix; + - prioritize state continuity, selected-object persistence, direct-answer-first behavior, and field-truth mapping when those are the blocking layers; + - do not broaden scope when the analyst says the defect is mainly `object_memory_gap`, `field_mapping_gap`, `answer_shape_mismatch`, or `business_utility_gap`. Required outputs: - create `{iteration_dir / 'coder_plan.md'}` with a short plan; @@ -2217,12 +2227,21 @@ def evaluate_analyst_gate( quality_score = int(verdict.get("quality_score") or 0) unresolved_p0_count = int(verdict.get("unresolved_p0_count") or 0) regression_detected = bool(verdict.get("regression_detected")) + direct_answer_ok = bool(verdict.get("direct_answer_ok", True)) + business_usefulness_ok = bool(verdict.get("business_usefulness_ok", True)) loop_decision = str(verdict.get("loop_decision") or "").strip() or "continue" requires_user_decision = bool(verdict.get("requires_user_decision")) user_decision_type = str(verdict.get("user_decision_type") or "").strip() or "none" user_decision_prompt_raw = verdict.get("user_decision_prompt") user_decision_prompt = str(user_decision_prompt_raw).strip() if user_decision_prompt_raw else None - accepted = quality_score >= target_score and unresolved_p0_count == 0 and not regression_detected and loop_decision == "accepted" + accepted = ( + quality_score >= target_score + and unresolved_p0_count == 0 + and not regression_detected + and direct_answer_ok + and business_usefulness_ok + and loop_decision == "accepted" + ) return accepted, loop_decision, requires_user_decision, user_decision_type, 
user_decision_prompt
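
Review note: the acceptance condition added to `evaluate_analyst_gate` can be restated as a standalone sketch to make the default handling easier to check. This is a minimal illustration, not the function from `scripts/domain_case_loop.py`: the name `gate_accepts`, the sample verdicts, and the target score of 85 are assumptions made for the example. It returns only the `accepted` flag and leaves out the user-decision fields.

```python
from typing import Any, Mapping


def gate_accepts(verdict: Mapping[str, Any], target_score: int) -> bool:
    """Hypothetical restatement of the `accepted` expression from the hunk above."""
    quality_score = int(verdict.get("quality_score") or 0)
    unresolved_p0_count = int(verdict.get("unresolved_p0_count") or 0)
    regression_detected = bool(verdict.get("regression_detected"))
    # A missing key falls back to True, so verdicts without the new fields
    # are not rejected by these two checks.
    direct_answer_ok = bool(verdict.get("direct_answer_ok", True))
    business_usefulness_ok = bool(verdict.get("business_usefulness_ok", True))
    loop_decision = str(verdict.get("loop_decision") or "").strip() or "continue"
    return (
        quality_score >= target_score
        and unresolved_p0_count == 0
        and not regression_detected
        and direct_answer_ok
        and business_usefulness_ok
        and loop_decision == "accepted"
    )


# Sample verdicts (invented for illustration).
legacy_verdict = {
    "quality_score": 90,
    "unresolved_p0_count": 0,
    "regression_detected": False,
    "loop_decision": "accepted",
}
weak_verdict = {**legacy_verdict, "business_usefulness_ok": False}
null_flag_verdict = {**legacy_verdict, "direct_answer_ok": None}

print(gate_accepts(legacy_verdict, 85))     # new flags absent
print(gate_accepts(weak_verdict, 85))       # flag explicitly False
print(gate_accepts(null_flag_verdict, 85))  # flag present but null
```

One behavior worth noting for reviewers: `verdict.get("direct_answer_ok", True)` applies the default only when the key is absent. A verdict that carries the key with a null value evaluates to `False` and blocks acceptance, which may or may not be the intended handling of partially filled analyst output.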