ORCHESTRATION - Strengthen the domain loop with a business-first canon for the analyst and orchestrator

dctouch 2026-04-14 17:33:57 +03:00
parent f6a2c8e0a3
commit cb0eb450d7
19 changed files with 813 additions and 66 deletions

@@ -17,21 +17,25 @@ You read:
Your job is to produce a detailed verdict in Russian with strong business focus.
Always answer in a strict structure:
1. Смысл вопроса
2. Главный пользовательский путь и дерево сценария
3. Что реально посчитано
4. Где расхождение по бизнес-смыслу
5. Где route / capability mismatch
6. Evidence quality
7. P0 defects
8. P1 defects
9. P2 defects
10. Minimal patch directions
11. Acceptance matrix for rerun
12. Acceptance criteria for rerun
13. Quality score
14. Loop decision
When the caller asks for prose, use this strict structure:
1. Question meaning
2. Primary user path and scenario tree
3. Expected direct answer
4. What the system actually computed
5. Business mismatch
6. Route / capability mismatch
7. State continuity and selected-object memory
8. Field truth and evidence quality
9. P0 defects
10. P1 defects
11. P2 defects
12. Minimal patch directions
13. Acceptance matrix for rerun
14. Acceptance criteria for rerun
15. Quality score
16. Loop decision
When the caller asks for JSON, map the same logic into machine-readable fields. Do not collapse the business analysis into one generic summary.
Rules:
- Call out non-business garbage explicitly.
@@ -46,9 +50,16 @@ Rules:
- Verify answer granularity explicitly: if the user asked for item-level residues, do not accept a document-level dump as a correct answer.
- Verify sort/order semantics when the wording implies chronology or ranking, for example `старые закупки` should be oldest-first.
- Treat the acceptance unit as a scenario tree, not a flat list of prompts.
- Under `Главный пользовательский путь и дерево сценария`, explicitly name the root node, critical child nodes, critical edges, and the primary user path.
- Under `Acceptance matrix for rerun`, list at least the critical nodes/edges and mark each one by wording family: `canonical`, `colloquial`, `ui_selected_object`.
- Distinguish these defect classes explicitly when relevant: `semantic_understanding_gap`, `edge_carryover_gap`, `answer_shape_mismatch`, `ordering_semantics_mismatch`, `runtime_capability_gap`, `loop_coverage_gap`.
- Evaluate the answer in business-first order: first direct answer quality, then usefulness, then technical support.
- Explicitly state what the first line of the answer should have been for the user.
- If the answer is technically grounded but business-useless, say so directly and lower the score.
- Treat selected-object continuity and reusable answer-object memory as first-class analysis objects.
- Call out when the runtime found the underlying document/trace but failed to retain the resolved business object for the next follow-up.
- Distinguish `object_memory_gap`, `field_mapping_gap`, `business_utility_gap`, and `domain_anchor_gap` from pure route gaps.
- Check field truth explicitly: supplier must not be mislabeled as organization, buyer must not be mislabeled as organization, and document-side fields must not be presented as business truth without evidence.
- Under the scenario-tree section, explicitly name the root node, critical child nodes, critical edges, and the primary user path.
- Under the acceptance matrix, list at least the critical nodes/edges and mark each one by wording family: `canonical`, `colloquial`, `ui_selected_object`.
- Distinguish these defect classes explicitly when relevant: `semantic_understanding_gap`, `edge_carryover_gap`, `object_memory_gap`, `field_mapping_gap`, `answer_shape_mismatch`, `ordering_semantics_mismatch`, `runtime_capability_gap`, `business_utility_gap`, `loop_coverage_gap`, `domain_anchor_gap`.
- If the root node works but the primary user path is broken at the first selected-object drilldown, treat that as a real failure of domain hardening.
- If the runtime nearly supports the path but the loop never validated the realistic wording family, call it `loop_coverage_gap`, not product success.
@@ -56,6 +67,6 @@ Quality score:
- Output one integer score from 0 to 100.
- Score >= 80 means the case can be accepted only if there is no unresolved P0.
- Score >= 80 also requires the primary user path and its critical edges to be green across canonical, colloquial, and UI-selected-object coverage where applicable.
- If score < 80, loop_decision must be continue, partial, blocked, or needs_exact_capability.
- Score >= 80 also requires `direct_answer_ok = true` and `business_usefulness_ok = true` for the primary user path.
"""
nickname_candidates = ["Lens", "Vector", "Delta"]
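
The quality-score rules above combine into a single acceptance gate. The following is a minimal sketch, not the loop's actual implementation. It assumes a verdict object that uses the field names from the verdict JSON schema in this commit (`quality_score`, `unresolved_p0_count`, `direct_answer_ok`, `business_usefulness_ok`, `loop_decision`). The `primaryPathGreen` argument is a hypothetical summary flag for critical-edge coverage; the commit does not define it as a field.

```javascript
// Sketch of the analyst acceptance gate. Field names follow the verdict
// schema in this commit; `primaryPathGreen` is a hypothetical flag meaning
// "every critical edge on the primary user path passed in every required
// wording family".
function checkQualityGate(verdict, primaryPathGreen) {
  const blockers = [];
  if (verdict.quality_score < 80) blockers.push("score_below_80");
  if (verdict.unresolved_p0_count > 0) blockers.push("unresolved_p0");
  if (!primaryPathGreen) blockers.push("primary_path_not_green");
  if (verdict.direct_answer_ok !== true) blockers.push("direct_answer_not_ok");
  if (verdict.business_usefulness_ok !== true) blockers.push("business_usefulness_not_ok");
  const canAccept = blockers.length === 0;
  // "accepted" is a consistent loop decision only when the gate passes.
  const consistent = canAccept || verdict.loop_decision !== "accepted";
  return { canAccept, consistent, blockers };
}
```

The sketch reports every failed condition rather than only the first one, so a verdict can list all of its blockers together.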

@@ -1,5 +1,5 @@
name = "orchestrator"
description = "Coordinates a repo-native domain-case or scenario loop for NDC_1C: baseline or scenario capture, analyst verdict, minimal domain patch, rerun, and 80-point acceptance gate."
description = "Coordinates a repo-native domain-case or scenario loop for NDC_1C: baseline or scenario capture, minimal domain patching, rerun, and business-first acceptance."
model = "gpt-5.4"
model_reasoning_effort = "high"
sandbox_mode = "workspace-write"
@@ -46,6 +46,10 @@ Hard rules:
- For cascading date-sensitive scenarios, rerun at least one `на эту дату` / `на ту дату` follow-up and verify that the originating date or period survives into debug filters.
- If the business question asks for residues/items/contracts but the answer switched to raw documents or movements, treat that as a real defect, not as acceptable detail.
- If the wording implies chronology or ranking such as `старые закупки`, verify oldest-first ordering explicitly.
- Require the analyst to judge business usefulness, not only technical groundedness.
- Require the analyst to judge whether the direct answer appears in the first line when the user asked a direct lookup question.
- Treat selected-object continuity, pronoun resolution, and reusable resolved-object state as mandatory audit targets for follow-up-heavy domains.
- Distinguish runtime capability gaps from state-layer continuity gaps and from business-presentation gaps before choosing coder tasks.
- If the root node works but the first critical selected-object or drilldown edge is still broken, do not treat the scenario as hardened.
- Require an explicit `scenario_acceptance_matrix.md` artifact for follow-up-heavy domains and packs.
- Use the matrix to drive coder tasks: patch the narrowest broken edge or wording family first, not the whole domain at once.
@@ -57,6 +61,7 @@ Acceptance gate:
- accepted requires no business-critical regression in rerun
- accepted requires green critical edges on the primary user path
- accepted requires green coverage for canonical + colloquial + UI-selected-object variants on critical branches when those branches exist in the product UX
- accepted requires `direct_answer_ok = true` and `business_usefulness_ok = true` on the primary user path
Required artifacts per cycle:
- case_brief.md
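
The rule "patch the narrowest broken edge or wording family first" can be read as a priority order over acceptance-matrix rows. This is a sketch under stated assumptions. The row shape `{ edgeId, onPrimaryPath, family, status }` is hypothetical, and `status` uses the matrix values `pass` / `partial` / `fail`. The tie-break order is also an assumption; the commit does not prescribe it. That order is: primary path first, then `fail` before `partial`, then canonical, colloquial, ui_selected_object.

```javascript
// Hypothetical priority order; the commit does not prescribe one.
const STATUS_RANK = { fail: 0, partial: 1 };
const FAMILY_RANK = { canonical: 0, colloquial: 1, ui_selected_object: 2 };

// rows: [{ edgeId, onPrimaryPath, family, status }] from the acceptance matrix.
function pickNarrowestTarget(rows) {
  const broken = rows.filter((row) => row.status === "fail" || row.status === "partial");
  broken.sort(
    (a, b) =>
      Number(b.onPrimaryPath) - Number(a.onPrimaryPath) ||
      STATUS_RANK[a.status] - STATUS_RANK[b.status] ||
      FAMILY_RANK[a.family] - FAMILY_RANK[b.family]
  );
  // One edge plus one wording family per coder task, not the whole domain.
  return broken.length > 0 ? { edgeId: broken[0].edgeId, family: broken[0].family } : null;
}
```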

@@ -28,6 +28,7 @@ This skill packages the standard workflow for iterating on one concrete domain c
Read `references/repo_runtime_map.md` before the first real cycle.
For follow-up-heavy domains, also read `references/scenario_tree_acceptance_canon.md` before scenario mode, pack mode, or autonomous pack-loop mode.
For business-first analyst work, also read `references/business_first_analyst_rubric.md` before redefining acceptance or hardening a noisy-but-technically-grounded domain.
If `docs/orchestration/active_domain_contract.json` exists, treat it as the single mutable source of truth for the current domain and prefer it over older scattered pool/pack prose docs.
Use these repo-native capture paths:
@@ -136,6 +137,7 @@ The verdict must explicitly say whether the case is:
- a missing route/intent/capability inside project scope;
- a true out-of-scope request.
- a `runtime_capability_gap`, `semantic_understanding_gap`, `edge_carryover_gap`, `answer_shape_mismatch`, `ordering_semantics_mismatch`, or `loop_coverage_gap`.
- an `object_memory_gap`, `field_mapping_gap`, `business_utility_gap`, or `domain_anchor_gap` when that is the real blocker.
### Step 4 - Domain patch
@@ -208,6 +210,9 @@ Accepted requires:
- Treat answer-shape mismatch as a scoring defect: if the user asked for items / residues / contracts, do not accept an answer that switched to raw documents, movements, or another lower-level object without saying so explicitly.
- Treat ordering semantics as part of correctness when the wording implies ranking or chronology, for example `старые закупки` => oldest-first rather than newest-first.
- Treat primary user-path failures as more important than supporting-path polish: if the user cannot go from root list -> selected object -> first drilldown, the scenario is not accepted.
- Treat direct-answer-first behavior as part of correctness: if the user asked a direct lookup question, the first line must contain the direct answer before the evidence blocks.
- Treat business usefulness as part of correctness: factual-but-business-useless output is not acceptance-quality output.
- Treat stable follow-up object memory as part of correctness: when the prior turn already resolved the relevant item/object, the next turn must not re-ask for it.
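
The ordering rule can be checked mechanically against the rows an answer surfaced. This is a minimal sketch. It assumes each row carries an ISO `YYYY-MM-DD` date under a hypothetical key; ISO dates compare correctly as plain strings. The `старые закупки` regex is an illustrative assumption, not the runtime's actual cue.

```javascript
// True when rows are ordered oldest-first by the given ISO date key.
function isOldestFirst(rows, dateKey) {
  for (let i = 1; i < rows.length; i += 1) {
    if (rows[i - 1][dateKey] > rows[i][dateKey]) return false;
  }
  return true;
}

// Hypothetical cue: "старые закупки" implies chronology, oldest first.
function orderingDefect(question, rows, dateKey) {
  const impliesOldestFirst = /стар(?:ые|ых)\s+закуп/iu.test(question);
  return impliesOldestFirst && !isOldestFirst(rows, dateKey) ? "ordering_semantics_mismatch" : null;
}
```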
## Domain-specific framing

@@ -0,0 +1,128 @@
# Business-first analyst rubric
Use this rubric when evaluating one domain case, one multi-step scenario, or one full domain pack.
The analyst must not stop at route/debug correctness. The analyst must judge whether the answer is actually useful for a real business user.
## Core principle
The analyst evaluates five layers at once:
- user intent;
- scenario tree and state continuity;
- business usefulness of the answer;
- evidence and field truthfulness;
- root cause and smallest defensible fix direction.
## Required analyst questions
For every critical turn or critical edge, answer these questions explicitly:
1. What did the user really ask?
- State the business meaning in one short sentence.
- Name the minimum direct answer the user expected.
2. What should the first line of the answer have been?
- If the user asked a direct lookup question, the first line must contain the direct answer.
- Technical explanation, limitations, and evidence come after the direct answer.
3. What object and scope had to survive from previous turns?
- selected item / selected contract / selected counterparty;
- originating date or period;
- warehouse or organization scope when still relevant;
- reusable resolved bundle, for example provenance trace or sale trace.
4. Did the answer stay on the same business object?
- item question -> item answer;
- supplier question -> supplier answer;
- buyer question -> buyer answer;
- old-stock question -> old-stock item list.
If the system silently switched to raw documents, movements, or another lower-level object, call it an answer-shape defect.
5. Are the surfaced fields truthful and correctly labeled?
- do not confuse supplier with organization;
- do not confuse buyer with organization;
- do not present a document-side technical field as a business truth unless that mapping is proven.
## Business usefulness rules
An answer is not accepted as business-useful when any of these are true:
- the direct answer is not placed first;
- the answer opens with technical hedging instead of the user-facing result;
- a weaker question is answered than the one the user asked;
- the answer requires the user to reconstruct the conclusion from low-level evidence;
- the answer uses ambiguous field labels for business-critical entities.
## State continuity rules
Follow-up continuity is a first-class acceptance object.
The analyst must verify:
- selected object continuity;
- date/period continuity;
- reusable evidence continuity;
- pronoun resolution continuity.
Important pronoun examples:
- `эту позицию`
- `этот товар`
- `его`
- `по нему`
- `по этой позиции`
If the previous turn already resolved a concrete object, the next turn must reuse it instead of asking for the anchor again.
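
The continuity rule reduces to one check: if the follow-up uses one of these pronoun forms and a concrete object is already resolved, reuse that object. The following is a minimal sketch. The state shape (`current_item`) borrows a name from this rubric's reusable answer-object cache list. The regex covers only the pronoun examples listed here.

```javascript
// Pronoun forms from the rubric. `\b` does not treat Cyrillic letters as word
// characters, so whitespace / punctuation / string boundaries are used instead.
const PRONOUN_FOCUS_RE = /(?:^|\s)(?:эту\s+позицию|этот\s+товар|по\s+этой\s+позиции|по\s+нему|его)(?=\s|[?.!,]|$)/iu;

// state: hypothetical resolved-object bundle, e.g. { current_item: "..." }.
function resolveFollowUpAnchor(text, state) {
  if (PRONOUN_FOCUS_RE.test(text) && state.current_item) {
    return { anchor: state.current_item, source: "reused_focus_object" };
  }
  // No pronoun, or nothing resolved yet: asking for the anchor is legitimate.
  return { anchor: null, source: "needs_anchor" };
}
```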
## Reusable answer-object cache
For follow-up-heavy domains, the analyst should explicitly look for evidence that the product behaves as if it had a reusable resolved object bundle.
Examples:
- `current_item`
- `current_as_of_date`
- `current_provenance_trace`
- `current_sale_trace`
- `first_purchase_date`
- `supplier_if_known`
- `source_document_if_known`
If the runtime recomputes everything from scratch and loses the already resolved object, call that out as a state-layer defect.
## Root-cause layers
Use one or more of these root-cause layers explicitly:
- `semantic_understanding_gap`
- `runtime_capability_gap`
- `edge_carryover_gap`
- `object_memory_gap`
- `field_mapping_gap`
- `answer_shape_mismatch`
- `ordering_semantics_mismatch`
- `business_utility_gap`
- `loop_coverage_gap`
- `domain_anchor_gap`
## Minimum machine-readable verdict fields
The analyst verdict should expose at least:
- `user_intent_summary`
- `expected_direct_answer`
- `actual_direct_answer`
- `direct_answer_ok`
- `business_usefulness_ok`
- `business_utility_score`
- `direct_answer_priority_score`
- `state_continuity_score`
- `answer_shape_score`
- `evidence_clarity_score`
- `root_cause_layers`
- `broken_edge_ids`
- `violated_invariants`
## Inventory-specific reminders
For inventory follow-up chains, verify all of these:
- the selected item remains the current focus object after the user clicks a result;
- provenance questions answer supplier/date/document first, not only raw movement rows;
- `когда купили` can reuse the already resolved provenance bundle;
- supplier and organization are not mixed up in the surfaced answer;
- `на эту дату` keeps the original stock date unless the user explicitly changed it.
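
The last reminder can be checked by comparing the originating turn's date filter with the follow-up's debug filters. This is a minimal sketch. The `as_of_date` key mirrors `analysis_context.as_of_date` in the domain contract. The filter objects and the `userChangedDate` flag are hypothetical inputs that the caller would derive from the debug trace.

```javascript
const SAME_DATE_RE = /на\s+(?:эту|ту)\s+дату/iu;

// Returns a defect class when a same-date follow-up dropped or replaced the date.
// originFilters / followUpFilters: hypothetical { as_of_date } objects from debug output.
function checkDateCarryover(question, originFilters, followUpFilters, userChangedDate = false) {
  if (!SAME_DATE_RE.test(question) || userChangedDate) return null;
  return followUpFilters.as_of_date === originFilters.as_of_date ? null : "edge_carryover_gap";
}
```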

@@ -9,6 +9,10 @@
## Expected business meaning
- ...
## Expected direct answer
- first line should say:
- minimum acceptable business answer:
## Expected capability
- ...
@@ -31,6 +35,13 @@
- warehouse if relevant
- organization if relevant
- expected answer shape
- direct-answer-first when the user asked a direct lookup question
- reusable resolved-object continuity when the user asks a follow-up about the same selected object
## Field truth constraints
- do not confuse supplier with organization
- do not confuse buyer with organization
- do not surface technical document-side fields as business truth without proof
## Contour status
- in_contour / outside_current_contour / unknown
@@ -53,3 +64,5 @@
- root node works
- critical edges on the primary user path work
- colloquial and UI-generated follow-up variants work
- direct answer is placed first where expected
- output is business-useful, not only technically grounded
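
The requirement that the direct answer is placed first can be checked on the reply text. This is a sketch: it treats the first non-empty line as the lead line and looks for the expected entity there. Plain case-insensitive substring matching is a simplifying assumption. A real check may need to normalize quotes and legal-form suffixes such as `ООО`.

```javascript
// True when the first non-empty line of the reply contains the expected answer.
function directAnswerFirst(replyText, expectedAnswer) {
  const leadLine = replyText
    .split("\n")
    .map((line) => line.trim())
    .find((line) => line.length > 0) ?? "";
  return leadLine.toLowerCase().includes(expectedAnswer.toLowerCase());
}
```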

@@ -12,6 +12,16 @@ The unit of acceptance is a **scenario tree**:
If the root works but a critical child transition breaks, the domain is **not** hardened.
## Business-first framing
Every accepted node or edge must be both technically grounded and business-useful.
This means:
- the direct answer is surfaced first when the user asked a direct lookup question;
- the answer stays on the requested business object;
- evidence and caveats support the answer instead of replacing it;
- field labels are truthful for business entities such as supplier, buyer, organization, warehouse, and document.
## Model the domain as a tree
For each scenario, define:
@@ -36,7 +46,8 @@ A node is considered covered only if all of these are true:
- the expected intent / capability is selected;
- the answer shape matches the requested business object;
- the answer begins with a direct user-facing answer when such an answer is expected;
- the answer is evidence-backed rather than heuristic-masked.
- the answer is evidence-backed rather than heuristic-masked;
- the surfaced business fields are truthful and not mislabeled.
Examples:
- asking for supplier provenance must answer with the supplier first, not only with raw documents;
@@ -53,9 +64,23 @@ Typical invariants:
- warehouse survives if the follow-up still targets the same stock slice
- organization survives if the previous slice was organization-bound
- route family remains in the same business contour unless the user clearly changed intent
- reusable resolved-object state survives when the previous turn already answered a closely related lookup
- pronoun references can reuse the active focus object when the wording supports it
If an edge loses a required invariant, that is a real regression even if the target node works in isolation.
## Resolved answer-object continuity
For follow-up-heavy domains, the analyst should treat resolved business objects as reusable state, not as disposable one-turn artifacts.
Examples:
- selected inventory item
- resolved supplier provenance bundle
- resolved buyer bundle
- resolved purchase document bundle
If turn N already resolved such an object and turn N+1 asks a natural follow-up about the same object, the system should reuse that state instead of demanding the same anchor again.
## Mandatory paraphrase families
Every critical node or edge must be validated in a small paraphrase family instead of one curated wording only.
@@ -65,11 +90,6 @@ Minimum family:
- `colloquial`
- `ui_selected_object`
Examples:
- canonical: `От какого поставщика куплен товар X`
- colloquial: `Кто поставил этот товар`
- ui_selected_object: `По выбранному объекту "X": кто это поставил нам`
If canonical works but colloquial or UI-generated follow-up fails, the node/edge is not accepted.
## Acceptance matrix
@@ -86,6 +106,8 @@ Minimum matrix columns:
- expected capability / recipe
- required carryover invariants
- expected answer shape
- expected direct answer
- business usefulness expectation
- actual outcome
- status (`pass`, `partial`, `fail`)
- defect class
@@ -95,17 +117,25 @@ Minimum matrix columns:
Use these classes explicitly:
- `semantic_understanding_gap`
- `edge_carryover_gap`
- `object_memory_gap`
- `field_mapping_gap`
- `answer_shape_mismatch`
- `ordering_semantics_mismatch`
- `runtime_capability_gap`
- `business_utility_gap`
- `domain_anchor_gap`
- `loop_coverage_gap`
Definitions:
- `semantic_understanding_gap`: the system did not understand the real user meaning
- `edge_carryover_gap`: the follow-up lost date / object / scope across steps
- `object_memory_gap`: the system resolved the object once but failed to retain it for the next follow-up
- `field_mapping_gap`: the answer surfaced the wrong business field or mislabeled a field
- `answer_shape_mismatch`: the business object in the answer does not match the requested object
- `ordering_semantics_mismatch`: ranking / chronology semantics are wrong
- `runtime_capability_gap`: the product contour truly lacks the route / intent / capability / extractor / recipe
- `business_utility_gap`: the answer may be grounded but is still not useful as a user-facing result
- `domain_anchor_gap`: the scenario uses a weak or wrong observed anchor, so the tree is semantically mis-specified
- `loop_coverage_gap`: the runtime could support the path or nearly support it, but the analyst/orchestrator never treated that path as mandatory acceptance coverage
## Analyst responsibilities
@@ -116,6 +146,9 @@ The analyst must:
- call out broken edges explicitly;
- verify colloquial and UI-generated variants as first-class coverage;
- verify direct-answer-first behavior where the user asked a direct lookup question;
- verify business usefulness explicitly, not only technical validity;
- verify field truthfulness for surfaced supplier / buyer / organization labels;
- verify selected-object continuity and reusable object memory;
- verify answer granularity and ordering semantics;
- lower the score when any critical edge or paraphrase family is broken.
@@ -136,7 +169,9 @@ Do not accept a domain when:
- selected-object follow-up is broken;
- `на эту дату` / `на ту дату` loses the originating date;
- the answer shape is wrong for the business question;
- chronology / ranking semantics are inverted.
- chronology / ranking semantics are inverted;
- the direct answer is not surfaced first on direct lookup questions;
- the answer is technically grounded but still business-useless.
Accepted requires:
- score >= 80
@@ -144,3 +179,5 @@ Accepted requires:
- critical path edges pass
- canonical + colloquial + UI-selected-object variants pass for critical branches
- no silent heuristic masking
- `direct_answer_ok = true`
- `business_usefulness_ok = true`
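
A node or edge passes only when every required paraphrase family passes. This rule can be expressed over matrix rows. The sketch below assumes a hypothetical row shape `{ family, status }`. `requiredFamilies` plays the role of the domain contract's `required_paraphrase_families`. That list varies per scenario; one scenario in the contract lists only `canonical` and `ui_selected_object`.

```javascript
// An edge is green only if every required wording family has a `pass` row.
// rows: hypothetical [{ family, status }] for one node or edge.
function edgeAccepted(rows, requiredFamilies) {
  const statusByFamily = new Map(rows.map((row) => [row.family, row.status]));
  const untested = requiredFamilies.filter((family) => !statusByFamily.has(family));
  const failing = requiredFamilies.filter(
    (family) => statusByFamily.has(family) && statusByFamily.get(family) !== "pass"
  );
  return {
    accepted: untested.length === 0 && failing.length === 0,
    // A required family that was never run is a loop defect, not product success.
    coverageGap: untested.length > 0 ? "loop_coverage_gap" : null,
    untested,
    failing,
  };
}
```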

@@ -1,55 +1,74 @@
# Verdict
## 1. Смысл вопроса
## 1. Question meaning
...
## 2. Главный пользовательский путь и дерево сценария
## 2. Primary user path and scenario tree
- root:
- critical child nodes:
- critical edges:
- primary user path:
## 3. Что реально посчитано
## 3. Expected direct answer
- what the first line should say:
- minimum acceptable business answer:
## 4. What the system actually computed
...
## 4. Где расхождение по бизнес-смыслу
## 5. Business mismatch
- did the answer solve the user's real question:
- did the direct answer appear first:
- is the answer usable for an operator/accountant/manager:
## 6. Route / capability mismatch
...
## 5. Где route / capability mismatch
...
## 7. State continuity and selected-object memory
- selected object continuity:
- date/period continuity:
- reusable answer-object continuity:
- pronoun resolution continuity:
## 6. Evidence quality
- exact / partial / heuristic / technical insufficiency
- why
## 8. Field truth and evidence quality
- supplier vs organization:
- buyer vs organization:
- exact / partial / heuristic / technical insufficiency:
- why:
## 7. P0 defects
## 9. P0 defects
- ...
## 8. P1 defects
## 10. P1 defects
- ...
## 9. P2 defects
## 11. P2 defects
- ...
## 10. Minimal patch directions
## 12. Minimal patch directions
- ...
## 11. Acceptance matrix for rerun
## 13. Acceptance matrix for rerun
- Node / edge coverage:
- Canonical wording:
- Colloquial wording:
- UI-generated selected-object wording:
- Carryover invariants:
- Expected answer shape:
- Expected direct answer:
- Business usefulness:
- Defect class:
## 12. Acceptance criteria for rerun
## 14. Acceptance criteria for rerun
- ...
- Include colloquial/slang variants and UI-generated selected-object follow-up variants when they are part of the business flow.
- Require the primary user path to pass end-to-end, not only the root node.
- Require direct-answer-first behavior on direct lookup questions.
- Require business-useful output rather than technically-grounded-but-noisy output.
- Require selected-object continuity and reusable answer-object continuity on follow-up chains.
## 13. Quality score
## 15. Quality score
- integer from 0 to 100
## 14. Loop decision
## 16. Loop decision
- accepted / continue / partial / blocked / needs_exact_capability

@@ -26,6 +26,7 @@ Rules:
- Do not accept a domain when only the root snapshot works but selected-object or drilldown follow-up edges still fail.
- For critical branches, validate at least canonical wording, colloquial wording, and UI-generated selected-object wording when that UX exists.
- Treat temporal carryover, selected-object carryover, answer-shape match, and ordering semantics as first-class acceptance invariants rather than optional polish.
- Treat direct-answer-first behavior, business usefulness, selected-object memory, and field truthfulness as first-class analyst criteria rather than optional presentation polish.
- If a case falls outside the current routed contour because the route/intent/capability is not wired yet, treat it as domain enablement work for this project, not as automatic out-of-scope rejection.
- For new unmarked domains, `needs_exact_capability` means "bootstrap or extend the contour" rather than "close the case as unsupported".
- A case can be marked `accepted` only when analyst verdict is at least `80/100`, no unresolved `P0` remains, and the rerun does not mask heuristic output as confirmed.

@@ -780,6 +780,12 @@
"required_paraphrase_families": ["canonical", "ui_selected_object"],
"required_carryover_invariants": ["selected_object", "date_scope", "answer_shape"]
},
"bindings": {
"target_date_historical": "2020-03-31",
"focus_item_historical": "Шкаф картотечный 1000*400*2100",
"observed_supplier_candidate": "Гамма-мебель, ООО",
"observed_customer_candidate": "Департамент капитального ремонта города Москвы"
},
"steps": [
{
"step_id": "step_01_account_41_historical",
@@ -790,7 +796,7 @@
"title": "Historical account 41 anchor",
"question": "Какие товары числятся на 41 счете на дату {{bindings.target_date_historical}}",
"analysis_context": {
"as_of_date": "2019-03-31",
"as_of_date": "2020-03-31",
"source": "binding_target_date_historical"
},
"expected_capability": "confirmed_inventory_on_hand_as_of_date",
@@ -823,13 +829,29 @@
"node_role": "supporting_child",
"paraphrase_family": "canonical",
"title": "Supplier to buyer overlap",
"question": "Какие товары были куплены у поставщика {{bindings.observed_supplier_candidate}} и позже проданы покупателю {{bindings.observed_customer_candidate}}",
"question": "Есть ли документально подтвержденная цепочка: поставщик {{bindings.observed_supplier_candidate}} -> товар {{bindings.focus_item_historical}} -> покупатель {{bindings.observed_customer_candidate}}",
"depends_on": ["step_01_account_41_historical", "step_02_selected_item_buyer"]
}
]
}
]
},
"agent_audit_expectations": {
"direct_answer_first": true,
"business_utility_required": true,
"state_continuity_required": true,
"selected_object_memory_required": true,
"field_truth_checks": [
"supplier_vs_organization",
"buyer_vs_organization"
],
"reusable_answer_object_expectations": [
"current_item",
"current_as_of_date",
"current_provenance_trace",
"current_sale_trace"
]
},
"acceptance_contract": {
"acceptance_unit": "scenario_tree",
"do_not_accept_if": [
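
Step questions reference `{{bindings.<name>}}` placeholders, and the hunk above aligns `analysis_context.as_of_date` with `target_date_historical`. The following is a minimal sketch of a renderer plus a consistency check. It assumes flat binding names, since the commit does not show the actual substitution code. It also assumes that an `analysis_context.source` of the form `binding_<key>` names the binding it was taken from; that convention is inferred from `binding_target_date_historical`.

```javascript
// Replaces {{bindings.<name>}} placeholders; an unknown name fails loudly.
function renderStepQuestion(template, bindings) {
  return template.replace(/\{\{bindings\.([A-Za-z0-9_]+)\}\}/g, (_placeholder, name) => {
    if (!Object.prototype.hasOwnProperty.call(bindings, name)) {
      throw new Error(`Unknown binding: ${name}`);
    }
    return String(bindings[name]);
  });
}

// Assumption: `analysis_context.source` names its binding as "binding_<key>".
function dateContextMatchesBinding(step, bindings) {
  const ctx = step.analysis_context;
  if (!ctx || typeof ctx.source !== "string" || !ctx.source.startsWith("binding_")) return true;
  return ctx.as_of_date === bindings[ctx.source.slice("binding_".length)];
}
```

Under these assumptions, the pre-fix value `2019-03-31` would fail the check against a `2020-03-31` binding.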

@@ -5,13 +5,26 @@
"additionalProperties": false,
"required": [
"summary",
"user_intent_summary",
"expected_direct_answer",
"actual_direct_answer",
"quality_score",
"direct_answer_ok",
"business_usefulness_ok",
"business_utility_score",
"direct_answer_priority_score",
"state_continuity_score",
"answer_shape_score",
"evidence_clarity_score",
"loop_decision",
"requires_user_decision",
"user_decision_type",
"user_decision_prompt",
"unresolved_p0_count",
"regression_detected",
"root_cause_layers",
"broken_edge_ids",
"violated_invariants",
"priority_targets",
"acceptance_criteria",
"notes"
@@ -20,11 +33,51 @@
"summary": {
"type": "string"
},
"user_intent_summary": {
"type": "string"
},
"expected_direct_answer": {
"type": "string"
},
"actual_direct_answer": {
"type": ["string", "null"]
},
"quality_score": {
"type": "integer",
"minimum": 0,
"maximum": 100
},
"direct_answer_ok": {
"type": "boolean"
},
"business_usefulness_ok": {
"type": "boolean"
},
"business_utility_score": {
"type": "integer",
"minimum": 0,
"maximum": 100
},
"direct_answer_priority_score": {
"type": "integer",
"minimum": 0,
"maximum": 100
},
"state_continuity_score": {
"type": "integer",
"minimum": 0,
"maximum": 100
},
"answer_shape_score": {
"type": "integer",
"minimum": 0,
"maximum": 100
},
"evidence_clarity_score": {
"type": "integer",
"minimum": 0,
"maximum": 100
},
"loop_decision": {
"type": "string",
"enum": ["accepted", "continue", "partial", "blocked", "needs_exact_capability"]
@@ -35,7 +88,17 @@
},
"user_decision_type": {
"type": "string",
"enum": ["none", "architecture_fork", "important_business_question", "scope_tradeoff", "data_truth_gap", "missing_required_observation", "risky_workaround", "risky_complexity", "other"],
"enum": [
"none",
"architecture_fork",
"important_business_question",
"scope_tradeoff",
"data_truth_gap",
"missing_required_observation",
"risky_workaround",
"risky_complexity",
"other"
],
"description": "Explain why the loop needs user input. Use none when requires_user_decision is false."
},
"user_decision_prompt": {
@@ -49,6 +112,37 @@
"regression_detected": {
"type": "boolean"
},
"root_cause_layers": {
"type": "array",
"items": {
"type": "string",
"enum": [
"semantic_understanding_gap",
"runtime_capability_gap",
"edge_carryover_gap",
"object_memory_gap",
"field_mapping_gap",
"answer_shape_mismatch",
"ordering_semantics_mismatch",
"business_utility_gap",
"loop_coverage_gap",
"domain_anchor_gap",
"other"
]
}
},
"broken_edge_ids": {
"type": "array",
"items": {
"type": "string"
}
},
"violated_invariants": {
"type": "array",
"items": {
"type": "string"
}
},
"priority_targets": {
"type": "array",
"items": {
@@ -68,7 +162,23 @@
},
"problem_type": {
"type": "string",
"enum": ["route_gap", "capability_gap", "evidence_gap", "presentation_gap", "regression", "other"]
"enum": [
"route_gap",
"capability_gap",
"evidence_gap",
"presentation_gap",
"semantic_understanding_gap",
"edge_carryover_gap",
"object_memory_gap",
"field_mapping_gap",
"answer_shape_mismatch",
"ordering_semantics_mismatch",
"business_utility_gap",
"loop_coverage_gap",
"domain_anchor_gap",
"regression",
"other"
]
},
"fix_goal": {
"type": "string"
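
Without a full JSON Schema validator, the new score and enum constraints can be spot-checked. This sketch hard-codes the `root_cause_layers` enum and the 0 to 100 integer range from the schema above. In practice, a standard JSON Schema validator (for example Ajv) would enforce the schema directly.

```javascript
// Values copied from the `root_cause_layers` enum in the verdict schema.
const ROOT_CAUSE_LAYERS = new Set([
  "semantic_understanding_gap", "runtime_capability_gap", "edge_carryover_gap",
  "object_memory_gap", "field_mapping_gap", "answer_shape_mismatch",
  "ordering_semantics_mismatch", "business_utility_gap", "loop_coverage_gap",
  "domain_anchor_gap", "other",
]);
// Integer score fields constrained to 0..100 in the schema.
const SCORE_FIELDS = [
  "quality_score", "business_utility_score", "direct_answer_priority_score",
  "state_continuity_score", "answer_shape_score", "evidence_clarity_score",
];

function spotCheckVerdict(verdict) {
  const errors = [];
  for (const field of SCORE_FIELDS) {
    const value = verdict[field];
    if (!Number.isInteger(value) || value < 0 || value > 100) {
      errors.push(`${field}: expected integer 0..100`);
    }
  }
  if (!Array.isArray(verdict.root_cause_layers)) {
    errors.push("root_cause_layers: expected array");
  } else {
    for (const layer of verdict.root_cause_layers) {
      if (!ROOT_CAUSE_LAYERS.has(layer)) errors.push(`root_cause_layers: unknown value ${layer}`);
    }
  }
  return errors;
}
```

`route_gap` is valid as a `priority_targets[].problem_type` but is not a member of the `root_cause_layers` enum. A check like this catches that kind of mix-up.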

@@ -1340,6 +1340,17 @@ function hasInventoryPurchaseDocumentsSignal(text) {
function hasInventorySaleTraceSignal(text) {
return /(?:продаж|покупател|buyer|sale trace|purchase[\s-]?to[\s-]?sale|purchase -> warehouse -> sale|закупка.*продаж)/iu.test(text);
}
function hasSelectedObjectInventoryCue(text) {
return /(?:по\s+выбранному\s+объекту|selected\s+object)/iu.test(text);
}
function hasSelectedObjectInventoryProvenanceSignal(text) {
return (hasSelectedObjectInventoryCue(text) &&
/(?:кто\s+(?:(?:это|этот\s+товар|эту\s+позицию)\s+)?(?:нам\s+)?поставил|кто\s+(?:нам\s+)?поставил\s+(?:это|этот\s+товар|эту\s+позицию)|от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|supplier|vendor|поставщик)/iu.test(text));
}
function hasSelectedObjectInventoryPurchaseDocumentsSignal(text) {
return (hasSelectedObjectInventoryCue(text) &&
/(?:по\s+каким\s+документам\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|по\s+каким\s+документам\s+(?:был\s+)?куплен|какими\s+документами\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|какими\s+документами\s+(?:был\s+)?куплен|purchase\s+documents|documents\s+of\s+purchase|through\s+which\s+documents)/iu.test(text));
}
function hasInventoryProvenanceSignalV2(text) {
const hasItemCue = /(?:товар|номенклатур|sku|item|product|остат(?:ок|ки)|склад)/iu.test(text);
const hasSupplierCue = /(?:от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|кто\s+(?:нам\s+)?поставил|кем\s+поставлен|поставщик|supplier|vendor)/iu.test(text);
@@ -1541,6 +1552,13 @@ function resolveAddressIntent(userMessage) {
reasons: ["inventory_aging_signal_detected"]
};
}
if (hasSelectedObjectInventoryProvenanceSignal(text)) {
return {
intent: "inventory_purchase_provenance_for_item",
confidence: "medium",
reasons: ["inventory_selected_object_provenance_signal_detected"]
};
}
if (hasInventoryProvenanceSignalV2(text)) {
return {
intent: "inventory_purchase_provenance_for_item",
@@ -1555,6 +1573,13 @@ function resolveAddressIntent(userMessage) {
reasons: ["inventory_purchase_date_signal_detected"]
};
}
if (hasSelectedObjectInventoryPurchaseDocumentsSignal(text)) {
return {
intent: "inventory_purchase_documents_for_item",
confidence: "medium",
reasons: ["inventory_selected_object_purchase_documents_signal_detected"]
};
}
if (hasInventoryPurchaseDocumentsSignalV2(text)) {
return {
intent: "inventory_purchase_documents_for_item",
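
The selected-object routing can be exercised in isolation. The three predicates below are copied from the hunks above. `routeSelectedObjectFollowUp` is a hypothetical wrapper that keeps only the two selected-object checks from `resolveAddressIntent`, in the same relative order. The real function interleaves them with other signal checks. The sample phrasing follows the `ui_selected_object` example in the scenario-tree canon.

```javascript
// Copied from the diff above.
function hasSelectedObjectInventoryCue(text) {
  return /(?:по\s+выбранному\s+объекту|selected\s+object)/iu.test(text);
}
function hasSelectedObjectInventoryProvenanceSignal(text) {
  return (hasSelectedObjectInventoryCue(text) &&
    /(?:кто\s+(?:(?:это|этот\s+товар|эту\s+позицию)\s+)?(?:нам\s+)?поставил|кто\s+(?:нам\s+)?поставил\s+(?:это|этот\s+товар|эту\s+позицию)|от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|supplier|vendor|поставщик)/iu.test(text));
}
function hasSelectedObjectInventoryPurchaseDocumentsSignal(text) {
  return (hasSelectedObjectInventoryCue(text) &&
    /(?:по\s+каким\s+документам\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|по\s+каким\s+документам\s+(?:был\s+)?куплен|какими\s+документами\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|какими\s+документами\s+(?:был\s+)?куплен|purchase\s+documents|documents\s+of\s+purchase|through\s+which\s+documents)/iu.test(text));
}

// Hypothetical wrapper: only the two selected-object checks, in source order.
function routeSelectedObjectFollowUp(text) {
  if (hasSelectedObjectInventoryProvenanceSignal(text)) return "inventory_purchase_provenance_for_item";
  if (hasSelectedObjectInventoryPurchaseDocumentsSignal(text)) return "inventory_purchase_documents_for_item";
  return null;
}
```

Some questions lack the `По выбранному объекту` cue, for example the colloquial `Кто поставил этот товар`. Those return `null` here and fall through to the other checks in `resolveAddressIntent`.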

@@ -2879,6 +2879,19 @@ class AddressQueryService {
const broadenedFactual = (0, composeStage_1.composeFactualReply)(intent.intent, broadenedFilteredRows, composeOptionsFromFilters(autoBroadenedFilters));
const broadenedLimitations = [...filters.warnings, "period_window_auto_broadened_to_available_data"];
const broadenedReasons = [...baseReasons, "period_window_auto_broadened_to_available_data"];
const broadenedResultSemantics = mergeAddressResultSemantics(deriveAddressResultSemantics({
intent: intent.intent,
selectedRecipe: broadenedSelection.selected_recipe.recipe_id,
filters: filters.extracted_filters,
responseType: broadenedFactual.responseType,
rowsMatched: broadenedFilteredRows.length
}), broadenedFactual.semantics);
const broadenedRouteExpectationAudit = buildRouteExpectationAudit({
intent: routeExpectationIntent,
selectedRecipe: broadenedSelection.selected_recipe.recipe_id,
requestedResultMode,
resultMode: broadenedResultSemantics.result_mode
});
return {
handled: true,
reply_text: injectNoticeAfterLeadLine(broadenedFactual.text, broadenedPrefix),
@ -2921,13 +2934,20 @@ class AddressQueryService {
runtime_readiness: "LIVE_QUERYABLE_WITH_LIMITS",
limited_reason_category: null,
response_type: broadenedFactual.responseType,
...mergeAddressResultSemantics(deriveAddressResultSemantics({
intent: intent.intent,
selectedRecipe: broadenedSelection.selected_recipe.recipe_id,
filters: filters.extracted_filters,
responseType: broadenedFactual.responseType,
rowsMatched: broadenedFilteredRows.length
}), broadenedFactual.semantics),
capability_id: capabilityAudit.capabilityId,
capability_layer: capabilityAudit.layer,
capability_route_mode: capabilityAudit.routeMode,
capability_route_enabled: capabilityAudit.enabled,
capability_route_reason: capabilityAudit.reason,
shadow_route_intent: shadowRouteAudit.intent,
shadow_route_selected_recipe: shadowRouteAudit.selectedRecipe,
shadow_route_status: shadowRouteAudit.status,
route_expectation_status: broadenedRouteExpectationAudit.status,
route_expectation_reason: broadenedRouteExpectationAudit.reason,
route_expectation_expected_selected_recipes: broadenedRouteExpectationAudit.expectedSelectedRecipes,
route_expectation_expected_requested_result_modes: broadenedRouteExpectationAudit.expectedRequestedResultModes,
route_expectation_expected_result_modes: broadenedRouteExpectationAudit.expectedResultModes,
...broadenedResultSemantics,
limitations: broadenedLimitations,
reasons: withConfirmedBalanceFallbackReason(broadenedReasons, requestedResultMode, broadenedFactual.semantics)
}

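The broadened-window branch now computes the merged semantics once, as `broadenedResultSemantics`, and passes its `result_mode` to `buildRouteExpectationAudit`. As a result, the `route_expectation_*` fields describe the same result mode that the payload reports. Below is a reduced sketch of that pattern. `deriveSemantics`, `mergeSemantics`, `auditRouteExpectation`, the `"rows"`/`"empty"` modes, and the matching rule are all hypothetical stand-ins for the real functions.

```typescript
// Hypothetical, reduced shapes; the real functions take and return more fields.
interface ResultSemantics {
  result_mode: string;
  rows_matched: number;
}

function deriveSemantics(rowsMatched: number): ResultSemantics {
  return { result_mode: rowsMatched > 0 ? "rows" : "empty", rows_matched: rowsMatched };
}

// Later keys win, as with an object-spread merge.
function mergeSemantics(base: ResultSemantics, override: Partial<ResultSemantics>): ResultSemantics {
  return { ...base, ...override };
}

function auditRouteExpectation(expectedResultModes: string[], resultMode: string) {
  const ok = expectedResultModes.indexOf(resultMode) >= 0;
  return { status: ok ? "matched" : "mismatch", reason: ok ? null : `unexpected_result_mode:${resultMode}` };
}

function buildBroadenedDebug(
  rowsMatched: number,
  factualOverride: Partial<ResultSemantics>,
  expectedResultModes: string[]
) {
  // Derive once and reuse: the audit and the payload read the same result_mode.
  const semantics = mergeSemantics(deriveSemantics(rowsMatched), factualOverride);
  const audit = auditRouteExpectation(expectedResultModes, semantics.result_mode);
  return {
    route_expectation_status: audit.status,
    route_expectation_reason: audit.reason,
    ...semantics
  };
}
```

If a factual override switches the mode, the audit sees the override as well. Deriving the value once is what guarantees this.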
@ -244,6 +244,24 @@ function mapCounterpartyIntentToContractIntent(intent) {
}
return null;
}
function isInventoryIntent(intent) {
return (intent === "inventory_on_hand_as_of_date" ||
intent === "inventory_purchase_provenance_for_item" ||
intent === "inventory_purchase_documents_for_item" ||
intent === "inventory_supplier_stock_overlap_as_of_date" ||
intent === "inventory_sale_trace_for_item" ||
intent === "inventory_purchase_to_sale_chain" ||
intent === "inventory_aging_by_purchase_date");
}
function hasSelectedObjectInventorySignal(text) {
return /(?:по\s+выбранному\s+объекту|for\s+selected\s+object)/iu.test(String(text ?? ""));
}
function hasInventorySupplierFollowupCue(text) {
return /(?:кто\s+(?:(?:это|этот\s+товар|эту\s+позицию)\s+)?(?:нам\s+)?поставил|кто\s+(?:нам\s+)?поставил\s+(?:это|этот\s+товар|эту\s+позицию)|от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|supplier|vendor|поставщик)/iu.test(String(text ?? ""));
}
function hasInventoryPurchaseDocumentsFollowupCue(text) {
return /(?:по\s+каким\s+документам\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|по\s+каким\s+документам\s+(?:был\s+)?куплен|какими\s+документами\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|какими\s+документами\s+(?:был\s+)?куплен|purchase\s+documents|documents\s+of\s+purchase|through\s+which\s+documents)/iu.test(String(text ?? ""));
}
function hasAddressFollowupContextSignal(text) {
const normalized = String(text ?? "").trim();
if (!normalized) {
@ -612,6 +630,32 @@ function deriveIntentWithFollowupContext(detectedIntent, userMessage, followupCo
reasons: [...detectedIntent.reasons, "intent_adjusted_to_balance_followup_context"]
};
}
const previousIsInventoryFamily = isInventoryIntent(previousIntent);
const inventorySelectedObjectFollowup = hasSelectedObjectInventorySignal(normalizedMessage) || (previousIsInventoryFamily && hasFollowupSignal);
if (inventorySelectedObjectFollowup && hasInventorySupplierFollowupCue(normalizedMessage)) {
if (detectedIntent.intent === "unknown" ||
detectedIntent.intent === "inventory_on_hand_as_of_date" ||
detectedIntent.intent === previousIntent) {
return {
intent: "inventory_purchase_provenance_for_item",
confidence: "low",
reasons: [...detectedIntent.reasons, "intent_adjusted_to_inventory_followup_context"]
};
}
}
if (inventorySelectedObjectFollowup && hasInventoryPurchaseDocumentsFollowupCue(normalizedMessage)) {
if (detectedIntent.intent === "unknown" ||
detectedIntent.intent === "list_documents_by_counterparty" ||
detectedIntent.intent === "list_documents_by_contract" ||
detectedIntent.intent === "inventory_on_hand_as_of_date" ||
detectedIntent.intent === previousIntent) {
return {
intent: "inventory_purchase_documents_for_item",
confidence: "low",
reasons: [...detectedIntent.reasons, "intent_adjusted_to_inventory_followup_context"]
};
}
}
if (hasPreviousContract) {
if (detectedIntent.intent === "list_contracts_by_counterparty") {
if (hasBankSignal(normalizedMessage)) {

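The follow-up rule re-routes a turn only when two conditions hold. First, the turn belongs to an inventory thread: it either carries the selected-object cue, or the previous intent is in the inventory family and the turn has a follow-up signal. Second, the detected intent is one the rule is allowed to overwrite. Below is a condensed sketch of those conditions. `adjustIntentForFollowup`, `FollowupInput`, and the `cue` field are hypothetical, and in the diff the cues are regex checks on the message rather than a precomputed value.

```typescript
// Hypothetical condensed version of the carryover rule; names are illustrative.
const INVENTORY_FAMILY: string[] = [
  "inventory_on_hand_as_of_date",
  "inventory_purchase_provenance_for_item",
  "inventory_purchase_documents_for_item",
  "inventory_supplier_stock_overlap_as_of_date",
  "inventory_sale_trace_for_item",
  "inventory_purchase_to_sale_chain",
  "inventory_aging_by_purchase_date"
];

type FollowupCue = "supplier" | "purchase_documents" | null;

interface FollowupInput {
  detectedIntent: string;
  previousIntent?: string;
  hasFollowupSignal: boolean;
  hasSelectedObjectCue: boolean;
  cue: FollowupCue;
}

// Detections a follow-up may overwrite, per cue (mirrors the conditions in the diff).
const OVERRIDABLE: Record<"supplier" | "purchase_documents", string[]> = {
  supplier: ["unknown", "inventory_on_hand_as_of_date"],
  purchase_documents: [
    "unknown",
    "list_documents_by_counterparty",
    "list_documents_by_contract",
    "inventory_on_hand_as_of_date"
  ]
};

const TARGET: Record<"supplier" | "purchase_documents", string> = {
  supplier: "inventory_purchase_provenance_for_item",
  purchase_documents: "inventory_purchase_documents_for_item"
};

function adjustIntentForFollowup(input: FollowupInput): { intent: string; confidence: "low" | null; adjusted: boolean } {
  const inInventoryThread =
    input.hasSelectedObjectCue ||
    (input.previousIntent !== undefined &&
      INVENTORY_FAMILY.indexOf(input.previousIntent) >= 0 &&
      input.hasFollowupSignal);
  const cue = input.cue;
  if (!inInventoryThread || cue === null) {
    return { intent: input.detectedIntent, confidence: null, adjusted: false };
  }
  const overridable =
    OVERRIDABLE[cue].indexOf(input.detectedIntent) >= 0 || input.detectedIntent === input.previousIntent;
  if (!overridable) {
    // A strong, unrelated detection is kept rather than silently rerouted.
    return { intent: input.detectedIntent, confidence: null, adjusted: false };
  }
  // Carryover is low-confidence, as with "intent_adjusted_to_inventory_followup_context".
  return { intent: TARGET[cue], confidence: "low", adjusted: true };
}
```

The guard against unrelated strong detections is what separates this rule from a blanket "stay in inventory" fallback.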
@ -1603,6 +1603,28 @@ function hasInventorySaleTraceSignal(text: string): boolean {
);
}
function hasSelectedObjectInventoryCue(text: string): boolean {
return /(?:по\s+выбранному\s+объекту|selected\s+object)/iu.test(text);
}
function hasSelectedObjectInventoryProvenanceSignal(text: string): boolean {
return (
hasSelectedObjectInventoryCue(text) &&
/(?:кто\s+(?:(?:это|этот\s+товар|эту\s+позицию)\s+)?(?:нам\s+)?поставил|кто\s+(?:нам\s+)?поставил\s+(?:это|этот\s+товар|эту\s+позицию)|от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|supplier|vendor|поставщик)/iu.test(
text
)
);
}
function hasSelectedObjectInventoryPurchaseDocumentsSignal(text: string): boolean {
return (
hasSelectedObjectInventoryCue(text) &&
/(?:по\s+каким\s+документам\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|по\s+каким\s+документам\s+(?:был\s+)?куплен|какими\s+документами\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|какими\s+документами\s+(?:был\s+)?куплен|purchase\s+documents|documents\s+of\s+purchase|through\s+which\s+documents)/iu.test(
text
)
);
}
function hasInventoryProvenanceSignalV2(text: string): boolean {
const hasItemCue = /(?:товар|номенклатур|sku|item|product|остат(?:ок|ки)|склад)/iu.test(text);
const hasSupplierCue =
@ -1871,6 +1893,14 @@ export function resolveAddressIntent(userMessage: string): AddressIntentResoluti
};
}
if (hasSelectedObjectInventoryProvenanceSignal(text)) {
return {
intent: "inventory_purchase_provenance_for_item",
confidence: "medium",
reasons: ["inventory_selected_object_provenance_signal_detected"]
};
}
if (hasInventoryProvenanceSignalV2(text)) {
return {
intent: "inventory_purchase_provenance_for_item",
@ -1887,6 +1917,14 @@ export function resolveAddressIntent(userMessage: string): AddressIntentResoluti
};
}
if (hasSelectedObjectInventoryPurchaseDocumentsSignal(text)) {
return {
intent: "inventory_purchase_documents_for_item",
confidence: "medium",
reasons: ["inventory_selected_object_purchase_documents_signal_detected"]
};
}
if (hasInventoryPurchaseDocumentsSignalV2(text)) {
return {
intent: "inventory_purchase_documents_for_item",

@ -3498,6 +3498,22 @@ export class AddressQueryService {
);
const broadenedLimitations = [...filters.warnings, "period_window_auto_broadened_to_available_data"];
const broadenedReasons = [...baseReasons, "period_window_auto_broadened_to_available_data"];
const broadenedResultSemantics = mergeAddressResultSemantics(
deriveAddressResultSemantics({
intent: intent.intent,
selectedRecipe: broadenedSelection.selected_recipe.recipe_id,
filters: filters.extracted_filters,
responseType: broadenedFactual.responseType,
rowsMatched: broadenedFilteredRows.length
}),
broadenedFactual.semantics
);
const broadenedRouteExpectationAudit = buildRouteExpectationAudit({
intent: routeExpectationIntent,
selectedRecipe: broadenedSelection.selected_recipe.recipe_id,
requestedResultMode,
resultMode: broadenedResultSemantics.result_mode
});
return {
handled: true,
reply_text: injectNoticeAfterLeadLine(broadenedFactual.text, broadenedPrefix),
@ -3540,16 +3556,21 @@ export class AddressQueryService {
runtime_readiness: "LIVE_QUERYABLE_WITH_LIMITS",
limited_reason_category: null,
response_type: broadenedFactual.responseType,
...mergeAddressResultSemantics(
deriveAddressResultSemantics({
intent: intent.intent,
selectedRecipe: broadenedSelection.selected_recipe.recipe_id,
filters: filters.extracted_filters,
responseType: broadenedFactual.responseType,
rowsMatched: broadenedFilteredRows.length
}),
broadenedFactual.semantics
),
capability_id: capabilityAudit.capabilityId,
capability_layer: capabilityAudit.layer,
capability_route_mode: capabilityAudit.routeMode,
capability_route_enabled: capabilityAudit.enabled,
capability_route_reason: capabilityAudit.reason,
shadow_route_intent: shadowRouteAudit.intent,
shadow_route_selected_recipe: shadowRouteAudit.selectedRecipe,
shadow_route_status: shadowRouteAudit.status,
route_expectation_status: broadenedRouteExpectationAudit.status,
route_expectation_reason: broadenedRouteExpectationAudit.reason,
route_expectation_expected_selected_recipes: broadenedRouteExpectationAudit.expectedSelectedRecipes,
route_expectation_expected_requested_result_modes:
broadenedRouteExpectationAudit.expectedRequestedResultModes,
route_expectation_expected_result_modes: broadenedRouteExpectationAudit.expectedResultModes,
...broadenedResultSemantics,
limitations: broadenedLimitations,
reasons: withConfirmedBalanceFallbackReason(
broadenedReasons,

@ -306,6 +306,34 @@ function mapCounterpartyIntentToContractIntent(intent: AddressIntent): AddressIn
return null;
}
function isInventoryIntent(intent: AddressIntent | undefined): boolean {
return (
intent === "inventory_on_hand_as_of_date" ||
intent === "inventory_purchase_provenance_for_item" ||
intent === "inventory_purchase_documents_for_item" ||
intent === "inventory_supplier_stock_overlap_as_of_date" ||
intent === "inventory_sale_trace_for_item" ||
intent === "inventory_purchase_to_sale_chain" ||
intent === "inventory_aging_by_purchase_date"
);
}
function hasSelectedObjectInventorySignal(text: string): boolean {
return /(?:по\s+выбранному\s+объекту|for\s+selected\s+object)/iu.test(String(text ?? ""));
}
function hasInventorySupplierFollowupCue(text: string): boolean {
return /(?:кто\s+(?:(?:это|этот\s+товар|эту\s+позицию)\s+)?(?:нам\s+)?поставил|кто\s+(?:нам\s+)?поставил\s+(?:это|этот\s+товар|эту\s+позицию)|от\s+какого\s+поставщика|у\s+какого\s+поставщика|от\s+кого\s+куплен|supplier|vendor|поставщик)/iu.test(
String(text ?? "")
);
}
function hasInventoryPurchaseDocumentsFollowupCue(text: string): boolean {
return /(?:по\s+каким\s+документам\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|по\s+каким\s+документам\s+(?:был\s+)?куплен|какими\s+документами\s+(?:это|его|этот\s+товар|эту\s+позицию)\s+купили|какими\s+документами\s+(?:был\s+)?куплен|purchase\s+documents|documents\s+of\s+purchase|through\s+which\s+documents)/iu.test(
String(text ?? "")
);
}
export function hasAddressFollowupContextSignal(text: string): boolean {
const normalized = String(text ?? "").trim();
if (!normalized) {
@ -752,6 +780,39 @@ function deriveIntentWithFollowupContext(
};
}
const previousIsInventoryFamily = isInventoryIntent(previousIntent);
const inventorySelectedObjectFollowup =
hasSelectedObjectInventorySignal(normalizedMessage) || (previousIsInventoryFamily && hasFollowupSignal);
if (inventorySelectedObjectFollowup && hasInventorySupplierFollowupCue(normalizedMessage)) {
if (
detectedIntent.intent === "unknown" ||
detectedIntent.intent === "inventory_on_hand_as_of_date" ||
detectedIntent.intent === previousIntent
) {
return {
intent: "inventory_purchase_provenance_for_item",
confidence: "low",
reasons: [...detectedIntent.reasons, "intent_adjusted_to_inventory_followup_context"]
};
}
}
if (inventorySelectedObjectFollowup && hasInventoryPurchaseDocumentsFollowupCue(normalizedMessage)) {
if (
detectedIntent.intent === "unknown" ||
detectedIntent.intent === "list_documents_by_counterparty" ||
detectedIntent.intent === "list_documents_by_contract" ||
detectedIntent.intent === "inventory_on_hand_as_of_date" ||
detectedIntent.intent === previousIntent
) {
return {
intent: "inventory_purchase_documents_for_item",
confidence: "low",
reasons: [...detectedIntent.reasons, "intent_adjusted_to_inventory_followup_context"]
};
}
}
if (hasPreviousContract) {
if (detectedIntent.intent === "list_contracts_by_counterparty") {
if (hasBankSignal(normalizedMessage)) {

@ -103,6 +103,8 @@ describe("inventory selected-object follow-up", () => {
expect(result?.debug.extracted_filters?.as_of_date).toBe("2021-03-31");
expect(result?.debug.extracted_filters?.period_from).toBe("2021-03-01");
expect(result?.debug.extracted_filters?.period_to).toBe("2021-03-31");
expect(result?.debug.capability_id).toBe("inventory_inventory_purchase_provenance_for_item");
expect(result?.debug.capability_route_mode).toBe("exact");
expect(result?.debug.reasons).toContain("period_window_auto_broadened_to_available_data");
expect(result?.debug.limitations).toContain("period_window_auto_broadened_to_available_data");
const replyLines = String(result?.reply_text ?? "").split("\n");
@ -111,4 +113,97 @@ describe("inventory selected-object follow-up", () => {
expect(replyLines[1]).toContain("По окну 2021-03-01..2021-03-31 строк не найдено");
expect(executeAddressMcpQueryMock).toHaveBeenCalledTimes(2);
});
it("handles selected-object supplier slang 'кто это поставил нам' as provenance follow-up", async () => {
executeAddressMcpQueryMock.mockResolvedValueOnce({
fetched_rows: 1,
matched_rows: 1,
raw_rows: [
{
Period: "2019-02-11T00:00:00Z",
Registrator: "Поступление товаров и услуг 00000000077 от 11.02.2019 0:00:00",
AccountDt: "41.01",
AccountKt: "60.01",
Amount: 3724.17,
SubcontoDt1: "Столешница 600*3050*26 дуб ниагара",
SubcontoDt3: "Основной склад",
SubcontoKt1: "Торговый дом \\Союз МСК\\",
SubcontoKt2: "Договор поставки № 12 от 01.02.2019",
Organization: "ООО \\Альтернатива Плюс\\"
}
],
rows: [],
error: null
});
const service = new AddressQueryService();
const result = await service.tryHandle('По выбранному объекту "Столешница 600*3050*26 дуб ниагара": кто это поставил нам', {
followupContext: {
previous_intent: "inventory_on_hand_as_of_date",
previous_filters: {
as_of_date: "2019-03-31",
period_from: "2019-03-01",
period_to: "2019-03-31",
warehouse: "Основной склад",
organization: "ООО \\Альтернатива Плюс\\"
},
previous_anchor_type: "unknown",
previous_anchor_value: null
}
});
expect(result?.handled).toBe(true);
expect(result?.response_type).toBe("FACTUAL_SUMMARY");
expect(result?.debug.detected_intent).toBe("inventory_purchase_provenance_for_item");
expect(result?.debug.extracted_filters?.item).toBe("Столешница 600*3050*26 дуб ниагара");
expect(result?.debug.extracted_filters?.as_of_date).toBe("2019-03-31");
expect(String(result?.reply_text ?? "")).toContain("Торговый дом \\Союз МСК\\");
});
it("handles selected-object purchase-doc slang 'по каким документам это купили' as exact purchase-doc follow-up", async () => {
executeAddressMcpQueryMock.mockResolvedValueOnce({
fetched_rows: 1,
matched_rows: 1,
raw_rows: [
{
Period: "2019-02-11T00:00:00Z",
Registrator: "Поступление товаров и услуг 00000000077 от 11.02.2019 0:00:00",
AccountDt: "41.01",
AccountKt: "60.01",
Amount: 3724.17,
SubcontoDt1: "Столешница 600*3050*26 дуб ниагара",
SubcontoDt3: "Основной склад",
SubcontoKt1: "Торговый дом \\Союз МСК\\",
SubcontoKt2: "Договор поставки № 12 от 01.02.2019",
Organization: "ООО \\Альтернатива Плюс\\"
}
],
rows: [],
error: null
});
const service = new AddressQueryService();
const result = await service.tryHandle('По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам это купили', {
followupContext: {
previous_intent: "inventory_purchase_provenance_for_item",
previous_filters: {
as_of_date: "2019-03-31",
period_from: "2019-03-01",
period_to: "2019-03-31",
item: "Столешница 600*3050*26 дуб ниагара",
warehouse: "Основной склад"
},
previous_anchor_type: "unknown",
previous_anchor_value: null
}
});
expect(result?.handled).toBe(true);
expect(result?.response_type).toBe("FACTUAL_LIST");
expect(result?.debug.detected_intent).toBe("inventory_purchase_documents_for_item");
expect(result?.debug.selected_recipe).toBe("address_inventory_purchase_documents_for_item_v1");
expect(result?.debug.extracted_filters?.item).toBe("Столешница 600*3050*26 дуб ниагара");
expect(result?.debug.extracted_filters?.as_of_date).toBe("2019-03-31");
expect(String(result?.reply_text ?? "")).toContain("Поступление товаров и услуг 00000000077");
});
});

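Both new service tests put only the quoted object name in the message and rely on `followupContext.previous_filters` for the dates. The expected `as_of_date` of `2019-03-31` comes from the previous turn. Below is a minimal sketch of such a merge, under three assumptions: the previous turn's filters form the base, explicit current-turn filters override them, and the quoted selected object always sets `item`. `mergeFollowupFilters` and `extractSelectedObjectItem` are illustrative names, not functions from this codebase.

```typescript
// Hypothetical filter shape; the real extracted_filters object carries more keys.
interface AddressFilters {
  item?: string;
  as_of_date?: string;
  period_from?: string;
  period_to?: string;
  warehouse?: string;
  organization?: string;
}

// Assumes the UI wraps the selected object in straight double quotes.
function extractSelectedObjectItem(message: string): string | undefined {
  const match = /по\s+выбранному\s+объекту\s+"([^"]+)"/i.exec(message);
  return match ? match[1].trim() : undefined;
}

function mergeFollowupFilters(
  message: string,
  current: AddressFilters,
  previous: AddressFilters | null
): AddressFilters {
  // Start from the previous turn, so dates and warehouse carry over...
  const merged: AddressFilters = { ...(previous ?? {}) };
  // ...then let anything stated explicitly in the current turn win.
  for (const key of Object.keys(current) as (keyof AddressFilters)[]) {
    const value = current[key];
    if (value !== undefined) {
      merged[key] = value;
    }
  }
  // The quoted selected object is the strongest item anchor.
  const item = extractSelectedObjectItem(message);
  if (item !== undefined) {
    merged.item = item;
  }
  return merged;
}
```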
@ -173,6 +173,14 @@ describe("address query shape classifier", () => {
expect(filters.item).toBe("Кромка с клеем 33 альмандин 137 м");
});
it("extracts item anchor from selected-object purchase-doc follow-up without explicit word товар", () => {
const filters = extractAddressFilters(
'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам это купили',
"inventory_purchase_documents_for_item"
).extracted_filters;
expect(filters.item).toBe("Столешница 600*3050*26 дуб ниагара");
});
it("keeps colloquial selected-object supplier follow-up in inventory provenance intent", () => {
const mode = detectAddressQuestionMode(
'По выбранному объекту "Кромка с клеем 33 альмандин 137 м": кто поставил этот товар'
@ -184,6 +192,28 @@ describe("address query shape classifier", () => {
expect(result.intent).toBe("inventory_purchase_provenance_for_item");
});
it("keeps selected-object supplier slang with 'кто это поставил нам' in inventory provenance intent", () => {
const mode = detectAddressQuestionMode(
'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": кто это поставил нам'
);
const result = resolveAddressIntent(
'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": кто это поставил нам'
);
expect(mode.mode).toBe("address_query");
expect(result.intent).toBe("inventory_purchase_provenance_for_item");
});
it("keeps selected-object purchase-doc slang with 'по каким документам это купили' in purchase-doc intent", () => {
const mode = detectAddressQuestionMode(
'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам это купили'
);
const result = resolveAddressIntent(
'По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам это купили'
);
expect(mode.mode).toBe("address_query");
expect(result.intent).toBe("inventory_purchase_documents_for_item");
});
it("keeps full supplier anchor with comma suffix for stock-overlap questions", () => {
const filters = extractAddressFilters(
"Какие товары от поставщика Гамма-мебель, ООО сейчас еще лежат на складе Основной склад?",
@ -3874,6 +3904,49 @@ describe("address query limited taxonomy and stage diagnostics", { timeout: 1500
});
describe("address decompose stage follow-up carryover", () => {
it("promotes selected-object supplier slang follow-up into inventory provenance with inherited date context", () => {
const result = runAddressDecomposeStage('По выбранному объекту "Столешница 600*3050*26 дуб ниагара": кто это поставил нам', {
previous_intent: "inventory_on_hand_as_of_date",
previous_filters: {
as_of_date: "2019-03-31",
period_from: "2019-03-01",
period_to: "2019-03-31",
warehouse: "Основной склад"
},
previous_anchor_type: "unknown",
previous_anchor_value: null
});
expect(result).not.toBeNull();
expect(result?.intent.intent).toBe("inventory_purchase_provenance_for_item");
expect(result?.filters.extracted_filters.as_of_date).toBe("2019-03-31");
expect(
result?.baseReasons?.includes("intent_adjusted_to_inventory_followup_context") ||
result?.intent.reasons.includes("inventory_selected_object_provenance_signal_detected")
).toBe(true);
});
it("promotes selected-object purchase-doc slang follow-up into inventory purchase documents with inherited date context", () => {
const result = runAddressDecomposeStage('По выбранному объекту "Столешница 600*3050*26 дуб ниагара": по каким документам это купили', {
previous_intent: "inventory_purchase_provenance_for_item",
previous_filters: {
as_of_date: "2019-03-31",
period_from: "2019-03-01",
period_to: "2019-03-31",
item: "Столешница 600*3050*26 дуб ниагара"
},
previous_anchor_type: "unknown",
previous_anchor_value: null
});
expect(result).not.toBeNull();
expect(result?.intent.intent).toBe("inventory_purchase_documents_for_item");
expect(result?.filters.extracted_filters.item).toBe("Столешница 600*3050*26 дуб ниагара");
expect(result?.filters.extracted_filters.as_of_date).toBe("2019-03-31");
expect(
result?.baseReasons?.includes("intent_adjusted_to_inventory_followup_context") ||
result?.intent.reasons.includes("inventory_selected_object_purchase_documents_signal_detected")
).toBe(true);
});
it("keeps slang all-customers-all-time wording in address lane via resolved intent fallback", () => {
const result = runAddressDecomposeStage("выведи всех заков за все время", null);
expect(result).not.toBeNull();

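The classifier and decompose tests exercise two scenario-tree nodes, supplier provenance and purchase documents, under canonical wording, slang, and the UI selected-object prefix. Below is a hypothetical table-driven runner that reports both failing cases and node/wording-family pairs that have no case at all. `runAcceptanceMatrix`, `MatrixCase`, and the node ids are illustrative. `resolve` stands in for a real resolver, for example one returning `resolveAddressIntent(text).intent`.

```typescript
type WordingFamily = "canonical" | "colloquial" | "ui_selected_object";

interface MatrixCase {
  node: string; // scenario-tree node or edge id, e.g. "purchase_docs"
  family: WordingFamily; // wording family the prompt belongs to
  text: string;
  expectedIntent: string;
}

interface MatrixReport {
  passed: number;
  failures: { node: string; family: WordingFamily; got: string }[];
  missingFamilies: string[]; // "node:family" pairs with no case at all
}

const FAMILIES: WordingFamily[] = ["canonical", "colloquial", "ui_selected_object"];

function runAcceptanceMatrix(cases: MatrixCase[], resolve: (text: string) => string): MatrixReport {
  const report: MatrixReport = { passed: 0, failures: [], missingFamilies: [] };
  const nodes: string[] = [];
  const covered: { [node: string]: WordingFamily[] } = {};
  for (const c of cases) {
    const got = resolve(c.text);
    if (got === c.expectedIntent) {
      report.passed += 1;
    } else {
      report.failures.push({ node: c.node, family: c.family, got });
    }
    if (nodes.indexOf(c.node) < 0) {
      nodes.push(c.node);
      covered[c.node] = [];
    }
    covered[c.node].push(c.family);
  }
  // A node counts as covered only when every wording family has at least one case.
  for (const node of nodes) {
    for (const family of FAMILIES) {
      if (covered[node].indexOf(family) < 0) {
        report.missingFamilies.push(`${node}:${family}`);
      }
    }
  }
  return report;
}
```

Reporting missing pairs separately from failures keeps an all-green run from hiding wordings that were never tested.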
@ -2120,6 +2120,7 @@ def build_analyst_loop_prompt(
- `.codex/agents/domain_analyst.toml`
- `.codex/skills/domain-case-loop/SKILL.md`
- `.codex/skills/domain-case-loop/references/verdict_template.md`
- `.codex/skills/domain-case-loop/references/business_first_analyst_rubric.md`
Current loop context:
- loop_dir: `{loop_dir}`
@ -2135,11 +2136,13 @@ def build_analyst_loop_prompt(
Goal:
- evaluate current domain-pack correctness for business meaning, route/capability quality, evidence quality, and absence of silent heuristic masking;
- evaluate business usefulness, direct-answer-first behavior, state continuity, and field truthfulness, not only technical groundedness;
- determine whether the gate `quality_score >= {target_score}` is reached;
- if not, provide the smallest high-value fix targets for the coder.
Rules:
- `accepted` is allowed only if quality_score >= {target_score}, unresolved_p0_count = 0, and regression_detected = false;
- `accepted` also requires `direct_answer_ok = true` and `business_usefulness_ok = true`;
- `partial` means the pack is usable but exactness, routing, or coverage is still insufficient;
- `needs_exact_capability` means the primary blocker is a missing exact route or capability, but the loop should still continue autonomously unless a user decision is required;
- `continue` means there is a clear next patch cycle;
@ -2152,6 +2155,10 @@ def build_analyst_loop_prompt(
- if `requires_user_decision = true`, fill `user_decision_type` and `user_decision_prompt`;
- if the pack is below {target_score} but there is still safe autonomous implementation work, keep `requires_user_decision = false`;
- do not request user input merely because the score is still below {target_score}; request it only when the loop would otherwise guess, overfit, or risk architecture drift.
- return machine-readable fields for: `user_intent_summary`, `expected_direct_answer`, `actual_direct_answer`, `direct_answer_ok`, `business_usefulness_ok`, `business_utility_score`, `direct_answer_priority_score`, `state_continuity_score`, `answer_shape_score`, `evidence_clarity_score`, `root_cause_layers`, `broken_edge_ids`, `violated_invariants`;
- if the product found the evidence but failed to retain the selected object, provenance bundle, or another reusable resolved object across turns, classify that as `object_memory_gap` or `edge_carryover_gap`, not as a generic route problem;
- if the surfaced business field looks mislabeled, for example supplier vs organization, classify that as `field_mapping_gap`;
- if the answer is technically grounded but still weak for a manager/accountant/operator, classify that as `business_utility_gap`.
Use this UTF-8 evidence bundle as the source of truth for artifact contents. Do not treat shell rendering artifacts as file corruption if the embedded bundle is readable.
@ -2196,6 +2203,9 @@ def build_coder_loop_prompt(
- do not present heuristic answers as confirmed;
- do not touch unrelated files;
- preserve already successful baseline flows.
- use `root_cause_layers`, `broken_edge_ids`, `violated_invariants`, and business-utility scores from the analyst verdict to choose the smallest fix;
- prioritize state continuity, selected-object persistence, direct-answer-first behavior, and field-truth mapping when those are the blocking layers;
- do not broaden scope when the analyst says the defect is mainly `object_memory_gap`, `field_mapping_gap`, `answer_shape_mismatch`, or `business_utility_gap`.
Required outputs:
- create `{iteration_dir / 'coder_plan.md'}` with a short plan;
@ -2217,12 +2227,21 @@ def evaluate_analyst_gate(
quality_score = int(verdict.get("quality_score") or 0)
unresolved_p0_count = int(verdict.get("unresolved_p0_count") or 0)
regression_detected = bool(verdict.get("regression_detected"))
direct_answer_ok = bool(verdict.get("direct_answer_ok", True))
business_usefulness_ok = bool(verdict.get("business_usefulness_ok", True))
loop_decision = str(verdict.get("loop_decision") or "").strip() or "continue"
requires_user_decision = bool(verdict.get("requires_user_decision"))
user_decision_type = str(verdict.get("user_decision_type") or "").strip() or "none"
user_decision_prompt_raw = verdict.get("user_decision_prompt")
user_decision_prompt = str(user_decision_prompt_raw).strip() if user_decision_prompt_raw else None
accepted = quality_score >= target_score and unresolved_p0_count == 0 and not regression_detected and loop_decision == "accepted"
accepted = (
quality_score >= target_score
and unresolved_p0_count == 0
and not regression_detected
and direct_answer_ok
and business_usefulness_ok
and loop_decision == "accepted"
)
return accepted, loop_decision, requires_user_decision, user_decision_type, user_decision_prompt
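`evaluate_analyst_gate` now also requires `direct_answer_ok` and `business_usefulness_ok`. Both are read with `verdict.get(key, True)`, so a verdict that omits them is not blocked by them. An explicit `false` does block, and so does `null`, because `bool(None)` is `False` in Python. Below is a TypeScript transcription of the acceptance rule for illustration. It also includes a hypothetical helper for the coder-prompt rule about not broadening scope. The interface and helper names are assumptions, and "mainly" is read here as "every reported root-cause layer is in the narrow set".

```typescript
// Hypothetical verdict shape covering only the fields used below.
interface AnalystVerdict {
  quality_score?: number | null;
  unresolved_p0_count?: number | null;
  regression_detected?: boolean | null;
  direct_answer_ok?: boolean | null;
  business_usefulness_ok?: boolean | null;
  loop_decision?: string | null;
  root_cause_layers?: string[];
}

// Mirrors Python's bool(verdict.get(key, True)): absent -> true, null -> false.
function flagDefaultTrue(value: boolean | null | undefined): boolean {
  return value === undefined ? true : Boolean(value);
}

// Mirrors int(verdict.get(key) or 0): missing or null -> 0, fractions truncated toward zero.
function intOrZero(value: number | null | undefined): number {
  const n = Number(value ?? 0) || 0;
  return n < 0 ? Math.ceil(n) : Math.floor(n);
}

function evaluateAnalystGate(verdict: AnalystVerdict, targetScore: number): { accepted: boolean; loopDecision: string } {
  const loopDecision = (verdict.loop_decision ?? "").trim() || "continue";
  const accepted =
    intOrZero(verdict.quality_score) >= targetScore &&
    intOrZero(verdict.unresolved_p0_count) === 0 &&
    !verdict.regression_detected &&
    flagDefaultTrue(verdict.direct_answer_ok) &&
    flagDefaultTrue(verdict.business_usefulness_ok) &&
    loopDecision === "accepted";
  return { accepted, loopDecision };
}

// Layers the coder prompt says should not lead to a broader patch.
const NARROW_SCOPE_LAYERS = ["object_memory_gap", "field_mapping_gap", "answer_shape_mismatch", "business_utility_gap"];

function shouldKeepPatchNarrow(verdict: AnalystVerdict): boolean {
  const layers = verdict.root_cause_layers ?? [];
  return layers.length > 0 && layers.every((layer) => NARROW_SCOPE_LAYERS.indexOf(layer) >= 0);
}
```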