# Scenario-tree acceptance canon

## Core idea

For follow-up-heavy business domains, the unit of acceptance is not a flat list of isolated questions.

The unit of acceptance is a **scenario tree**:

- a root business question;
- one or more critical child drilldowns;
- explicit transitions between steps;
- explicit semantic carryover between steps.

If the root works but a critical child transition breaks, the domain is **not** hardened.

## Business-first framing

Every accepted node or edge must be both technically grounded and business-useful.

This means:

- the direct answer is surfaced first when the user asked a direct lookup question;
- the answer stays on the requested business object;
- evidence and caveats support the answer instead of replacing it;
- field labels are truthful for business entities such as supplier, buyer, organization, warehouse, and document.

## Model the domain as a tree

For each scenario, define:

- `root node`
- `critical child nodes`
- `critical edges`
- `primary user path`

Example for inventory:

- root: stock snapshot on date
- child: selected item -> supplier provenance
- child: selected item -> purchase documents
- child: selected item -> aging on the same date
- child: selected item -> sale trace
- child: selected item -> pronoun follow-up purchase documents

The primary user path is the path a real user is most likely to take first, not the prettiest canonical wording.

## Node acceptance

A node is considered covered only if all of these are true:

- the business meaning is understood correctly;
- the expected intent / capability is selected;
- the answer shape matches the requested business object;
- the answer begins with a direct user-facing answer when such an answer is expected;
- the answer is evidence-backed rather than heuristic-masked;
- the surfaced business fields are truthful and not mislabeled.


Examples:

- asking for supplier provenance must answer with the supplier first, not only with raw documents;
- asking for old stock must answer with item-level old-stock positions, not with a raw document dump;
- asking for residues/items/contracts must not silently downgrade to lower-level movements.

## Edge acceptance

Each critical edge must define its required carryover invariants.

Typical invariants:

- selected object survives from previous assistant output
- stable focus object survives as the active business object
- originating date / period survives into follow-up filters
- warehouse survives if the follow-up still targets the same stock slice
- organization survives if the previous slice was organization-bound
- route family remains in the same business contour unless the user clearly changed intent
- reusable resolved-object state survives when the previous turn already answered a closely related lookup
- pronoun references can reuse the active focus object when the wording supports it
- follow-up action resolution stays on the same business object, for example item -> purchase documents rather than counterparty -> documents

If an edge loses a required invariant, that is a real regression even if the target node works in isolation.

## Resolved answer-object continuity

For follow-up-heavy domains, the analyst should treat resolved business objects as reusable state, not as disposable one-turn artifacts.

Examples:

- selected inventory item
- resolved supplier provenance bundle
- resolved buyer bundle
- resolved purchase document bundle

If turn N already resolved such an object and turn N+1 asks a natural follow-up about the same object, the system should reuse that state instead of demanding the same anchor again.

If turn N already resolved supplier/date/document provenance and turn N+1 asks for one adjacent field such as `когда купили ее` ("when was it bought") or `покажи документы по этой позиции` ("show the documents for this item"), the system should prefer bundle reuse before re-entering a broad generic router.
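The carryover invariants and the reuse of resolved state between turn N and turn N+1 can be sketched as a small check. This is only an illustration under stated assumptions: `FocusState`, its fields, and `lost_invariants` are hypothetical names, and the item id and date are made-up sample values, not anything defined by the product.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class FocusState:
    """Hypothetical snapshot of the state a follow-up turn should inherit."""
    focus_object: Optional[str] = None   # e.g. the selected inventory item
    as_of_date: Optional[str] = None     # originating date / period
    warehouse: Optional[str] = None
    organization: Optional[str] = None


def lost_invariants(prev: FocusState, nxt: FocusState, required: set[str]) -> list[str]:
    """Return the required invariants whose value changed between prev and nxt."""
    lost = []
    for f in fields(FocusState):
        if f.name in required and getattr(prev, f.name) != getattr(nxt, f.name):
            lost.append(f.name)
    return lost


# Made-up sample values for illustration only.
turn_n = FocusState(focus_object="item:123", as_of_date="2024-03-31", warehouse="main")
# The follow-up kept the item but silently dropped the originating date.
turn_n1 = FocusState(focus_object="item:123", as_of_date=None, warehouse="main")

required = {"focus_object", "as_of_date", "warehouse"}
print(lost_invariants(turn_n, turn_n1, required))
```

A non-empty result would correspond to a broken edge in the sense above, regardless of whether the target node answers correctly in isolation.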

## Mandatory paraphrase families

Every critical node or edge must be validated against a small paraphrase family, not a single curated wording.

Minimum family:

- `canonical`
- `colloquial`
- `ui_selected_object`
- `pronoun_followup` when the UX already established a selected object or active item

If canonical works but colloquial, UI-generated, or pronoun-only follow-up fails, the node/edge is not accepted.

## Acceptance matrix

The analyst must produce or update a `scenario_acceptance_matrix.md` artifact for every multi-step scenario or pack.

Minimum matrix columns:

- scenario id
- node id or edge id
- user path role (`root`, `critical_child`, `supporting`)
- wording family (`canonical`, `colloquial`, `ui_selected_object`, and `pronoun_followup` where it applies)
- expected business meaning
- expected intent
- expected capability / recipe
- required carryover invariants
- expected answer shape
- expected direct answer
- business usefulness expectation
- actual outcome
- status (`pass`, `partial`, `fail`)
- defect class

## Defect classes

Use these classes explicitly:

- `semantic_understanding_gap`
- `edge_carryover_gap`
- `object_memory_gap`
- `followup_action_resolution_gap`
- `bundle_reuse_gap`
- `field_mapping_gap`
- `answer_shape_mismatch`
- `ordering_semantics_mismatch`
- `runtime_capability_gap`
- `business_utility_gap`
- `domain_anchor_gap`
- `loop_coverage_gap`

Definitions:

- `semantic_understanding_gap`: the system did not understand the real user meaning
- `edge_carryover_gap`: the follow-up lost date / object / scope across steps
- `object_memory_gap`: the system resolved the object once but failed to retain it for the next follow-up
- `followup_action_resolution_gap`: the system kept the business object but resolved the wrong action over that object, for example item-follow-up -> counterparty-documents
- `bundle_reuse_gap`: the system resolved a reusable supplier/date/document bundle once but failed to reuse it for an adjacent follow-up
- `field_mapping_gap`: the answer surfaced the wrong business field or mislabeled a field
- `answer_shape_mismatch`: the business object in the answer does not match the requested object
- `ordering_semantics_mismatch`: ranking / chronology semantics are wrong
- `runtime_capability_gap`: the product contour truly lacks the route / intent / capability / extractor / recipe
- `business_utility_gap`: the answer may be grounded but is still not useful as a user-facing result
- `domain_anchor_gap`: the scenario uses a weak or wrong observed anchor, so the tree is semantically mis-specified
- `loop_coverage_gap`: the runtime could support the path or nearly support it, but the analyst/orchestrator never treated that path as mandatory acceptance coverage

## Analyst responsibilities

The analyst must:

- review the scenario tree, not just individual turns;
- compare expected and actual user path transitions;
- call out broken edges explicitly;
- verify colloquial and UI-generated variants as first-class coverage;
- verify direct-answer-first behavior where the user asked a direct lookup question;
- verify business usefulness explicitly, not only technical validity;
- verify field truthfulness for surfaced supplier / buyer / organization labels;
- verify selected-object continuity and reusable object memory;
- verify focus-object continuity, pronoun follow-up continuity, and follow-up action resolution on the active business object;
- verify answer granularity and ordering semantics;
- lower the score when any critical edge or paraphrase family is broken.

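One possible way to hold matrix rows in code while reviewing a tree is sketched below. It is an assumption-laden illustration: `MatrixRow` carries only a subset of the minimum columns, `blocking_rows` is a hypothetical helper, and treating `partial` as blocking on non-supporting rows is this sketch's reading of the rules rather than something the matrix format mandates. The sample rows are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DefectClass(str, Enum):
    SEMANTIC_UNDERSTANDING_GAP = "semantic_understanding_gap"
    EDGE_CARRYOVER_GAP = "edge_carryover_gap"
    OBJECT_MEMORY_GAP = "object_memory_gap"
    FOLLOWUP_ACTION_RESOLUTION_GAP = "followup_action_resolution_gap"
    BUNDLE_REUSE_GAP = "bundle_reuse_gap"
    FIELD_MAPPING_GAP = "field_mapping_gap"
    ANSWER_SHAPE_MISMATCH = "answer_shape_mismatch"
    ORDERING_SEMANTICS_MISMATCH = "ordering_semantics_mismatch"
    RUNTIME_CAPABILITY_GAP = "runtime_capability_gap"
    BUSINESS_UTILITY_GAP = "business_utility_gap"
    DOMAIN_ANCHOR_GAP = "domain_anchor_gap"
    LOOP_COVERAGE_GAP = "loop_coverage_gap"


@dataclass
class MatrixRow:
    """Subset of the matrix columns, enough to find blocking rows."""
    scenario_id: str
    element_id: str        # node id or edge id
    path_role: str         # root | critical_child | supporting
    wording_family: str    # canonical | colloquial | ui_selected_object | pronoun_followup
    status: str            # pass | partial | fail
    defect_class: Optional[DefectClass] = None


def blocking_rows(rows: list[MatrixRow]) -> list[MatrixRow]:
    """Rows on the root or a critical child that did not fully pass (assumed rule)."""
    return [r for r in rows if r.path_role != "supporting" and r.status != "pass"]


# Invented sample rows for illustration only.
rows = [
    MatrixRow("inventory", "root", "root", "canonical", "pass"),
    MatrixRow("inventory", "item->purchase_documents", "critical_child",
              "colloquial", "fail", DefectClass.EDGE_CARRYOVER_GAP),
    MatrixRow("inventory", "item->sale_trace", "supporting",
              "canonical", "partial", DefectClass.ANSWER_SHAPE_MISMATCH),
]

for row in blocking_rows(rows):
    print(row.element_id, row.wording_family, row.defect_class.value)
```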

## Orchestrator responsibilities

The orchestrator must:

- define the tree before iterating deeply;
- prioritize the primary user path first;
- rerun at least one colloquial variant and one UI-selected-object variant for each critical branch;
- rerun at least one short pronoun follow-up such as `по ней` ("for it") / `по этой позиции` ("for this item") when the product UX already established a selected object;
- treat a broken critical edge as an unfinished scenario even if the root node works;
- route coder work to the narrowest broken edge or node rather than issuing broad “improve the domain” tasks.

## Stop and acceptance rules

Do not accept a domain when:

- only the root node works;
- only one curated phrasing works;
- selected-object follow-up is broken;
- pronoun-only selected-object follow-up is broken or misrouted to another business object;
- `на эту дату` / `на ту дату` ("as of this date" / "as of that date") loses the originating date;
- the answer shape is wrong for the business question;
- chronology / ranking semantics are inverted;
- the direct answer is not surfaced first on direct lookup questions;
- the answer is technically grounded but still business-useless.

Accepted requires:

- score >= 80
- no unresolved P0
- critical path edges pass
- canonical + colloquial + UI-selected-object variants pass for critical branches
- no silent heuristic masking
- `direct_answer_ok = true`
- `business_usefulness_ok = true`
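
The "Accepted requires" list can be read as a conjunction of checks. The sketch below encodes it that way. `ScenarioResult`, its field names other than `direct_answer_ok` and `business_usefulness_ok`, and both helper functions are hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Hypothetical summary of one scenario review."""
    score: int
    unresolved_p0: int
    critical_edges_pass: bool
    paraphrase_variants_pass: bool   # canonical + colloquial + UI-selected-object
    heuristic_masking_found: bool
    direct_answer_ok: bool
    business_usefulness_ok: bool


def unmet_requirements(r: ScenarioResult) -> list[str]:
    """Return the acceptance requirements that the result does not satisfy."""
    checks = {
        "score >= 80": r.score >= 80,
        "no unresolved P0": r.unresolved_p0 == 0,
        "critical path edges pass": r.critical_edges_pass,
        "paraphrase variants pass for critical branches": r.paraphrase_variants_pass,
        "no silent heuristic masking": not r.heuristic_masking_found,
        "direct_answer_ok": r.direct_answer_ok,
        "business_usefulness_ok": r.business_usefulness_ok,
    }
    return [name for name, ok in checks.items() if not ok]


def is_accepted(r: ScenarioResult) -> bool:
    """A scenario is accepted only when every requirement holds."""
    return not unmet_requirements(r)


# Invented example: a high score does not compensate for a broken critical edge.
result = ScenarioResult(
    score=92,
    unresolved_p0=0,
    critical_edges_pass=False,
    paraphrase_variants_pass=True,
    heuristic_masking_found=False,
    direct_answer_ok=True,
    business_usefulness_ok=True,
)
print(is_accepted(result), unmet_requirements(result))
```

Returning the list of unmet requirements, rather than a bare boolean, keeps the rejection reason explicit, which fits the rule about routing coder work to the narrowest broken edge or node.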