Scenario-tree acceptance canon
Core idea
For follow-up-heavy business domains, the unit of acceptance is not a flat list of isolated questions.
The unit of acceptance is a scenario tree:
- a root business question;
- one or more critical child drilldowns;
- explicit transitions between steps;
- explicit semantic carryover between steps.
If the root works but a critical child transition breaks, the domain is not hardened.
Model the domain as a tree
For each scenario, define:
- root node
- critical child nodes
- critical edges
- primary user path
Example for inventory:
- root: stock snapshot on date
- child: selected item -> supplier provenance
- child: selected item -> purchase documents
- child: selected item -> aging on the same date
- child: selected item -> sale trace
The primary user path is the path a real user is most likely to take first, not the prettiest canonical wording.
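The tree model above can be sketched as a small data structure. This is a minimal illustration, not a prescribed schema: the class names, the node ids, the invariants attached to each edge, and the choice of supplier provenance as the primary path are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    meaning: str
    role: str  # "root", "critical_child", or "supporting"


@dataclass
class Edge:
    source: str
    target: str
    invariants: tuple = ()  # names of carryover invariants this edge requires


@dataclass
class ScenarioTree:
    scenario_id: str
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)
    primary_path: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Refuse edges that point at undefined nodes, so every transition is explicit.
        for end in (edge.source, edge.target):
            if end not in self.nodes:
                raise KeyError(f"unknown node: {end}")
        self.edges.append(edge)

    def critical_edges(self) -> list:
        # An edge is treated as critical when it leads to a critical child.
        return [e for e in self.edges
                if self.nodes[e.target].role == "critical_child"]


def build_inventory_tree() -> ScenarioTree:
    """Build the inventory example; ids and invariants are hypothetical."""
    tree = ScenarioTree("inventory_stock")
    tree.add_node(Node("stock_snapshot", "stock snapshot on date", "root"))
    children = {
        "supplier_provenance": "selected item -> supplier provenance",
        "purchase_documents": "selected item -> purchase documents",
        "aging_same_date": "selected item -> aging on the same date",
        "sale_trace": "selected item -> sale trace",
    }
    for node_id, meaning in children.items():
        tree.add_node(Node(node_id, meaning, "critical_child"))
        tree.add_edge(Edge("stock_snapshot", node_id, ("selected_object", "date")))
    # Assumed primary path: the drilldown a user is most likely to try first.
    tree.primary_path = ["stock_snapshot", "supplier_provenance"]
    return tree
```

Keeping edges as first-class objects, rather than implying them from parent/child nesting, leaves a place to record the carryover invariants each transition must satisfy.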
Node acceptance
A node is considered covered only if all of these are true:
- the business meaning is understood correctly;
- the expected intent / capability is selected;
- the answer shape matches the requested business object;
- the answer begins with a direct user-facing answer when such an answer is expected;
- the answer is evidence-backed rather than heuristic-masked.
Examples:
- asking for supplier provenance must answer with the supplier first, not only with raw documents;
- asking for old stock must answer with item-level old-stock positions, not with a raw document dump;
- asking for residues/items/contracts must not silently downgrade to lower-level movements.
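The five node criteria can be expressed as a simple check. The sketch below is only one way to record them; the field names and the criterion labels returned by `failed_criteria` are assumptions, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeCheck:
    meaning_understood: bool
    intent_selected: bool
    answer_shape_matches: bool
    evidence_backed: bool
    # None means no direct user-facing answer is expected for this node.
    direct_answer_first: Optional[bool] = None


def failed_criteria(check: NodeCheck) -> list:
    """Return the labels of every node criterion that is not satisfied."""
    failed = []
    if not check.meaning_understood:
        failed.append("business_meaning")
    if not check.intent_selected:
        failed.append("intent")
    if not check.answer_shape_matches:
        failed.append("answer_shape")
    if check.direct_answer_first is False:
        failed.append("direct_answer_first")
    if not check.evidence_backed:
        failed.append("evidence")
    return failed


def node_covered(check: NodeCheck) -> bool:
    # A node is covered only if all criteria hold.
    return not failed_criteria(check)
```

For instance, a supplier-provenance answer that returns only raw documents would be recorded with `answer_shape_matches=False` and `direct_answer_first=False`, so the node is not covered even though the intent was selected correctly.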
Edge acceptance
Each critical edge must define its required carryover invariants.
Typical invariants:
- selected object survives from previous assistant output
- originating date / period survives into follow-up filters
- warehouse survives if the follow-up still targets the same stock slice
- organization survives if the previous slice was organization-bound
- route family remains in the same business contour unless the user clearly changed intent
If an edge loses a required invariant, that is a real regression even if the target node works in isolation.
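A carryover check of this kind can be sketched as a comparison between the context established by the previous step and the filters actually applied in the follow-up. The dictionary keys and sample values below are illustrative assumptions.

```python
def lost_invariants(previous_context: dict, followup_filters: dict,
                    required: list) -> list:
    """Return the required invariants that did not survive the transition.

    An invariant counts as lost when the previous step established a value
    for it and the follow-up either dropped it or changed it.
    """
    lost = []
    for key in required:
        if key not in previous_context:
            continue  # nothing was established, so there is nothing to carry over
        if followup_filters.get(key) != previous_context[key]:
            lost.append(key)
    return lost


# Illustrative values only.
previous = {"selected_object": "X", "date": "2024-03-31", "warehouse": "Main"}
followup = {"selected_object": "X", "date": "2024-04-15"}
required = ["selected_object", "date", "warehouse"]

broken = lost_invariants(previous, followup, required)
```

Here the follow-up kept the selected object but replaced the originating date and dropped the warehouse, so the edge fails on `date` and `warehouse` regardless of how well the target node answers in isolation.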
Mandatory paraphrase families
Every critical node or edge must be validated against a small paraphrase family, not against a single curated wording.
Minimum family:
- canonical
- colloquial
- ui_selected_object
Examples:
- canonical: Which supplier was item X purchased from
- colloquial: Who supplied this item
- ui_selected_object: For the selected object "X": who supplied this to us
If canonical works but colloquial or UI-generated follow-up fails, the node/edge is not accepted.
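The family rule can be sketched as a check over per-wording results. The function names are hypothetical; the three family names come from the minimum family above.

```python
REQUIRED_FAMILIES = ("canonical", "colloquial", "ui_selected_object")


def failing_families(results: dict) -> list:
    """Return the required wordings that failed or were never run.

    `results` maps a wording family name to True (passed) or False (failed).
    A missing entry is treated as a failure, since an untested wording
    gives no evidence of coverage.
    """
    return [name for name in REQUIRED_FAMILIES if not results.get(name, False)]


def family_accepted(results: dict) -> bool:
    return not failing_families(results)
```

Treating a missing wording as a failure is a deliberate choice in this sketch: it prevents a node from being accepted on the canonical wording alone.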
Acceptance matrix
The analyst must produce or update a scenario_acceptance_matrix.md artifact for every multi-step scenario or pack.
Minimum matrix columns:
- scenario id
- node id or edge id
- user path role (root, critical_child, supporting)
- wording family (canonical, colloquial, ui_selected_object)
- expected business meaning
- expected intent
- expected capability / recipe
- required carryover invariants
- expected answer shape
- actual outcome
- status (pass, partial, fail)
- defect class
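As a rough illustration, the matrix could be generated as a Markdown table from row dictionaries. The column names follow the minimum columns listed above; the rendering function and the sample row values (including the intent and recipe names) are hypothetical.

```python
COLUMNS = [
    "scenario id", "node id or edge id", "user path role", "wording family",
    "expected business meaning", "expected intent",
    "expected capability / recipe", "required carryover invariants",
    "expected answer shape", "actual outcome", "status", "defect class",
]


def render_matrix(rows: list) -> str:
    """Render rows (dicts keyed by column name) as a Markdown table."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "| " + " | ".join("---" for _ in COLUMNS) + " |",
    ]
    for row in rows:
        missing = [c for c in COLUMNS if c not in row]
        if missing:
            # Every minimum column must be filled in, even if only with "-".
            raise ValueError(f"row is missing columns: {missing}")
        lines.append("| " + " | ".join(str(row[c]) for c in COLUMNS) + " |")
    return "\n".join(lines)


# Hypothetical row for the inventory example.
sample_row = {
    "scenario id": "inventory_stock",
    "node id or edge id": "stock_snapshot -> supplier_provenance",
    "user path role": "critical_child",
    "wording family": "colloquial",
    "expected business meaning": "supplier of the selected item",
    "expected intent": "supplier_provenance",
    "expected capability / recipe": "purchase_trace",
    "required carryover invariants": "selected_object, date",
    "expected answer shape": "supplier first, documents as evidence",
    "actual outcome": "raw document list",
    "status": "fail",
    "defect class": "answer_shape_mismatch",
}

matrix_text = render_matrix([sample_row])
```

Writing `matrix_text` to scenario_acceptance_matrix.md would give one row per node or edge and wording family, which keeps partial coverage visible.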
Defect classes
Use these classes explicitly:
- semantic_understanding_gap
- edge_carryover_gap
- answer_shape_mismatch
- ordering_semantics_mismatch
- runtime_capability_gap
- loop_coverage_gap
Definitions:
- semantic_understanding_gap: the system did not understand the real user meaning
- edge_carryover_gap: the follow-up lost date / object / scope across steps
- answer_shape_mismatch: the business object in the answer does not match the requested object
- ordering_semantics_mismatch: ranking / chronology semantics are wrong
- runtime_capability_gap: the product contour truly lacks the route / intent / capability / extractor / recipe
- loop_coverage_gap: the runtime could support the path or nearly support it, but the analyst/orchestrator never treated that path as mandatory acceptance coverage
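To keep the classes explicit in tooling, they could be held in an enumeration so that free-text labels are rejected. This is a sketch under that assumption; `DefectClass` and `parse_defect_class` are hypothetical names.

```python
from enum import Enum


class DefectClass(Enum):
    SEMANTIC_UNDERSTANDING_GAP = "semantic_understanding_gap"
    EDGE_CARRYOVER_GAP = "edge_carryover_gap"
    ANSWER_SHAPE_MISMATCH = "answer_shape_mismatch"
    ORDERING_SEMANTICS_MISMATCH = "ordering_semantics_mismatch"
    RUNTIME_CAPABILITY_GAP = "runtime_capability_gap"
    LOOP_COVERAGE_GAP = "loop_coverage_gap"


def parse_defect_class(value: str) -> DefectClass:
    """Map a matrix cell to a defect class; raises ValueError if unknown."""
    return DefectClass(value.strip())
```

Rejecting unknown labels forces each defect into one of the six classes instead of an ad hoc description.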
Analyst responsibilities
The analyst must:
- review the scenario tree, not just individual turns;
- compare expected and actual user path transitions;
- call out broken edges explicitly;
- verify colloquial and UI-generated variants as first-class coverage;
- verify direct-answer-first behavior where the user asked a direct lookup question;
- verify answer granularity and ordering semantics;
- lower the score when any critical edge or paraphrase family is broken.
Orchestrator responsibilities
The orchestrator must:
- define the tree before iterating deeply;
- prioritize the primary user path first;
- rerun at least one colloquial variant and one UI-selected-object variant for each critical branch;
- treat a broken critical edge as an unfinished scenario even if the root node works;
- route coder work to the narrowest broken edge or node rather than issuing broad “improve the domain” tasks.
Stop and acceptance rules
Do not accept a domain when:
- only the root node works;
- only one curated phrasing works;
- selected-object follow-up is broken;
- a follow-up such as "as of this date" / "as of that date" loses the originating date;
- the answer shape is wrong for the business question;
- chronology / ranking semantics are inverted.
A domain is accepted only when all of the following hold:
- score >= 80
- no unresolved P0
- critical path edges pass
- canonical + colloquial + UI-selected-object variants pass for critical branches
- no silent heuristic masking
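The acceptance requirements can be sketched as a single gate that lists every blocker. The report fields and blocker messages are assumptions for illustration; only the threshold of 80 and the five conditions come from the rules above.

```python
from dataclasses import dataclass

MIN_SCORE = 80


@dataclass
class DomainReport:
    score: int
    unresolved_p0: int
    critical_edges_pass: bool
    critical_paraphrases_pass: bool  # canonical + colloquial + UI-selected-object
    heuristic_masking_found: bool


def acceptance_blockers(report: DomainReport) -> list:
    """Return every reason the domain cannot be accepted yet."""
    blockers = []
    if report.score < MIN_SCORE:
        blockers.append("score below 80")
    if report.unresolved_p0 > 0:
        blockers.append("unresolved P0")
    if not report.critical_edges_pass:
        blockers.append("critical path edge failing")
    if not report.critical_paraphrases_pass:
        blockers.append("paraphrase family failing on a critical branch")
    if report.heuristic_masking_found:
        blockers.append("silent heuristic masking")
    return blockers


def is_accepted(report: DomainReport) -> bool:
    return not acceptance_blockers(report)
```

Under this sketch a report with a score of 92 but a failing critical edge is still rejected, which mirrors the rule that a high score does not compensate for a broken critical transition.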