Scenario-tree acceptance canon

Core idea

For follow-up-heavy business domains, the unit of acceptance is not a flat list of isolated questions.

The unit of acceptance is a scenario tree:

  • a root business question;
  • one or more critical child drilldowns;
  • explicit transitions between steps;
  • explicit semantic carryover between steps.

If the root works but a critical child transition breaks, the domain is not hardened.

Business-first framing

Every accepted node or edge must be both technically grounded and business-useful.

This means:

  • the direct answer is surfaced first when the user asked a direct lookup question;
  • the answer stays on the requested business object;
  • evidence and caveats support the answer instead of replacing it;
  • field labels are truthful for business entities such as supplier, buyer, organization, warehouse, and document.

Model the domain as a tree

For each scenario, define:

  • root node
  • critical child nodes
  • critical edges
  • primary user path

Example for inventory:

  • root: stock snapshot on date
  • child: selected item -> supplier provenance
  • child: selected item -> purchase documents
  • child: selected item -> aging on the same date
  • child: selected item -> sale trace
  • child: selected item -> purchase documents via pronoun follow-up

The primary user path is the path a real user is most likely to take first, not the prettiest canonical wording.
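
The tree definition above can be written down as plain data before any iteration starts. The following is a minimal sketch, not a prescribed format: the class names (Node, Edge, ScenarioTree), the node ids, and the invariants attached to each edge are hypothetical and chosen only to mirror the inventory example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    description: str
    role: str  # "root", "critical_child", or "supporting"


@dataclass
class Edge:
    source: str
    target: str
    invariants: list[str] = field(default_factory=list)


@dataclass
class ScenarioTree:
    scenario_id: str
    nodes: dict[str, Node]
    edges: list[Edge]
    primary_path: list[str]

    def problems(self) -> list[str]:
        """Return structural problems; an empty list means the tree is well-formed."""
        found = []
        roots = [n.node_id for n in self.nodes.values() if n.role == "root"]
        if len(roots) != 1:
            found.append(f"expected exactly one root, found {len(roots)}")
        for edge in self.edges:
            for end in (edge.source, edge.target):
                if end not in self.nodes:
                    found.append(f"edge references unknown node {end!r}")
        for node_id in self.primary_path:
            if node_id not in self.nodes:
                found.append(f"primary path references unknown node {node_id!r}")
        return found


# Hypothetical ids for part of the inventory example.
nodes = [
    Node("stock_snapshot", "stock snapshot on date", "root"),
    Node("supplier_provenance", "selected item -> supplier provenance", "critical_child"),
    Node("purchase_documents", "selected item -> purchase documents", "critical_child"),
]
inventory = ScenarioTree(
    scenario_id="inventory",
    nodes={n.node_id: n for n in nodes},
    edges=[
        Edge("stock_snapshot", "supplier_provenance", ["selected item", "date"]),
        Edge("stock_snapshot", "purchase_documents", ["selected item", "date"]),
    ],
    primary_path=["stock_snapshot", "supplier_provenance"],
)
print(inventory.problems())  # prints [] because the tree is well-formed
```

Keeping the primary path as an explicit list makes it harder to confuse "the path users take first" with "the path that was easiest to test".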

Node acceptance

A node is considered covered only if all of these are true:

  • the business meaning is understood correctly;
  • the expected intent / capability is selected;
  • the answer shape matches the requested business object;
  • the answer begins with a direct user-facing answer when such an answer is expected;
  • the answer is evidence-backed rather than heuristic-masked;
  • the surfaced business fields are truthful and not mislabeled.

Examples:

  • asking for supplier provenance must answer with the supplier first, not only with raw documents;
  • asking for old stock must answer with item-level old-stock positions, not with a raw document dump;
  • asking for balances (residues), items, or contracts must not silently downgrade to lower-level movements.
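
The six node criteria can be recorded as one boolean per criterion so that a failing node names exactly what failed. This is an illustrative sketch only; NodeCheck and its field names are hypothetical labels for the criteria listed above, not part of any existing tooling.

```python
from dataclasses import dataclass, fields


@dataclass
class NodeCheck:
    meaning_understood: bool
    intent_selected: bool
    shape_matches_object: bool
    # Set to True when no direct answer is expected for this node.
    direct_answer_first: bool
    evidence_backed: bool
    fields_truthful: bool


def failed_criteria(check: NodeCheck) -> list[str]:
    """Names of the criteria that did not hold, in declaration order."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def node_covered(check: NodeCheck) -> bool:
    """A node is covered only if every criterion holds."""
    return not failed_criteria(check)


# Supplier provenance answered with raw documents only: grounded, but not direct.
raw_dump = NodeCheck(True, True, True, False, True, True)
print(node_covered(raw_dump))     # prints False
print(failed_criteria(raw_dump))  # prints ['direct_answer_first']
```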

Edge acceptance

Each critical edge must define its required carryover invariants.

Typical invariants:

  • selected object survives from previous assistant output
  • stable focus object survives as the active business object
  • originating date / period survives into follow-up filters
  • warehouse survives if the follow-up still targets the same stock slice
  • organization survives if the previous slice was organization-bound
  • route family remains in the same business contour unless the user clearly changed intent
  • reusable resolved-object state survives when the previous turn already answered a closely related lookup
  • pronoun references can reuse the active focus object when the wording supports it
  • follow-up action resolution stays on the same business object, for example item -> purchase documents rather than counterparty -> documents

If an edge loses a required invariant, that is a real regression even if the target node works in isolation.
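
An edge check can be reduced to comparing the context of the previous turn with the context the follow-up actually ran under. The sketch below assumes both contexts are available as flat dictionaries; the key names ("item", "date", "warehouse") and the sample values are hypothetical.

```python
def lost_invariants(previous: dict, follow_up: dict, required: list[str]) -> list[str]:
    """Required keys that were set on the previous turn but changed or vanished."""
    return [
        key
        for key in required
        if key in previous and follow_up.get(key) != previous[key]
    ]


previous_turn = {"item": "ITEM-1", "date": "2024-03-31", "warehouse": "Main"}
# The follow-up kept the item and warehouse but dropped the originating date.
follow_up_turn = {"item": "ITEM-1", "warehouse": "Main"}

print(lost_invariants(previous_turn, follow_up_turn, ["item", "date", "warehouse"]))
# prints ['date']
```

A non-empty result is a regression on the edge, regardless of whether the target node answers correctly when asked in isolation.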

Resolved answer-object continuity

For follow-up-heavy domains, the analyst should treat resolved business objects as reusable state, not as disposable one-turn artifacts.

Examples:

  • selected inventory item
  • resolved supplier provenance bundle
  • resolved buyer bundle
  • resolved purchase document bundle

If turn N already resolved such an object and turn N+1 asks a natural follow-up about the same object, the system should reuse that state instead of demanding the same anchor again. Likewise, if turn N already resolved supplier/date/document provenance and turn N+1 asks for one adjacent field, such as когда купили ее ("when was it purchased") or покажи документы по этой позиции ("show the documents for this item"), the system should try bundle reuse before re-entering a broad generic router.
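
The reuse-before-router preference can be sketched as a lookup that consults the resolved bundle first and falls back only when the bundle cannot answer. Everything here is hypothetical: ResolvedBundle, its field names, the sample values, and the "generic_router" label stand in for whatever the real runtime uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResolvedBundle:
    focus_object: str
    fields: dict[str, str] = field(default_factory=dict)


def answer_follow_up(bundle: Optional[ResolvedBundle], requested_field: str):
    """Return (value, source); prefer the bundle resolved on the previous turn."""
    if bundle is not None and requested_field in bundle.fields:
        return bundle.fields[requested_field], "bundle_reuse"
    # Only here would a real system re-enter the broad router.
    return None, "generic_router"


# Turn N resolved provenance for the selected item (sample values are made up).
bundle = ResolvedBundle(
    focus_object="ITEM-1",
    fields={"supplier": "Supplier A", "purchase_date": "2024-02-10"},
)

# Turn N+1 asks "when was it purchased" about the same item.
print(answer_follow_up(bundle, "purchase_date"))  # prints ('2024-02-10', 'bundle_reuse')
print(answer_follow_up(bundle, "sale_trace"))     # prints (None, 'generic_router')
```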

Mandatory paraphrase families

Every critical node or edge must be validated in a small paraphrase family instead of one curated wording only.

Minimum family:

  • canonical
  • colloquial
  • ui_selected_object
  • pronoun_followup when the UX already established a selected object or active item

If canonical works but colloquial, UI-generated, or pronoun-only follow-up fails, the node/edge is not accepted.
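
The family rule is easy to encode: the three base wordings are always required, and pronoun_followup joins them once a selected object exists. In this sketch the function names and the has_selected_object flag are assumptions; the wording labels come from the list above.

```python
BASE_FAMILY = ("canonical", "colloquial", "ui_selected_object")


def required_wordings(has_selected_object: bool) -> tuple[str, ...]:
    """Wordings that must pass for a critical node or edge."""
    if has_selected_object:
        return BASE_FAMILY + ("pronoun_followup",)
    return BASE_FAMILY


def family_accepted(results: dict[str, bool], has_selected_object: bool) -> bool:
    """A missing result counts as a failure, not as a skip."""
    return all(results.get(w, False) for w in required_wordings(has_selected_object))


results = {"canonical": True, "colloquial": True, "ui_selected_object": True}
print(family_accepted(results, has_selected_object=False))  # prints True
print(family_accepted(results, has_selected_object=True))   # prints False
```

Treating an untested wording as a failure keeps a node from being accepted simply because nobody ran the colloquial or pronoun variant.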

Acceptance matrix

The analyst must produce or update a scenario_acceptance_matrix.md artifact for every multi-step scenario or pack.

Minimum matrix columns:

  • scenario id
  • node id or edge id
  • user path role (root, critical_child, supporting)
  • wording family (canonical, colloquial, ui_selected_object, pronoun_followup)
  • expected business meaning
  • expected intent
  • expected capability / recipe
  • required carryover invariants
  • expected answer shape
  • expected direct answer
  • business usefulness expectation
  • actual outcome
  • status (pass, partial, fail)
  • defect class
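
One possible way to keep scenario_acceptance_matrix.md consistent is to render it from row dictionaries keyed by the column labels above. This is a sketch under assumptions: the pipe-table layout and the render_matrix helper are illustrative, and the sample row values are invented.

```python
COLUMNS = [
    "scenario id",
    "node id or edge id",
    "user path role",
    "wording family",
    "expected business meaning",
    "expected intent",
    "expected capability / recipe",
    "required carryover invariants",
    "expected answer shape",
    "expected direct answer",
    "business usefulness expectation",
    "actual outcome",
    "status",
    "defect class",
]


def render_matrix(rows: list[dict[str, str]]) -> str:
    """Render rows as a Markdown pipe table; missing cells stay empty."""
    def line(cells):
        # Escape literal pipes so a cell cannot break the table layout.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    header = line(COLUMNS)
    separator = line(["---"] * len(COLUMNS))
    body = [line([row.get(col, "") for col in COLUMNS]) for row in rows]
    return "\n".join([header, separator, *body])


table = render_matrix([
    {
        "scenario id": "inventory",
        "node id or edge id": "stock_snapshot -> purchase_documents",
        "user path role": "critical_child",
        "wording family": "colloquial",
        "status": "fail",
        "defect class": "edge_carryover_gap",
    }
])
print(table)
```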

Defect classes

Use these classes explicitly:

  • semantic_understanding_gap
  • edge_carryover_gap
  • object_memory_gap
  • followup_action_resolution_gap
  • bundle_reuse_gap
  • field_mapping_gap
  • answer_shape_mismatch
  • ordering_semantics_mismatch
  • runtime_capability_gap
  • business_utility_gap
  • domain_anchor_gap
  • loop_coverage_gap

Definitions:

  • semantic_understanding_gap: the system did not understand the real user meaning
  • edge_carryover_gap: the follow-up lost date / object / scope across steps
  • object_memory_gap: the system resolved the object once but failed to retain it for the next follow-up
  • followup_action_resolution_gap: the system kept the business object but resolved the wrong action over that object, for example item-follow-up -> counterparty-documents
  • bundle_reuse_gap: the system resolved a reusable supplier/date/document bundle once but failed to reuse it for an adjacent follow-up
  • field_mapping_gap: the answer surfaced the wrong business field or mislabeled a field
  • answer_shape_mismatch: the business object in the answer does not match the requested object
  • ordering_semantics_mismatch: ranking / chronology semantics are wrong
  • runtime_capability_gap: the product contour truly lacks the route / intent / capability / extractor / recipe
  • business_utility_gap: the answer may be grounded but is still not useful as a user-facing result
  • domain_anchor_gap: the scenario uses a weak or wrong observed anchor, so the tree is semantically mis-specified
  • loop_coverage_gap: the runtime could support the path or nearly support it, but the analyst/orchestrator never treated that path as mandatory acceptance coverage
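
Because the classes are meant to be used explicitly, a closed enumeration helps catch typos and ad hoc labels in the matrix. The sketch below assumes the defect class is stored as its plain string name; the DefectClass type and the helper are hypothetical.

```python
from enum import Enum


class DefectClass(str, Enum):
    SEMANTIC_UNDERSTANDING_GAP = "semantic_understanding_gap"
    EDGE_CARRYOVER_GAP = "edge_carryover_gap"
    OBJECT_MEMORY_GAP = "object_memory_gap"
    FOLLOWUP_ACTION_RESOLUTION_GAP = "followup_action_resolution_gap"
    BUNDLE_REUSE_GAP = "bundle_reuse_gap"
    FIELD_MAPPING_GAP = "field_mapping_gap"
    ANSWER_SHAPE_MISMATCH = "answer_shape_mismatch"
    ORDERING_SEMANTICS_MISMATCH = "ordering_semantics_mismatch"
    RUNTIME_CAPABILITY_GAP = "runtime_capability_gap"
    BUSINESS_UTILITY_GAP = "business_utility_gap"
    DOMAIN_ANCHOR_GAP = "domain_anchor_gap"
    LOOP_COVERAGE_GAP = "loop_coverage_gap"


def is_known_defect_class(label: str) -> bool:
    """True only for the labels defined in this canon."""
    return label in {member.value for member in DefectClass}


print(is_known_defect_class("bundle_reuse_gap"))  # prints True
print(is_known_defect_class("routing_bug"))       # prints False
```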

Analyst responsibilities

The analyst must:

  • review the scenario tree, not just individual turns;
  • compare expected and actual user path transitions;
  • call out broken edges explicitly;
  • verify colloquial and UI-generated variants as first-class coverage;
  • verify direct-answer-first behavior where the user asked a direct lookup question;
  • verify business usefulness explicitly, not only technical validity;
  • verify field truthfulness for surfaced supplier / buyer / organization labels;
  • verify selected-object continuity and reusable object memory;
  • verify focus-object continuity, pronoun follow-up continuity, and follow-up action resolution on the active business object;
  • verify answer granularity and ordering semantics;
  • lower the score when any critical edge or paraphrase family is broken.

Orchestrator responsibilities

The orchestrator must:

  • define the tree before iterating deeply;
  • prioritize the primary user path first;
  • rerun at least one colloquial variant and one UI-selected-object variant for each critical branch;
  • rerun at least one short pronoun follow-up such as по ней ("about it") / по этой позиции ("for this item") when the product UX already established a selected object;
  • treat a broken critical edge as an unfinished scenario even if the root node works;
  • route coder work to the narrowest broken edge or node rather than issuing broad “improve the domain” tasks.

Stop and acceptance rules

Do not accept a domain when:

  • only the root node works;
  • only one curated phrasing works;
  • selected-object follow-up is broken;
  • pronoun-only selected-object follow-up is broken or misrouted to another business object;
  • на эту дату ("as of this date") / на ту дату ("as of that date") loses the originating date;
  • the answer shape is wrong for the business question;
  • chronology / ranking semantics are inverted;
  • the direct answer is not surfaced first on direct lookup questions;
  • the answer is technically grounded but still business-useless.

A domain counts as accepted only when all of the following hold:

  • score >= 80
  • no unresolved P0
  • critical path edges pass
  • canonical + colloquial + UI-selected-object variants pass for critical branches, plus the pronoun follow-up variant where a selected object is already established
  • no silent heuristic masking
  • direct_answer_ok = true
  • business_usefulness_ok = true
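
The acceptance requirements can be combined into a single gate that reports every blocking reason instead of a bare yes or no. This is a sketch, not the scoring implementation: direct_answer_ok and business_usefulness_ok come from the list above, while the other field names and the AcceptanceInput type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AcceptanceInput:
    score: int
    unresolved_p0: int
    critical_edges_pass: bool
    paraphrase_variants_pass: bool
    heuristic_masking_found: bool
    direct_answer_ok: bool
    business_usefulness_ok: bool


def blocking_reasons(a: AcceptanceInput) -> list[str]:
    """Every unmet requirement; an empty list means the domain can be accepted."""
    reasons = []
    if a.score < 80:
        reasons.append("score < 80")
    if a.unresolved_p0 > 0:
        reasons.append("unresolved P0")
    if not a.critical_edges_pass:
        reasons.append("critical path edge failed")
    if not a.paraphrase_variants_pass:
        reasons.append("paraphrase variant failed on a critical branch")
    if a.heuristic_masking_found:
        reasons.append("silent heuristic masking")
    if not a.direct_answer_ok:
        reasons.append("direct_answer_ok is false")
    if not a.business_usefulness_ok:
        reasons.append("business_usefulness_ok is false")
    return reasons


# A high score does not rescue a scenario with a broken critical edge.
candidate = AcceptanceInput(92, 0, False, True, False, True, True)
print(blocking_reasons(candidate))  # prints ['critical path edge failed']
```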