4.4 KiB
Business-first analyst rubric
Use this rubric when evaluating one domain case, one multi-step scenario, or one full domain pack.
The analyst must not stop at route/debug correctness. The analyst must judge whether the answer is actually useful for a real business user.
Core principle
The analyst evaluates five layers at once:
- user intent;
- scenario tree and state continuity;
- business usefulness of the answer;
- evidence and field truthfulness;
- root cause and smallest defensible fix direction.
Required analyst questions
For every critical turn or critical edge, answer these questions explicitly:
- What did the user really ask?
- State the business meaning in one short sentence.
- Name the minimum direct answer the user expected.
- What should the first line of the answer have been?
- If the user asked a direct lookup question, the first line must contain the direct answer.
- Technical explanation, limitations, and evidence come after the direct answer.
- What object and scope had to survive from previous turns?
- selected item / selected contract / selected counterparty;
- originating date or period;
- warehouse or organization scope when still relevant;
- reusable resolved bundle, for example provenance trace or sale trace.
- Did the answer stay on the same business object?
- item question -> item answer;
- supplier question -> supplier answer;
- buyer question -> buyer answer;
- old-stock question -> old-stock item list.
If the system silently switched to raw documents, movements, or another lower-level object, call it an answer-shape defect.
- Are the surfaced fields truthful and correctly labeled?
- do not confuse supplier with organization;
- do not confuse buyer with organization;
- do not present a document-side technical field as a business truth unless that mapping is proven.
Business usefulness rules
An answer is not accepted as business-useful when any of these are true:
- the direct answer is not placed first;
- the answer opens with technical hedging instead of the user-facing result;
- a weaker question is answered than the one the user asked;
- the answer requires the user to reconstruct the conclusion from low-level evidence;
- the answer uses ambiguous field labels for business-critical entities.
State continuity rules
Follow-up continuity is a first-class acceptance object.
The analyst must verify:
- selected object continuity;
- date/period continuity;
- reusable evidence continuity;
- pronoun resolution continuity.
Important pronoun examples:
эту позициюэтот товарегопо немупо этой позиции
If the previous turn already resolved a concrete object, the next turn must reuse it instead of asking for the anchor again.
Reusable answer-object cache
For follow-up-heavy domains, the analyst should explicitly look for evidence that the product behaves as if it had a reusable resolved object bundle.
Examples:
current_itemcurrent_as_of_datecurrent_provenance_tracecurrent_sale_tracefirst_purchase_datesupplier_if_knownsource_document_if_known
If the runtime recomputes everything from scratch and loses the already resolved object, call that out as a state-layer defect.
Root-cause layers
Use one or more of these root-cause layers explicitly:
semantic_understanding_gapruntime_capability_gapedge_carryover_gapobject_memory_gapfield_mapping_gapanswer_shape_mismatchordering_semantics_mismatchbusiness_utility_gaploop_coverage_gapdomain_anchor_gap
Minimum machine-readable verdict fields
The analyst verdict should expose at least:
user_intent_summaryexpected_direct_answeractual_direct_answerdirect_answer_okbusiness_usefulness_okbusiness_utility_scoredirect_answer_priority_scorestate_continuity_scoreanswer_shape_scoreevidence_clarity_scoreroot_cause_layersbroken_edge_idsviolated_invariants
Inventory-specific reminders
For inventory follow-up chains, verify all of these:
- the selected item remains the current focus object after the user clicks a result;
- provenance questions answer supplier/date/document first, not only raw movement rows;
когда купилиcan reuse the already resolved provenance bundle;- supplier and organization are not mixed up in the surfaced answer;
на эту датуkeeps the original stock date unless the user explicitly changed it.