---
name: domain-case-loop
description: "Use this skill when a user wants to iteratively refine one NDC_1C domain case or one linked multi-step domain scenario through a multi-agent loop: automated capture, JSON analysis, minimal domain patch, rerun, and before/after verdict."
---

# Domain case loop

This skill packages the standard workflow for iterating on one concrete domain case or one linked multi-step domain scenario in NDC_1C.

## Use this skill when

- the user wants to improve one domain question end-to-end;
- the answer exists but is noisy, heuristic, partial, or business-useless;
- the route is wrong even if the wording looks better;
- there is a gap between exact compute intent and actual fallback output;
- there are follow-up / continuation bugs that corrupt business context;
- the user has a cascade of linked questions that should reuse one assistant session and semantic state;
- the bug appears only in colloquial/slang wording or in UI-generated follow-up phrasing such as `По выбранному объекту "...": ...` ("For the selected object "...": ...").

## Do not use this skill when

- the user is asking for a broad architecture rewrite;
- there is no concrete domain case or no reproducible input;
- the task is only prose editing with no technical/domain component;
- the task is a generic repo cleanup unrelated to domain capability behavior.

## Repo-specific runtime map

Read `references/repo_runtime_map.md` before the first real cycle.
For follow-up-heavy domains, also read `references/scenario_tree_acceptance_canon.md` before scenario mode, pack mode, or autonomous pack-loop mode.
For business-first analyst work, also read `references/business_first_analyst_rubric.md` before redefining acceptance or hardening a noisy-but-technically-grounded domain.
If `docs/orchestration/active_domain_contract.json` exists, treat it as the single mutable source of truth for the current domain and prefer it over older scattered pool/pack prose docs.

Use these repo-native capture paths:
- automated capture: `python scripts/domain_case_loop.py run-case ...`
- linked multi-step capture: `python scripts/domain_case_loop.py run-scenario --manifest path/to/manifest.json`
- full domain question pool capture: `python scripts/domain_case_loop.py run-pack --manifest path/to/pack.json`
- autonomous full-pack loop: `python scripts/domain_case_loop.py run-pack-loop --manifest path/to/pack.json`
- import existing technical export: `python scripts/domain_case_loop.py import-export ...`

Defaults and overrides:
- `run-case` defaults to the repo's live local profile: `local / qwen2.5-14b-instruct-1m / http://127.0.0.1:1234/v1`;
- override the profile with `--llm-provider`, `--llm-model`, `--llm-base-url`, `--llm-api-key` when needed;
- `run-pack-loop` defaults to `gpt-5.4` for the independent business analyst and to `lead-handoff` repair mode; opt into the old autonomous coder loop only with `--repair-mode auto-coder`.

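The pack-loop invocation above can be assembled programmatically. This is a minimal sketch, not part of the repository: the helper name `build_pack_loop_argv` is hypothetical, and only the subcommand and the two flags quoted above (`--manifest`, `--repair-mode`) are taken from this document.

```python
from typing import List, Optional

# Script path as quoted in this document; run from the repo root.
SCRIPT = "scripts/domain_case_loop.py"


def build_pack_loop_argv(manifest: str, repair_mode: Optional[str] = None) -> List[str]:
    """Build the argv for `run-pack-loop` (hypothetical helper).

    When `repair_mode` is None the flag is omitted, so the script's own
    default (lead-handoff, per this document) applies.
    """
    argv = ["python", SCRIPT, "run-pack-loop", "--manifest", manifest]
    if repair_mode is not None:
        # Explicit opt-in, e.g. "auto-coder" for the old autonomous coder loop.
        argv += ["--repair-mode", repair_mode]
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. Leaving `repair_mode` unset keeps the opt-in to `auto-coder` explicit at the call site.
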
## Workflow

### Scenario mode

Use scenario mode when the user brings a linked chain such as:
- "what is in stock now"
- "who supplied this item"
- "which documents record its purchase"
- "was it later sold"

In scenario mode:
- model the domain as a scenario tree, not as a flat list of prompts;
- define one `root` plus critical child drilldowns and the primary user path;
- treat `selected-object` follow-up branches as first-class business paths when the UI exposes selectable entities;
- create `scenario_manifest.json` first;
- keep one shared `session_id`;
- capture each step under `artifacts/domain_runs/<scenario_id>/steps/<step_id>/`;
- preserve semantic carryover via explicit `scenario_state.json`, not vague model memory;
- require a `scenario_acceptance_matrix.md` artifact that records node/edge coverage and paraphrase-family coverage.

Use `references/scenario_manifest_template.json`.

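The tree shape and the per-step capture path described above can be sketched as follows. This is an illustration only: the `ScenarioStep` class, its fields, and the parent links between the example questions are assumptions, not the schema of `references/scenario_manifest_template.json`. Only the directory pattern comes from this document.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional

RUNS_ROOT = PurePosixPath("artifacts/domain_runs")


@dataclass(frozen=True)
class ScenarioStep:
    """One node of the scenario tree (hypothetical structure)."""
    step_id: str
    question: str
    parent: Optional[str] = None  # None marks the root node


def find_root(steps: List[ScenarioStep]) -> ScenarioStep:
    """Return the single root; a scenario tree must have exactly one."""
    roots = [s for s in steps if s.parent is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


def step_dir(scenario_id: str, step_id: str) -> PurePosixPath:
    """Capture directory for one step, following the documented layout."""
    return RUNS_ROOT / scenario_id / "steps" / step_id


# Illustrative chain; the parent links are an assumption about how it nests.
steps = [
    ScenarioStep("stock", "what is in stock now"),
    ScenarioStep("supplier", "who supplied this item", parent="stock"),
    ScenarioStep("purchase_docs", "which documents record its purchase", parent="supplier"),
    ScenarioStep("sold", "was it later sold", parent="purchase_docs"),
]
```

Modelling the parent link explicitly is what turns a flat prompt list into a tree whose edges can be checked one by one.
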
### Pack mode

Use pack mode when the user brings a whole domain pool and wants grouped orchestration rather than one isolated chain.

In pack mode:
- group the question pool into several coherent scenarios;
- define the root and critical branches inside each scenario instead of validating only isolated prompts;
- capture each scenario under `artifacts/domain_runs/<pack_id>/scenarios/<scenario_id>/`;
- write aggregate `pack_state.json` and `pack_summary.md`;
- aggregate scenario acceptance through node/edge coverage rather than a raw question count;
- treat unresolved scenarios as enablement backlog, not as a reason to drop the domain.

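To show why node/edge coverage is more informative than a raw question count, here is a small sketch. The function name and the input shapes (dicts of pass/fail flags) are assumptions made for illustration; the repository's actual `pack_state.json` format is not described here.

```python
from typing import Dict, Iterable, Tuple


def _ratio(flags: Iterable[bool]) -> float:
    """Share of green flags; 0.0 when there is nothing to measure."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def tree_coverage(
    nodes: Dict[str, bool],
    edges: Dict[Tuple[str, str], bool],
) -> Dict[str, float]:
    """Report node and edge coverage separately (hypothetical helper)."""
    return {
        "node_coverage": _ratio(nodes.values()),
        "edge_coverage": _ratio(edges.values()),
    }


# Three of four questions answer correctly in isolation...
nodes = {"stock": True, "supplier": True, "purchase_docs": True, "sold": False}
# ...but two of the three transitions between them lose context.
edges = {
    ("stock", "supplier"): True,
    ("supplier", "purchase_docs"): False,
    ("purchase_docs", "sold"): False,
}
report = tree_coverage(nodes, edges)
```

In this made-up example the node view looks healthy while the edge view shows that most of the chain is broken, which is the failure a raw question count would hide.
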
### Autonomous pack-loop mode

Use pack-loop mode when the user wants the system to run live replay, produce a strong business-first analyst verdict, and continue toward repair evidence until the analyst gate is reached or the loop hits a real blocker.

In autonomous pack-loop mode:
- run `python scripts/domain_case_loop.py run-pack-loop --manifest ...`;
- keep each iteration under `artifacts/domain_runs/<loop_id>/iterations/<iteration_id>/`;
- read `analyst_verdict.json` before any coder patch;
- by default, stop after the analyst verdict with `business_audit.md` and `lead_coder_handoff.md` so Lead Codex repairs code in the main context;
- let an autonomous coder patch only when `--repair-mode auto-coder` is explicitly selected, and only against the highest-value domain targets from the current analyst verdict;
- stop only on `accepted`, `blocked`, explicit `requires_user_decision = true`, or `max_iterations`;
- do not stop just because the analyst returns `needs_exact_capability` or `partial` if autonomous domain enablement work still remains;
- treat `quality score >= 80` as the target gate, not as permission to keep pushing through hard blockers, missing essential observations, or unsafe fixes;
- for follow-up-heavy domains, include conversational variants, slang/typo variants, and UI-generated selected-object follow-ups in the acceptance slice instead of validating only one canonical wording;
- do not mark a domain path as hardened only because the root node works; critical edges and drilldowns must pass as well;
- treat broken tree edges, missing carryover, or wrong answer shape as blockers for acceptance even when the underlying root intent is already exact.

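The stop conditions listed above can be expressed as one predicate. This is a sketch under stated assumptions: the verdict is treated as a plain dict, and the `status` key is a hypothetical field name (this document only confirms the status values and `requires_user_decision`).

```python
from typing import Any, Dict

# Statuses that end the loop on their own.
TERMINAL_STATUSES = frozenset({"accepted", "blocked"})


def should_stop(verdict: Dict[str, Any], iteration: int, max_iterations: int) -> bool:
    """Decide whether the pack loop stops after this iteration.

    `iteration` is the 1-based count of completed iterations.
    """
    if verdict.get("requires_user_decision") is True:
        return True  # explicit escalation to the user
    if verdict.get("status") in TERMINAL_STATUSES:
        return True
    # `partial` and `needs_exact_capability` keep the loop running
    # until the iteration budget is exhausted.
    return iteration >= max_iterations
```

Note that `needs_exact_capability` and `partial` are deliberately absent from the terminal set, so they only end the loop through the iteration budget or an explicit user-decision flag.
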
### Step 1 - Normalize the case

Create `artifacts/domain_runs/<case_id>/case_brief.md` with:
- domain name
- raw user question
- expected business meaning
- expected exact capability
- expected result mode
- primary user path
- required paraphrase families
- required carryover invariants
- known constraints
- acceptance criteria draft

Use `references/case_brief_template.md`.

### Step 2 - Capture baseline

Preferred path:
- run `python scripts/domain_case_loop.py run-case ...`

Fallback path:
- if the user already has a copied technical export markdown, run `python scripts/domain_case_loop.py import-export ...`

Required artifacts:
- `baseline_output.md`
- `baseline_debug.json`
- `baseline_turn.json`

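Before moving on to the analyst step, it can help to confirm that the baseline capture is complete. The sketch below assumes the three files sit directly inside the case directory; that placement is an assumption, and `references/artifact_layout.md` remains the authority on the real layout. The helper name is hypothetical.

```python
from pathlib import Path
from typing import List

# File names as listed under "Required artifacts".
REQUIRED_BASELINE = (
    "baseline_output.md",
    "baseline_debug.json",
    "baseline_turn.json",
)


def missing_baseline_artifacts(case_dir: Path) -> List[str]:
    """Return the required baseline files that are absent from `case_dir`."""
    return [name for name in REQUIRED_BASELINE if not (case_dir / name).is_file()]
```

An empty result means the baseline is ready for the analyst. A non-empty one names exactly what still has to be captured or imported.
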
### Step 3 - Analyst verdict

Spawn `domain_analyst` and provide:
- `case_brief.md`
- `baseline_turn.json`
- `baseline_output.md`
- `baseline_debug.json`
- `scenario_acceptance_matrix.md` when the case is follow-up-heavy or scenario-based
- optional relevant code excerpts or file paths

Require a full verdict using `references/verdict_template.md`.

The verdict must explicitly say whether the case is:
- an existing in-contour regression;
- a missing route/intent/capability inside project scope;
- a true out-of-scope request;
- a `runtime_capability_gap`, `semantic_understanding_gap`, `edge_carryover_gap`, `answer_shape_mismatch`, `ordering_semantics_mismatch`, or `loop_coverage_gap`;
- an `object_memory_gap`, `followup_action_resolution_gap`, `bundle_reuse_gap`, `field_mapping_gap`, `business_utility_gap`, or `domain_anchor_gap` when that is the real blocker.

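Because the gap labels above form a closed vocabulary, a verdict can be checked for typos or invented labels before it is acted on. The following is a minimal sketch: the helper name and the idea of passing labels as a list of strings are assumptions, while the twelve label names are copied from the list above.

```python
from typing import Iterable, List

KNOWN_GAP_LABELS = frozenset({
    "runtime_capability_gap",
    "semantic_understanding_gap",
    "edge_carryover_gap",
    "answer_shape_mismatch",
    "ordering_semantics_mismatch",
    "loop_coverage_gap",
    "object_memory_gap",
    "followup_action_resolution_gap",
    "bundle_reuse_gap",
    "field_mapping_gap",
    "business_utility_gap",
    "domain_anchor_gap",
})


def unknown_gap_labels(labels: Iterable[str]) -> List[str]:
    """Return labels that are not part of the documented vocabulary."""
    return sorted(set(labels) - KNOWN_GAP_LABELS)
```
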
### Step 4 - Domain patch

Spawn `domain_coder` with:
- the case brief
- the analyst verdict
- baseline artifacts

Require:
- a minimal patch
- zero architecture drift
- rerun after changes
- if the domain is in project scope but outside the current contour, convert the verdict into capability enablement work instead of closing the case as unsupported

### Step 5 - Rerun

Capture:
- `rerun_output.md`
- `rerun_debug.json`
- `rerun_turn.json`
- `patch_summary.md`
- updated `scenario_acceptance_matrix.md` when the rerun belongs to a scenario or pack

### Step 6 - Before/after analysis

Spawn `domain_analyst` again for:
- before/after comparison
- final status recommendation
- quality score from 0 to 100

### Step 7 - Final status

Write `final_status.md` with one of:
- `accepted`
- `partial`
- `blocked`
- `needs_exact_capability`

`needs_exact_capability` is the default status when the business/domain request is valid for the project, but the current contour is missing the route, intent, capability, or domain bootstrap needed to answer it.

`needs_exact_capability` does not automatically stop autonomous pack-loop mode. Treat it as "continue domain enablement work" unless the analyst explicitly marks `requires_user_decision = true`, the runtime is truly blocked, or the loop hits `max_iterations`.

Autonomous pack-loop mode should stop early and ask the user when at least one of these is true:
- a required observation anchor is missing and cannot be recovered safely from artifacts, 1C, or the current scenario state;
- the next patch would introduce a hack, brittle workaround, hidden heuristic masking, or another low-trust shortcut;
- the next patch would cause risky architecture drift, disproportionate complexity, or a contour expansion with unclear blast radius;
- a business-critical ambiguity or scope tradeoff cannot be resolved from repo context and artifacts alone.

`accepted` requires all of the following:
- quality score >= 80
- no unresolved P0 defects
- no silent heuristic masking
- critical scenario-tree edges on the primary user path are green
- canonical, colloquial, and UI-selected-object variants are green for critical branches

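The acceptance conditions above are conjunctive: a high score alone is not enough. A minimal sketch of that gate follows; the `AcceptanceInputs` class and its field names are hypothetical and only mirror the five listed conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceInputs:
    """Inputs to the acceptance gate (hypothetical field names)."""
    quality_score: int              # analyst score, 0 to 100
    unresolved_p0: int              # count of open P0 defects
    silent_heuristic_masking: bool  # True if a heuristic hides a missing exact route
    critical_edges_green: bool      # primary-path scenario-tree edges pass
    variants_green: bool            # canonical, colloquial, UI-selected-object pass


def is_accepted(x: AcceptanceInputs) -> bool:
    """All five conditions must hold at once."""
    return (
        x.quality_score >= 80
        and x.unresolved_p0 == 0
        and not x.silent_heuristic_masking
        and x.critical_edges_green
        and x.variants_green
    )
```

For example, a case scored 92 with one open P0 defect still fails the gate, which matches the rule that the score is a target rather than permission to ignore blockers.
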
## Hard rules

- Do not count heuristic candidates as confirmed business answers.
- If exact data should exist in 1C/MCP, prefer exact route work over prompt cosmetics.
- If exact data does not exist yet in the reachable contour, return a technical insufficiency with a crisp blocker.
- If the user case belongs to a project-relevant domain but is outside the current contour, do not treat that as a terminal rejection. Treat it as domain enablement work and record the missing route/intent/capability explicitly.
- Raise `requires_user_decision = true` when the loop would otherwise have to guess a missing anchor, choose between materially different risky implementations, or push through a hacky/suspicious fix path.
- Never fabricate 1C data.
- Keep domain fixes minimal and localized.
- Preserve successful baseline scenarios.
- Treat follow-up continuity as a state-machine problem, not a wording problem.
- Do not accept a domain as hardened if only canonical phrasing works while colloquial or UI-generated follow-up phrasing still breaks the exact contour.
- Do not accept a domain as hardened if the root node works but a critical selected-object or drilldown edge still breaks.
- Treat temporal carryover loss in a cascading scenario as a real regression: if the user says `на эту дату` ("as of this date") / `на ту дату` ("as of that date"), the analyst must verify that the exact carried date or period survived into `extracted_filters`.
- Treat answer-shape mismatch as a scoring defect: if the user asked for items / residues / contracts, do not accept an answer that switched to raw documents, movements, or another lower-level object without saying so explicitly.
- Treat ordering semantics as part of correctness when the wording implies ranking or chronology, for example `старые закупки` ("old purchases") => oldest-first rather than newest-first.
- Treat primary user-path failures as more important than supporting-path polish: if the user cannot go from root list -> selected object -> first drilldown, the scenario is not accepted.
- Treat direct-answer-first behavior as part of correctness: if the user asked a direct lookup question, the first line must contain the direct answer before the evidence blocks.
- Treat business usefulness as part of correctness: factual-but-business-useless output is not acceptance-quality output.
- Treat stable follow-up object memory as part of correctness: when the prior turn already resolved the relevant item/object, the next turn must not re-ask for it.
- Treat object-centric dialog state as part of correctness: short follow-ups like `по ней` ("for it"), `по этой позиции` ("for this item"), `когда купили ее` ("when was it bought"), `покажи документы по этой позиции` ("show the documents for this item") must resolve against the active selected item before broader routing guesses.
- Treat reusable supplier/date/document bundles as part of correctness: adjacent follow-ups over the same item should reuse a resolved provenance bundle when available.
- Treat action-first follow-up behavior as part of correctness: when the user asks `кто` ("who"), `когда` ("when"), `каким документом` ("by which document"), or `покажи документы` ("show the documents") over a selected object, the answer must begin with that action's result rather than with a generic trace narrative.
- Treat answer layering as part of correctness: user-facing answer first, proof second, service or methodological notes last.
- Treat stable `answer_object` state as part of correctness: once supplier/date/document facts are already resolved, adjacent narrow follow-ups should derive from that bundle instead of replaying a full search.
- Treat narrow selected-object micro-actions as compact answers by default: `кто` ("who"), `когда` ("when"), `каким документом` ("by which document"), `покажи документы` ("show the documents"), `сумма` ("amount"), `все закупки` ("all purchases") should return the requested fact first and should not open with a generic multi-block trace packet.
- Treat temporal honesty as part of correctness: if the exact requested window has no evidence and the runtime auto-broadens to nearest available rows, the answer must separate the exact-window outcome from the out-of-window evidence.
- Treat supplier/buyer field truth as part of correctness: do not surface `organization` as `supplier` or `buyer` without proven mapping.
- Do not accept top-of-answer system scaffolding such as `status`, `what was considered`, `row counts`, or `exact contour` above the user-facing answer on business-critical turns.
- Do not accept numbered block scaffolding such as `Блок 1/2/3` ("Block 1/2/3") in narrow business follow-ups unless the user explicitly asked for a structured report.

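The temporal carryover rule above lends itself to a mechanical check. The sketch below compares the filters of two consecutive turns and reports which temporal keys were lost or changed. The key names (`date`, `period_from`, `period_to`) and the flat-dict shape of `extracted_filters` are assumptions for illustration; the real structure should be taken from the captured turn artifacts.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical key names for temporal filters.
TEMPORAL_KEYS = ("date", "period_from", "period_to")


def lost_temporal_filters(
    previous: Dict[str, Any],
    current: Dict[str, Any],
    keys: Sequence[str] = TEMPORAL_KEYS,
) -> List[str]:
    """List temporal keys present in the previous turn that did not
    survive unchanged into the current turn's extracted filters."""
    return [k for k in keys if k in previous and current.get(k) != previous[k]]
```

An empty list means the carried date or period survived. Any returned key is evidence of a carryover regression, even if the answer text reads plausibly.
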
## Domain-specific framing

For this repository:
- architecture must remain unchanged;
- 1C/MCP is the primary source of truth;
- analyst output must be detailed and business-readable;
- answers should be suitable for product hardening, not just debugging notes;
- machine-readable turn artifacts are first-class inputs for analysis;
- new user domains may be unmarked in the current repo; missing markup is expected and should be handled as enablement, not as a reason to stop the loop.

## Recommended artifact set

Use the artifact layout from `references/artifact_layout.md`.