---
name: domain-case-loop
description: "Use this skill when a user wants to iteratively refine one NDC_1C domain case or one linked multi-step domain scenario through a multi-agent loop: automated capture, JSON analysis, minimal domain patch, rerun, and before/after verdict."
---
# Domain case loop
This skill packages the standard workflow for iterating on one concrete domain case or one linked multi-step domain scenario in NDC_1C.
## Use this skill when
- the user wants to improve one domain question end-to-end;
- the answer exists but is noisy, heuristic, partial, or business-useless;
- the route is wrong even if the wording looks better;
- there is a gap between exact compute intent and actual fallback output;
- there are follow-up / continuation bugs that corrupt business context;
- the user has a cascade of linked questions that should reuse one assistant session and semantic state;
- the bug appears only in colloquial/slang wording or in UI-generated follow-up phrasing such as `По выбранному объекту "...": ...` (Russian for "For the selected object ...").
## Do not use this skill when
- the user is asking for a broad architecture rewrite;
- there is no concrete domain case or no reproducible input;
- the task is only prose editing with no technical/domain component;
- the task is a generic repo cleanup unrelated to domain capability behavior.
## Repo-specific runtime map
Read `references/repo_runtime_map.md` before the first real cycle.
For follow-up-heavy domains, also read `references/scenario_tree_acceptance_canon.md` before scenario mode, pack mode, or autonomous pack-loop mode.
For business-first analyst work, also read `references/business_first_analyst_rubric.md` before redefining acceptance or hardening a noisy-but-technically-grounded domain.
If `docs/orchestration/active_domain_contract.json` exists, treat it as the single mutable source of truth for the current domain and prefer it over older scattered pool/pack prose docs.
Use these repo-native capture paths:
- automated capture: `python scripts/domain_case_loop.py run-case ...`
- linked multi-step capture: `python scripts/domain_case_loop.py run-scenario --manifest path/to/manifest.json`
- full domain question pool capture: `python scripts/domain_case_loop.py run-pack --manifest path/to/pack.json`
- autonomous full-pack loop: `python scripts/domain_case_loop.py run-pack-loop --manifest path/to/pack.json`
- import existing technical export: `python scripts/domain_case_loop.py import-export ...`
Defaults and overrides:
- `run-case` defaults to the repo's live local profile: `local / qwen2.5-14b-instruct-1m / http://127.0.0.1:1234/v1`
- override with `--llm-provider`, `--llm-model`, `--llm-base-url`, `--llm-api-key` when needed
- `run-pack-loop` defaults to `gpt-5.4` for the independent business analyst and to the `lead-handoff` repair mode; opt into the older autonomous coder loop only with `--repair-mode auto-coder`
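The `run-pack-loop` invocation can also be assembled programmatically. The sketch below is a minimal illustration that uses only the subcommand and flags named in this skill; the helper name `build_pack_loop_command` is hypothetical and is not part of `scripts/domain_case_loop.py`.

```python
from typing import List, Optional


def build_pack_loop_command(manifest: str, repair_mode: Optional[str] = None) -> List[str]:
    """Assemble argv for `run-pack-loop` using only flags named in this skill."""
    argv = ["python", "scripts/domain_case_loop.py", "run-pack-loop", "--manifest", manifest]
    if repair_mode is not None:
        # `auto-coder` is the only non-default repair mode this skill mentions.
        argv += ["--repair-mode", repair_mode]
    return argv
```

Pass the returned list to `subprocess.run(argv, check=True)` from the repo root. Leaving `repair_mode` unset keeps the documented default, which is the lead-handoff path.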
## Workflow
### Scenario mode
Use scenario mode when the user brings a linked chain such as:
- "what is in stock now"
- "who supplied this item"
- "which documents bought it"
- "was it later sold"
In scenario mode:
- model the domain as a scenario tree, not as a flat list of prompts;
- define one `root` plus critical child drilldowns and the primary user path;
- treat `selected-object` follow-up branches as first-class business paths when the UI exposes selectable entities;
- create `scenario_manifest.json` first;
- keep one shared `session_id`;
- capture each step under `artifacts/domain_runs/<scenario_id>/steps/<step_id>/`;
- preserve semantic carryover via explicit `scenario_state.json`, not vague model memory;
- require a `scenario_acceptance_matrix.md` artifact that records node/edge coverage and paraphrase-family coverage.
Use `references/scenario_manifest_template.json`.
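As a rough illustration of the step layout and explicit carryover described above, the sketch below derives a step directory and merges one step's resolved values into the running scenario state. The helper names and the state keys other than `session_id` are assumptions for illustration only; the real shape of `scenario_state.json` is defined by the repo, not by this sketch.

```python
from pathlib import PurePosixPath


def step_dir(scenario_id: str, step_id: str) -> PurePosixPath:
    """Path of one step under artifacts/domain_runs/<scenario_id>/steps/<step_id>/."""
    return PurePosixPath("artifacts/domain_runs") / scenario_id / "steps" / step_id


def carry_state(previous: dict, step_update: dict) -> dict:
    """Return a new state in which earlier values persist unless the step resolves new ones."""
    merged = dict(previous)
    # A None value means "this step resolved nothing here", so the old value survives.
    merged.update({key: value for key, value in step_update.items() if value is not None})
    return merged
```

Writing the merged dictionary back to `scenario_state.json` after every step keeps carryover inspectable in artifacts rather than implied by model memory.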
### Pack mode
Use pack mode when the user brings a whole domain pool and wants grouped orchestration rather than one isolated chain.
In pack mode:
- group the question pool into several coherent scenarios;
- define the root and critical branches inside each scenario instead of validating only isolated prompts;
- capture each scenario under `artifacts/domain_runs/<pack_id>/scenarios/<scenario_id>/`;
- write aggregate `pack_state.json` and `pack_summary.md`;
- aggregate scenario acceptance through node/edge coverage rather than a raw question count;
- treat unresolved scenarios as enablement backlog, not as a reason to drop the domain.
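Aggregating by node/edge coverage instead of a raw question count could look roughly like this. The input shape (a mapping from node or edge name to a pass flag) and the function names are assumptions for illustration, not the format of `pack_state.json`.

```python
def coverage(results: dict) -> float:
    """Share of passing entries in a {name: passed} mapping; an empty mapping counts as 0.0."""
    if not results:
        return 0.0
    return sum(1 for passed in results.values() if passed) / len(results)


def scenario_summary(nodes: dict, edges: dict) -> dict:
    """Summarize one scenario tree by node coverage, edge coverage, and what is still red."""
    failing = [name for name, passed in {**nodes, **edges}.items() if not passed]
    return {
        "node_coverage": coverage(nodes),
        "edge_coverage": coverage(edges),
        "uncovered": sorted(failing),
    }
```

A scenario where every node answers but one drilldown edge breaks then reports full node coverage and partial edge coverage, which is exactly the case a flat question count would hide.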
### Autonomous pack-loop mode
Use pack-loop mode when the user wants the system to run live replay, produce a strong business-first analyst verdict, and continue toward repair evidence until the analyst gate is reached or the loop hits a real blocker.
In autonomous pack-loop mode:
- run `python scripts/domain_case_loop.py run-pack-loop --manifest ...`;
- keep each iteration under `artifacts/domain_runs/<loop_id>/iterations/<iteration_id>/`;
- read `analyst_verdict.json` before any coder patch;
- by default, stop after the analyst verdict with `business_audit.md` and `lead_coder_handoff.md` so Lead Codex repairs code in the main context;
- let an autonomous coder patch only when `--repair-mode auto-coder` is explicitly selected, and only against the highest-value domain targets from the current analyst verdict;
- stop only on `accepted`, `blocked`, explicit `requires_user_decision = true`, or `max_iterations`;
- do not stop just because the analyst returns `needs_exact_capability` or `partial` if autonomous domain enablement work still remains;
- treat `quality score >= 80` as the target gate, not as permission to keep pushing through hard blockers, missing essential observations, or unsafe fixes;
- for follow-up-heavy domains, include conversational variants, slang/typo variants, and UI-generated selected-object follow-ups in the acceptance slice instead of validating only one canonical wording;
- do not mark a domain path as hardened only because the root node works; critical edges and drilldowns must pass as well;
- treat broken tree edges, missing carryover, or wrong answer shape as blockers for acceptance even when the underlying root intent is already exact.
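The stop conditions above can be sketched as a small predicate. This is a minimal sketch, assuming the verdict is a dictionary with a `status` key (a hypothetical field name) alongside the `requires_user_decision` flag named in this skill; check the real `analyst_verdict.json` before relying on it.

```python
STOP_STATUSES = {"accepted", "blocked"}


def should_stop(verdict: dict, iteration: int, max_iterations: int) -> bool:
    """Stop only on accepted, blocked, an explicit user-decision flag, or the iteration cap."""
    if verdict.get("requires_user_decision") is True:
        return True
    if verdict.get("status") in STOP_STATUSES:
        return True
    # `partial` and `needs_exact_capability` fall through: enablement work continues.
    return iteration >= max_iterations
```

Note that `needs_exact_capability` and `partial` deliberately do not appear in `STOP_STATUSES`.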
### Step 1 - Normalize the case
Create `artifacts/domain_runs/<case_id>/case_brief.md` with:
- domain name
- raw user question
- expected business meaning
- expected exact capability
- expected result mode
- primary user path
- required paraphrase families
- required carryover invariants
- known constraints
- acceptance criteria draft
Use `references/case_brief_template.md`.
### Step 2 - Capture baseline
Preferred path:
- run `python scripts/domain_case_loop.py run-case ...`
Fallback path:
- if the user already has a copied technical export markdown, run `python scripts/domain_case_loop.py import-export ...`
Required artifacts:
- `baseline_output.md`
- `baseline_debug.json`
- `baseline_turn.json`
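Before spawning the analyst, it can help to confirm that the baseline artifacts actually exist. The helpers below are a hypothetical sketch, not part of the repo scripts; only the three file names come from this skill.

```python
from pathlib import Path
from typing import Iterable, List

REQUIRED_BASELINE = ("baseline_output.md", "baseline_debug.json", "baseline_turn.json")


def missing_artifacts(present: Iterable[str], required: Iterable[str] = REQUIRED_BASELINE) -> List[str]:
    """Return the required artifact names that are absent from `present`, in required order."""
    have = set(present)
    return [name for name in required if name not in have]


def missing_in_dir(case_dir: Path) -> List[str]:
    """Same check against a directory on disk; a missing directory means everything is missing."""
    if not case_dir.is_dir():
        return list(REQUIRED_BASELINE)
    return missing_artifacts(entry.name for entry in case_dir.iterdir())
```

For example, `missing_in_dir(Path("artifacts/domain_runs") / case_id)` returns an empty list once the capture is complete.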
### Step 3 - Analyst verdict
Spawn `domain_analyst` and provide:
- `case_brief.md`
- `baseline_turn.json`
- `baseline_output.md`
- `baseline_debug.json`
- `scenario_acceptance_matrix.md` when the case is follow-up-heavy or scenario-based
- optional relevant code excerpts or file paths
Require a full verdict using `references/verdict_template.md`.
The verdict must explicitly say whether the case is:
- an existing in-contour regression;
- a missing route/intent/capability inside project scope;
- a true out-of-scope request;
- a `runtime_capability_gap`, `semantic_understanding_gap`, `edge_carryover_gap`, `answer_shape_mismatch`, `ordering_semantics_mismatch`, or `loop_coverage_gap`;
- an `object_memory_gap`, `followup_action_resolution_gap`, `bundle_reuse_gap`, `field_mapping_gap`, `business_utility_gap`, or `domain_anchor_gap` when that is the real blocker.
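Because the gap labels are fixed identifiers, a verdict can be checked for misspelled or invented labels before it is consumed. The sketch below is illustrative only: the set is copied from the lists above, and the function name and the idea of passing the labels as a list of strings are assumptions.

```python
GAP_CODES = frozenset({
    "runtime_capability_gap",
    "semantic_understanding_gap",
    "edge_carryover_gap",
    "answer_shape_mismatch",
    "ordering_semantics_mismatch",
    "loop_coverage_gap",
    "object_memory_gap",
    "followup_action_resolution_gap",
    "bundle_reuse_gap",
    "field_mapping_gap",
    "business_utility_gap",
    "domain_anchor_gap",
})


def unknown_gap_codes(codes) -> list:
    """Return, sorted, any labels that are not among the gap codes listed in this skill."""
    return sorted(set(codes) - GAP_CODES)
```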
### Step 4 - Domain patch
Spawn `domain_coder` with:
- the case brief
- the analyst verdict
- baseline artifacts
Require:
- a minimal patch
- zero architecture drift
- rerun after changes
- if the domain is in project scope but outside the current contour, convert the verdict into capability enablement work instead of closing the case as unsupported
### Step 5 - Rerun
Capture:
- `rerun_output.md`
- `rerun_debug.json`
- `rerun_turn.json`
- `patch_summary.md`
- updated `scenario_acceptance_matrix.md` when the rerun belongs to a scenario or pack
### Step 6 - Before/after analysis
Spawn `domain_analyst` again for:
- before/after comparison
- final status recommendation
- quality score from 0 to 100
### Step 7 - Final status
Write `final_status.md` with one of:
- accepted
- partial
- blocked
- needs_exact_capability
`needs_exact_capability` is the default status when the business/domain request is valid for the project, but the current contour is missing the route, intent, capability, or domain bootstrap needed to answer it.
`needs_exact_capability` does not automatically stop autonomous pack-loop mode. Treat it as "continue domain enablement work" unless the analyst explicitly marks `requires_user_decision = true`, the runtime is truly blocked, or the loop hits `max_iterations`.
Autonomous pack-loop mode should stop early and ask the user when at least one of these is true:
- a required observation anchor is missing and cannot be recovered safely from artifacts, 1C, or the current scenario state;
- the next patch would introduce a hack, brittle workaround, hidden heuristic masking, or another low-trust shortcut;
- the next patch would cause risky architecture drift, disproportionate complexity, or a contour expansion with unclear blast radius;
- a business-critical ambiguity or scope tradeoff cannot be resolved from repo context and artifacts alone.
`accepted` requires:
- quality score >= 80
- no unresolved P0 defects
- no silent heuristic masking
- critical scenario-tree edges on the primary user path are green
- canonical, colloquial, and UI-selected-object variants are green for critical branches
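The acceptance gate above is a conjunction of five conditions, which can be sketched as follows. The dataclass and its field names are hypothetical; only the conditions themselves come from this skill.

```python
from dataclasses import dataclass


@dataclass
class AcceptanceInputs:
    quality_score: int               # analyst score from 0 to 100
    unresolved_p0_defects: int       # count of open P0 defects
    silent_heuristic_masking: bool   # True if a heuristic is hidden behind a confident answer
    primary_path_edges_green: bool   # critical scenario-tree edges on the primary user path
    critical_variants_green: bool    # canonical, colloquial, and UI-selected-object variants


def is_accepted(x: AcceptanceInputs) -> bool:
    """All five conditions must hold; a high score alone is not enough."""
    return (
        x.quality_score >= 80
        and x.unresolved_p0_defects == 0
        and not x.silent_heuristic_masking
        and x.primary_path_edges_green
        and x.critical_variants_green
    )
```

A case that scores 95 but still has one red drilldown edge on the primary path stays out of `accepted`.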
## Hard rules
- Do not count heuristic candidates as confirmed business answers.
- If exact data should exist in 1C/MCP, prefer exact route work over prompt cosmetics.
- If exact data does not exist yet in the reachable contour, return a technical insufficiency with a crisp blocker.
- If the user case belongs to a project-relevant domain but is outside the current contour, do not treat that as a terminal rejection. Treat it as domain enablement work and record the missing route/intent/capability explicitly.
- Raise `requires_user_decision = true` when the loop would otherwise have to guess a missing anchor, choose between materially different risky implementations, or push through a hacky/suspicious fix path.
- Never fabricate 1C data.
- Keep domain fixes minimal and localized.
- Preserve successful baseline scenarios.
- Treat follow-up continuity as a state-machine problem, not a wording problem.
- Do not accept a domain as hardened if only canonical phrasing works while colloquial or UI-generated follow-up phrasing still breaks the exact contour.
- Do not accept a domain as hardened if the root node works but a critical selected-object or drilldown edge still breaks.
- Treat temporal carryover loss in a cascading scenario as a real regression: if the user says `на эту дату` / `на ту дату` ("on this date" / "on that date"), the analyst must verify that the exact carried date or period survived into `extracted_filters`.
- Treat answer-shape mismatch as a scoring defect: if the user asked for items / residues / contracts, do not accept an answer that switched to raw documents, movements, or another lower-level object without saying so explicitly.
- Treat ordering semantics as part of correctness when the wording implies ranking or chronology, for example `старые закупки` ("old purchases") => oldest-first rather than newest-first.
- Treat primary user-path failures as more important than supporting-path polish: if the user cannot go from root list -> selected object -> first drilldown, the scenario is not accepted.
- Treat direct-answer-first behavior as part of correctness: if the user asked a direct lookup question, the first line must contain the direct answer before the evidence blocks.
- Treat business usefulness as part of correctness: factual-but-business-useless output is not acceptance-quality output.
- Treat stable follow-up object memory as part of correctness: when the prior turn already resolved the relevant item/object, the next turn must not re-ask for it.
- Treat object-centric dialog state as part of correctness: short follow-ups like `по ней` ("for it"), `по этой позиции` ("for this item"), `когда купили ее` ("when was it bought"), `покажи документы по этой позиции` ("show the documents for this item") must resolve against the active selected item before broader routing guesses.
- Treat reusable supplier/date/document bundles as part of correctness: adjacent follow-ups over the same item should reuse a resolved provenance bundle when available.
- Treat action-first follow-up behavior as part of correctness: when the user asks `кто` ("who"), `когда` ("when"), `каким документом` ("by which document"), or `покажи документы` ("show the documents") over a selected object, the answer must begin with that action's result rather than with a generic trace narrative.
- Treat answer layering as part of correctness: user-facing answer first, proof second, service or methodological notes last.
- Treat stable `answer_object` state as part of correctness: once supplier/date/document facts are already resolved, adjacent narrow follow-ups should derive from that bundle instead of replaying a full search.
- Treat narrow selected-object micro-actions as compact answers by default: `кто` ("who"), `когда` ("when"), `каким документом` ("by which document"), `покажи документы` ("show the documents"), `сумма` ("amount"), `все закупки` ("all purchases") should return the requested fact first and should not open with a generic multi-block trace packet.
- Treat temporal honesty as part of correctness: if the exact requested window has no evidence and the runtime auto-broadens to nearest available rows, the answer must separate the exact-window outcome from the out-of-window evidence.
- Treat supplier/buyer field truth as part of correctness: do not surface `organization` as `supplier` or `buyer` without proven mapping.
- Do not accept top-of-answer system scaffolding such as `status`, `what was considered`, `row counts`, or `exact contour` above the user-facing answer on business-critical turns.
- Do not accept numbered block scaffolding such as `Блок 1/2/3` ("Block 1/2/3") in narrow business follow-ups unless the user explicitly asked for a structured report.
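The temporal carryover rule above lends itself to a mechanical check. This is a minimal sketch that assumes `extracted_filters` is a flat dictionary and that the temporal keys are called `date` and `period`; both the key names and the helper name are hypothetical and should be adjusted to the real turn artifacts.

```python
def lost_temporal_keys(previous_filters: dict, extracted_filters: dict, keys=("date", "period")) -> list:
    """Return the temporal keys whose carried value did not survive unchanged into the next turn."""
    lost = []
    for key in keys:
        if key in previous_filters and extracted_filters.get(key) != previous_filters[key]:
            # Either the key vanished or the runtime silently substituted another date/period.
            lost.append(key)
    return lost
```

An empty result means the carried date or period survived; any returned key is a carryover regression for the analyst to record.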
## Domain-specific framing
For this repository:
- architecture must remain unchanged;
- 1C/MCP is the primary source of truth;
- analyst output must be detailed and business-readable;
- answers should be suitable for product hardening, not just debugging notes;
- machine-readable turn artifacts are first-class inputs for analysis;
- new user domains may be unmarked in the current repo; missing markup is expected and should be handled as enablement, not as a reason to stop the loop.
## Recommended artifact set
Use the artifact layout from `references/artifact_layout.md`.