NODEDC_1C/.codex/skills/domain-case-loop/SKILL.md


name: domain-case-loop
description: Use this skill when a user wants to iteratively refine one NDC_1C domain case or one linked multi-step domain scenario through a multi-agent loop: automated capture, JSON analysis, minimal domain patch, rerun, and before/after verdict.

Domain case loop

This skill packages the standard workflow for iterating on one concrete domain case or one linked multi-step domain scenario in NDC_1C.

Use this skill when

  • the user wants to improve one domain question end-to-end;
  • the answer exists but is noisy, heuristic, partial, or business-useless;
  • the route is wrong even if the wording looks better;
  • there is a gap between exact compute intent and actual fallback output;
  • there are follow-up / continuation bugs that corrupt business context;
  • the user has a cascade of linked questions that should reuse one assistant session and semantic state;
  • the bug appears only in colloquial/slang wording or in UI-generated follow-up phrasing such as По выбранному объекту "...": ... ("For the selected object "...": ...").

Do not use this skill when

  • the user is asking for a broad architecture rewrite;
  • there is no concrete domain case or no reproducible input;
  • the task is only prose editing with no technical/domain component;
  • the task is a generic repo cleanup unrelated to domain capability behavior.

Repo-specific runtime map

Read references/repo_runtime_map.md before the first real cycle. For follow-up-heavy domains, also read references/scenario_tree_acceptance_canon.md before scenario mode, pack mode, or autonomous pack-loop mode. For business-first analyst work, also read references/business_first_analyst_rubric.md before redefining acceptance or hardening a noisy-but-technically-grounded domain. If docs/orchestration/active_domain_contract.json exists, treat it as the single mutable source of truth for the current domain and prefer it over older scattered pool/pack prose docs.

Use these repo-native capture paths:

  • automated capture: python scripts/domain_case_loop.py run-case ...
  • linked multi-step capture: python scripts/domain_case_loop.py run-scenario --manifest path/to/manifest.json
  • full domain question pool capture: python scripts/domain_case_loop.py run-pack --manifest path/to/pack.json
  • autonomous full-pack loop: python scripts/domain_case_loop.py run-pack-loop --manifest path/to/pack.json
  • import existing technical export: python scripts/domain_case_loop.py import-export ...
  • run-case defaults to the repo's live local profile: local / qwen2.5-14b-instruct-1m / http://127.0.0.1:1234/v1
  • override with --llm-provider, --llm-model, --llm-base-url, --llm-api-key when needed
  • run-pack-loop defaults to gpt-5.4 for the independent business analyst and lead-handoff repair mode; opt into the old autonomous coder loop only with --repair-mode auto-coder
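
The capture paths above can be assembled programmatically. The following is a minimal sketch, not the script's own API: `build_capture_command` is a hypothetical helper, and it uses only the subcommands and flags named in this section. Whether the `--llm-*` overrides apply to modes other than run-case is an assumption, and arguments elided above as `...` are left to the caller through `extra_args`.

```python
from typing import Optional, Sequence

SCRIPT = "scripts/domain_case_loop.py"
# Subcommands that this section shows with a --manifest argument.
MANIFEST_MODES = {"run-scenario", "run-pack", "run-pack-loop"}


def build_capture_command(
    mode: str,
    manifest: Optional[str] = None,
    llm_model: Optional[str] = None,
    llm_base_url: Optional[str] = None,
    repair_mode: Optional[str] = None,
    extra_args: Sequence[str] = (),
) -> list[str]:
    """Return an argv list for one capture path (hypothetical helper)."""
    cmd = ["python", SCRIPT, mode]
    if mode in MANIFEST_MODES:
        if manifest is None:
            raise ValueError(f"{mode} needs a manifest path")
        cmd += ["--manifest", manifest]
    if llm_model:
        cmd += ["--llm-model", llm_model]
    if llm_base_url:
        cmd += ["--llm-base-url", llm_base_url]
    if repair_mode:
        cmd += ["--repair-mode", repair_mode]
    cmd += list(extra_args)  # anything this section leaves elided
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repo root.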

Workflow

Scenario mode

Use scenario mode when the user brings a linked chain such as:

  • "what is on stock now"
  • "who supplied this item"
  • "which documents bought it"
  • "was it later sold"

In scenario mode:

  • model the domain as a scenario tree, not as a flat list of prompts;
  • define one root plus critical child drilldowns and the primary user path;
  • treat selected-object follow-up branches as first-class business paths when the UI exposes selectable entities;
  • create scenario_manifest.json first;
  • keep one shared session_id;
  • capture each step under artifacts/domain_runs/<scenario_id>/steps/<step_id>/;
  • preserve semantic carryover via explicit scenario_state.json, not vague model memory;
  • require a scenario_acceptance_matrix.md artifact that records node/edge coverage and paraphrase-family coverage.

Use references/scenario_manifest_template.json.
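
The step layout and the explicit carryover rule above can be sketched as follows. This is an illustration under assumptions: the keys inside scenario_state.json (`session_id`, `carryover`), its location at the scenario root, and the helper names are hypothetical; the real shape is defined by the template and the repo scripts.

```python
import json
from pathlib import Path

RUNS_ROOT = Path("artifacts/domain_runs")


def step_dir(scenario_id: str, step_id: str, root: Path = RUNS_ROOT) -> Path:
    """Directory for one captured step, following the layout named above."""
    return root / scenario_id / "steps" / step_id


def save_state(scenario_id: str, session_id: str, carryover: dict,
               root: Path = RUNS_ROOT) -> Path:
    """Persist explicit carryover so later steps do not rely on model memory."""
    path = root / scenario_id / "scenario_state.json"  # assumed location
    path.parent.mkdir(parents=True, exist_ok=True)
    state = {"session_id": session_id, "carryover": carryover}  # assumed keys
    path.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_state(scenario_id: str, root: Path = RUNS_ROOT) -> dict:
    """Read the state back before running the next step of the chain."""
    path = root / scenario_id / "scenario_state.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

Each step would load the state, run with the shared `session_id`, and save the updated carryover before the next step starts.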

Pack mode

Use pack mode when the user brings a whole domain pool and wants grouped orchestration rather than one isolated chain.

In pack mode:

  • group the question pool into several coherent scenarios;
  • define the root and critical branches inside each scenario instead of validating only isolated prompts;
  • capture each scenario under artifacts/domain_runs/<pack_id>/scenarios/<scenario_id>/;
  • write aggregate pack_state.json and pack_summary.md;
  • aggregate scenario acceptance through node/edge coverage rather than a raw question count;
  • treat unresolved scenarios as enablement backlog, not as a reason to drop the domain.
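
Aggregating by node/edge coverage rather than by question count could look like the sketch below. The `ScenarioResult` structure and the function names are assumptions made for illustration; they are not the schema of pack_state.json.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioResult:
    scenario_id: str
    # node id -> passed
    nodes: dict[str, bool] = field(default_factory=dict)
    # (parent id, child id) -> passed
    edges: dict[tuple[str, str], bool] = field(default_factory=dict)


def coverage(results: list[ScenarioResult]) -> dict[str, float]:
    """Share of green nodes and green edges across all scenarios in a pack."""
    node_flags = [ok for r in results for ok in r.nodes.values()]
    edge_flags = [ok for r in results for ok in r.edges.values()]

    def ratio(flags: list[bool]) -> float:
        return sum(flags) / len(flags) if flags else 0.0

    return {"node_coverage": ratio(node_flags), "edge_coverage": ratio(edge_flags)}


def enablement_backlog(results: list[ScenarioResult]) -> list[str]:
    """Scenarios with any red node or edge stay in the backlog instead of being dropped."""
    return [
        r.scenario_id
        for r in results
        if not (all(r.nodes.values()) and all(r.edges.values()))
    ]
```

Counting edges separately makes a pack with working roots but broken drilldowns visibly incomplete, which a raw question count would hide.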

Autonomous pack-loop mode

Use pack-loop mode when the user wants the system to run live replay, produce a strong business-first analyst verdict, and continue toward repair evidence until the analyst gate is reached or the loop hits a real blocker.

In autonomous pack-loop mode:

  • run python scripts/domain_case_loop.py run-pack-loop --manifest ...;
  • keep each iteration under artifacts/domain_runs/<loop_id>/iterations/<iteration_id>/;
  • read analyst_verdict.json before any coder patch;
  • by default, stop after the analyst verdict with business_audit.md and lead_coder_handoff.md so Lead Codex repairs code in the main context;
  • let an autonomous coder patch only when --repair-mode auto-coder is explicitly selected, and only against the highest-value domain targets from the current analyst verdict;
  • stop only on accepted, blocked, explicit requires_user_decision = true, or max_iterations;
  • do not stop just because the analyst returns needs_exact_capability or partial if autonomous domain enablement work still remains;
  • treat quality score >= 80 as the target gate, not as permission to keep pushing through hard blockers, missing essential observations, or unsafe fixes;
  • for follow-up-heavy domains, include conversational variants, slang/typo variants, and UI-generated selected-object follow-ups in the acceptance slice instead of validating only one canonical wording;
  • do not mark a domain path as hardened only because the root node works; critical edges and drilldowns must pass as well;
  • treat broken tree edges, missing carryover, or wrong answer shape as blockers for acceptance even when the underlying root intent is already exact.
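
The stop conditions listed above can be expressed as one small decision function. This is a sketch under assumptions: `verdict` stands in for a parsed analyst_verdict.json, the `status` key is hypothetical, and `iteration` is taken to be the 1-based count of completed iterations. Only `requires_user_decision` and the status names come from this document.

```python
from typing import Optional

TERMINAL_STATUSES = {"accepted", "blocked"}


def stop_reason(verdict: dict, iteration: int, max_iterations: int) -> Optional[str]:
    """Return why the loop should stop, or None to keep iterating."""
    status = verdict.get("status")  # assumed key name
    if status in TERMINAL_STATUSES:
        return status
    if verdict.get("requires_user_decision") is True:
        return "requires_user_decision"
    if iteration >= max_iterations:
        return "max_iterations"
    # needs_exact_capability and partial are not stop conditions by themselves.
    return None
```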

Step 1 - Normalize the case

Create artifacts/domain_runs/<case_id>/case_brief.md with:

  • domain name
  • raw user question
  • expected business meaning
  • expected exact capability
  • expected result mode
  • primary user path
  • required paraphrase families
  • required carryover invariants
  • known constraints
  • acceptance criteria draft

Use references/case_brief_template.md.

Step 2 - Capture baseline

Preferred path:

  • run python scripts/domain_case_loop.py run-case ...

Fallback path:

  • if the user already has a copied technical export markdown, run python scripts/domain_case_loop.py import-export ...

Required artifacts:

  • baseline_output.md
  • baseline_debug.json
  • baseline_turn.json
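
Before handing off to the analyst, the required artifacts can be checked with a few lines. A minimal sketch, assuming the baseline files sit directly in the case directory; `missing_artifacts` is a hypothetical helper, and references/artifact_layout.md remains the authority on where files actually go.

```python
from pathlib import Path

BASELINE_ARTIFACTS = ("baseline_output.md", "baseline_debug.json", "baseline_turn.json")


def missing_artifacts(case_dir: Path, required=BASELINE_ARTIFACTS) -> list[str]:
    """Names of required artifacts that are not yet present in the case directory."""
    return [name for name in required if not (case_dir / name).is_file()]
```

The same helper would cover the rerun step if called with the rerun file names instead.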

Step 3 - Analyst verdict

Spawn domain_analyst and provide:

  • case_brief.md
  • baseline_turn.json
  • baseline_output.md
  • baseline_debug.json
  • scenario_acceptance_matrix.md when the case is follow-up-heavy or scenario-based
  • optional relevant code excerpts or file paths

Require a full verdict using references/verdict_template.md.

The verdict must explicitly say whether the case is:

  • an existing in-contour regression;
  • a missing route/intent/capability inside project scope;
  • a true out-of-scope request;
  • a runtime_capability_gap, semantic_understanding_gap, edge_carryover_gap, answer_shape_mismatch, ordering_semantics_mismatch, or loop_coverage_gap;
  • an object_memory_gap, followup_action_resolution_gap, bundle_reuse_gap, field_mapping_gap, business_utility_gap, or domain_anchor_gap when that is the real blocker.
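
A verdict's classification can be validated against the labels listed above. In this sketch the gap labels are copied from this section, while the three scope-class names and the function itself are hypothetical shorthand for the first three bullets; references/verdict_template.md defines the real fields.

```python
GAP_LABELS = frozenset({
    "runtime_capability_gap", "semantic_understanding_gap", "edge_carryover_gap",
    "answer_shape_mismatch", "ordering_semantics_mismatch", "loop_coverage_gap",
    "object_memory_gap", "followup_action_resolution_gap", "bundle_reuse_gap",
    "field_mapping_gap", "business_utility_gap", "domain_anchor_gap",
})

# Hypothetical short names for the three scope categories.
SCOPE_CLASSES = frozenset({
    "in_contour_regression", "missing_in_scope_capability", "out_of_scope",
})


def classification_problems(scope_class: str, gap_labels: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the labels are known."""
    problems = []
    if scope_class not in SCOPE_CLASSES:
        problems.append(f"unknown scope class: {scope_class}")
    problems += [f"unknown gap label: {g}" for g in gap_labels if g not in GAP_LABELS]
    return problems
```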

Step 4 - Domain patch

Spawn domain_coder with:

  • the case brief
  • the analyst verdict
  • baseline artifacts

Require:

  • a minimal patch
  • zero architecture drift
  • rerun after changes
  • if the domain is in project scope but outside the current contour, convert the verdict into capability enablement work instead of closing the case as unsupported

Step 5 - Rerun

Capture:

  • rerun_output.md
  • rerun_debug.json
  • rerun_turn.json
  • patch_summary.md
  • updated scenario_acceptance_matrix.md when the rerun belongs to a scenario or pack

Step 6 - Before/after analysis

Spawn domain_analyst again for:

  • before/after comparison
  • final status recommendation
  • quality score from 0 to 100

Step 7 - Final status

Write final_status.md with one of:

  • accepted
  • partial
  • blocked
  • needs_exact_capability

needs_exact_capability is the default status when the business/domain request is valid for the project, but the current contour is missing the route, intent, capability, or domain bootstrap needed to answer it.

needs_exact_capability does not automatically stop autonomous pack-loop mode. Treat it as "continue domain enablement work" unless the analyst explicitly marks requires_user_decision = true, the runtime is truly blocked, or the loop hits max_iterations.

Autonomous pack-loop mode should stop early and ask the user when at least one of these is true:

  • a required observation anchor is missing and cannot be recovered safely from artifacts, 1C, or the current scenario state;
  • the next patch would introduce a hack, brittle workaround, hidden heuristic masking, or another low-trust shortcut;
  • the next patch would cause risky architecture drift, disproportionate complexity, or a contour expansion with unclear blast radius;
  • a business-critical ambiguity or scope tradeoff cannot be resolved from repo context and artifacts alone.

Accepted requires:

  • quality score >= 80
  • no unresolved P0 defects
  • no silent heuristic masking
  • critical scenario-tree edges on the primary user path are green
  • canonical, colloquial, and UI-selected-object variants are green for critical branches
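
The acceptance requirements above can be read as a gate that returns every unmet condition. A minimal sketch: `GateInput` and its field names are assumptions for illustration, and how each value is derived from the analyst verdict is left open.

```python
from dataclasses import dataclass

REQUIRED_VARIANTS = ("canonical", "colloquial", "ui_selected_object")


@dataclass
class GateInput:
    quality_score: int
    unresolved_p0: int
    heuristic_masking: bool
    critical_edges_green: bool
    variants_green: dict[str, bool]  # keyed by the names in REQUIRED_VARIANTS


def acceptance_failures(g: GateInput) -> list[str]:
    """List unmet acceptance conditions; an empty list means the gate passes."""
    failures = []
    if g.quality_score < 80:
        failures.append("quality score below 80")
    if g.unresolved_p0 > 0:
        failures.append("unresolved P0 defects")
    if g.heuristic_masking:
        failures.append("silent heuristic masking")
    if not g.critical_edges_green:
        failures.append("critical edge on primary user path is not green")
    for name in REQUIRED_VARIANTS:
        if not g.variants_green.get(name, False):
            failures.append(f"variant not green: {name}")
    return failures
```

Returning the full list instead of a single boolean keeps final_status.md specific about what still blocks acceptance.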

Hard rules

  • Do not count heuristic candidates as confirmed business answers.

  • If exact data should exist in 1C/MCP, prefer exact route work over prompt cosmetics.

  • If exact data does not exist yet in the reachable contour, return a technical insufficiency with a crisp blocker.

  • If the user case belongs to a project-relevant domain but is outside the current contour, do not treat that as a terminal rejection. Treat it as domain enablement work and record the missing route/intent/capability explicitly.

  • Raise requires_user_decision = true when the loop would otherwise have to guess a missing anchor, choose between materially different risky implementations, or push through a hacky/suspicious fix path.

  • Never fabricate 1C data.

  • Keep domain fixes minimal and localized.

  • Preserve successful baseline scenarios.

  • Treat follow-up continuity as a state-machine problem, not a wording problem.

  • Do not accept a domain as hardened if only canonical phrasing works while colloquial or UI-generated follow-up phrasing still breaks the exact contour.

  • Do not accept a domain as hardened if the root node works but a critical selected-object or drilldown edge still breaks.

  • Treat temporal carryover loss in a cascading scenario as a real regression: if the user says на эту дату / на ту дату ("as of this date" / "as of that date"), the analyst must verify that the exact carried date or period survived into extracted_filters.

  • Treat answer-shape mismatch as a scoring defect: if the user asked for items / residues / contracts, do not accept an answer that switched to raw documents, movements, or another lower-level object without saying so explicitly.

  • Treat ordering semantics as part of correctness when the wording implies ranking or chronology, for example старые закупки ("old purchases") => oldest-first rather than newest-first.

  • Treat primary user-path failures as more important than supporting-path polish: if the user cannot go from root list -> selected object -> first drilldown, the scenario is not accepted.

  • Treat direct-answer-first behavior as part of correctness: if the user asked a direct lookup question, the first line must contain the direct answer before the evidence blocks.

  • Treat business usefulness as part of correctness: factual-but-business-useless output is not acceptance-quality output.

  • Treat stable follow-up object memory as part of correctness: when the prior turn already resolved the relevant item/object, the next turn must not re-ask for it.

  • Treat object-centric dialog state as part of correctness: short follow-ups like по ней ("on it"), по этой позиции ("on this item"), когда купили ее ("when was it bought"), покажи документы по этой позиции ("show documents for this item") must resolve against the active selected item before broader routing guesses.

  • Treat reusable supplier/date/document bundles as part of correctness: adjacent follow-ups over the same item should reuse a resolved provenance bundle when available.

  • Treat action-first follow-up behavior as part of correctness: when the user asks кто ("who"), когда ("when"), каким документом ("by which document"), or покажи документы ("show documents") over a selected object, the answer must begin with that action's result rather than with a generic trace narrative.

  • Treat answer layering as part of correctness: user-facing answer first, proof second, service or methodological notes last.

  • Treat stable answer_object state as part of correctness: once supplier/date/document facts are already resolved, adjacent narrow follow-ups should derive from that bundle instead of replaying a full search.

  • Treat narrow selected-object micro-actions as compact answers by default: кто ("who"), когда ("when"), каким документом ("by which document"), покажи документы ("show documents"), сумма ("amount"), все закупки ("all purchases") should return the requested fact first and should not open with a generic multi-block trace packet.

  • Treat temporal honesty as part of correctness: if the exact requested window has no evidence and the runtime auto-broadens to nearest available rows, the answer must separate the exact-window outcome from the out-of-window evidence.

  • Treat supplier/buyer field truth as part of correctness: do not surface organization as supplier or buyer without proven mapping.

  • Do not accept top-of-answer system scaffolding such as status, what was considered, row counts, or exact contour above the user-facing answer on business-critical turns.

  • Do not accept numbered block scaffolding such as Блок 1/2/3 ("Block 1/2/3") in narrow business follow-ups unless the user explicitly asked for a structured report.
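
Two of the hard rules above lend themselves to mechanical checks: numbered block scaffolding and temporal carryover. The sketch below is illustrative only. It assumes extracted_filters is a flat dict whose values include the carried date as a string, and both helper names are hypothetical; the sample strings in the usage lines are made up, not 1C data.

```python
import re

# Matches lines that start with "Блок <number>", the scaffolding named above.
BLOCK_SCAFFOLD = re.compile(r"^\s*Блок\s*\d+", re.IGNORECASE)


def has_block_scaffold(answer: str) -> bool:
    """True when any line of the answer starts with numbered block scaffolding."""
    return any(BLOCK_SCAFFOLD.match(line) for line in answer.splitlines())


def date_carried_over(carried_date: str, extracted_filters: dict) -> bool:
    """True when the carried date appears among the follow-up turn's filter values."""
    return carried_date in {str(value) for value in extracted_filters.values()}


# Made-up sample data for illustration.
answer = "Supplier: Example LLC\nБлок 1. Documents"
print(has_block_scaffold(answer))
print(date_carried_over("2024-03-31", {"as_of_date": "2024-03-31"}))
```

Checks like these only flag candidates; the analyst still decides whether the user explicitly asked for a structured report.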

Domain-specific framing

For this repository:

  • architecture must remain unchanged;
  • 1C/MCP is the primary source of truth;
  • analyst output must be detailed and business-readable;
  • answers should be suitable for product hardening, not just debugging notes;
  • machine-readable turn artifacts are first-class inputs for analysis;
  • new user domains may be unmarked in the current repo; missing markup is expected and should be handled as enablement, not as a reason to stop the loop.

Use the artifact layout from references/artifact_layout.md.