# Domain Scenario Loop - Repo Adapter

## Purpose

This repository now supports four outer-loop capture modes:

- `run-case` for one concrete domain question;
- `run-scenario` for a linked multi-step domain chain that should reuse one assistant session;
- `run-pack` for a whole domain question pool grouped into several scenarios;
- `run-pack-loop` for a strong analyst review loop over a whole domain pack, with Lead Codex repair handoff by default.

`run-scenario` is the preferred capture mode for domains where the user's next question depends on the previous result set.

`run-pack` is the preferred capture mode when the user brings a full domain pool that should be kept in one aggregate backlog.

## Runtime contract

The scenario runner does not introduce a new product runtime. It reuses:

- `POST /api/assistant/message`
- `GET /api/assistant/session/:session_id`
- current backend LLM/profile configuration
- current address/deep routing inside the product

## Artifact contract

Scenario artifacts live under:

`artifacts/domain_runs//`

Top-level artifacts:

- `scenario_brief.md`
- `scenario_manifest.json`
- `scenario_state.json`
- `scenario_summary.md`
- `scenario_output.md`
- `final_status.md`

Per-step artifacts:

- `steps//output.md`
- `steps//debug.json`
- `steps//turn.json`
- `steps//session.json`
- `steps//assistant_response.json`
- `steps//step_state.json`

Pack artifacts live under:

`artifacts/domain_runs//`

- `pack_manifest.json`
- `pack_state.json`
- `pack_summary.md`
- `final_status.md`
- `scenarios//...`

## AGENT autorun save gate

`scripts/save_agent_semantic_run.py` is a post-validation persistence tool, not a replay executor. The normal path is:

1. build/update the truth-harness spec;
2. run `python scripts/domain_truth_harness.py run-live --spec ... --output-dir artifacts/domain_runs/`;
3. inspect `truth_review.md`, `business_review.md`, `pack_state.json`, and `final_status.md`;
4. save to GUI autoruns only with `python scripts/save_agent_semantic_run.py --spec ... --validated-run-dir artifacts/domain_runs/`.

The save gate requires:

- `pack_state.final_status = accepted`;
- `pack_state.acceptance_gate_passed = true`;
- `truth_review.summary.overall_status = pass`;
- `business_review.overall_business_status = pass`;
- zero unresolved P0 and zero business-answer failures.

If a pack must be saved as a deliberate manual draft before live acceptance, use `--allow-unvalidated --unvalidated-reason ""`. That path is explicitly marked as unvalidated and must not be treated as semantic proof.

## Stage-level AGENT loop

`scripts/stage_agent_loop.py` wraps the domain pack loop into the development-stage workflow:

1. take the current global/local stage manifest;
2. run `scripts/domain_case_loop.py run-pack-loop` for that stage pack;
3. let the loop run pack replay and a business-first analyst verdict; if the gate is not accepted, write `business_audit.md` and `lead_coder_handoff.md` instead of launching a weak coder by default;
4. if accepted, persist the validated AGENT pack into GUI autoruns through `scripts/save_agent_semantic_run.py --validated-run-dir`;
5. write `stage_loop_summary.json` and `stage_loop_handoff.md` for the final human visual confirmation.

The stage manifest schema is `docs/orchestration/schemas/stage_agent_loop_manifest.schema.json`.

The default stage gate is intentionally stricter than a narrow case gate: `target_score = 88`, no unresolved P0/P1 repair targets, an accepted analyst verdict, and clean business-usefulness, direct-answer, temporal-honesty, field-truth, and answer-layering flags.
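The default stage gate above is a conjunction of conditions, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the verdict dictionary, its field names (`score`, `unresolved_p0`, `unresolved_p1`, `analyst_verdict`, `flags`), the flag labels, and the `"clean"` value are hypothetical and are not taken from the stage manifest schema. The sketch also assumes the score must be at least `target_score`.

```python
from typing import Any, Mapping

# Hypothetical labels for the quality dimensions the stage gate mentions.
REQUIRED_CLEAN_FLAGS = (
    "business_usefulness",
    "direct_answer",
    "temporal_honesty",
    "field_truth",
    "answer_layering",
)


def stage_gate_passed(verdict: Mapping[str, Any], target_score: int = 88) -> bool:
    """Return True only when every default stage-gate condition holds.

    Missing fields are treated as failures, so an empty verdict never passes.
    """
    flags = verdict.get("flags", {})
    return (
        verdict.get("score", 0) >= target_score          # assumed: score >= target
        and verdict.get("unresolved_p0", 1) == 0         # no open P0 repair targets
        and verdict.get("unresolved_p1", 1) == 0         # no open P1 repair targets
        and verdict.get("analyst_verdict") == "accepted"
        and all(flags.get(name) == "clean" for name in REQUIRED_CLEAN_FLAGS)
    )
```

Treating absent fields as failures keeps the sketch fail-closed, which matches the stated intent that the stage gate is stricter than a narrow case gate.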
Canonical commands:

```powershell
python scripts/stage_agent_loop.py plan --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py run --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py review-questions --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py ingest-gui-run --manifest docs/orchestration/.json --run-id assistant-stage1-
python scripts/stage_agent_loop.py prepare-repair --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py run-repair --manifest docs/orchestration/.json --dry-run
python scripts/stage_agent_loop.py status --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py continue --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py summarize --manifest docs/orchestration/.json
```

This is the intended path for "implement the stage, generate/check stage questions, analyze business answers, patch code, rerun, then ask the user for final visual confirmation".

The default repair mode is `lead-handoff`. In this mode the expensive replay still runs live and the independent analyst still produces the strict business verdict, but code repair stays with the main Lead Codex context. The loop stops with `next_action = lead_coder_repair_required`, plus:

- `business_audit.md` for the user-facing semantic/business verdict;
- `lead_coder_handoff.md/json` for the concrete repair target, candidate files, and validation path;
- `stage_context_capsule.md/json` for the current stage contract, question quality, loop status, and operating model.

`auto-coder` remains available only as an explicit opt-in experiment:

```powershell
python scripts/stage_agent_loop.py run --manifest docs/orchestration/.json --repair-mode auto-coder
```

That path must not be treated as the normal high-trust repair mode for this project.

Before launching an expensive live replay, run `review-questions`.
It reads the stage pack, resolves `{{bindings.*}}` placeholders, and checks scenario/follow-up density, direct-answer shape declarations, domain coverage, stale-scope canaries, dependency order, duplicates, mojibake in generated Russian questions, and estimated Windows artifact path length. It writes:

- `question_generation_review.json`;
- `question_generation_review.md`.

A strong question review is not semantic proof that the assistant answers correctly. It is the pre-flight gate that says the generated questions are worth spending a live replay on.

## GUI run review bridge

When a manual or GUI autorun already exists, `scripts/review_assistant_stage1_run.py` turns the run id into the same machine-readable review surface.

Canonical command:

```powershell
python scripts/review_assistant_stage1_run.py assistant-stage1- --print-summary
```

The script resolves:

- `llm_normalizer/reports/assistant-stage1-.md`;
- `llm_normalizer/data/assistant_sessions/assistant-stage1--*.json`.

It writes:

- `artifacts/domain_runs/gui_run_reviews/assistant-stage1-/run_review.json`;
- `artifacts/domain_runs/gui_run_reviews/assistant-stage1-/run_review.md`;
- `conversation_pairs.json`;
- `question_quality_review.json`;
- `repair_targets.json`.

This bridge is intentionally business-first:

- the user's question and visible assistant answer are reviewed before route ids and debug fields;
- noisy direct answers, missing first-line answers, technical garbage, and over-broad business answers become findings;
- generated question packs get a deterministic quality review for follow-up density, direct questions, report-style analysis, domain diversity, duplicates, and weak business anchors.

Use this bridge when the operator would otherwise say "check run `assistant-stage1-...`". The expected next step is no longer manual eyeballing first; it is: review by id, inspect `run_review.md`, map `repair_targets.json` into the current stage loop, patch, and rerun.
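Two of the deterministic question checks named in this document, duplicate detection and leftover `{{bindings.*}}` placeholders, can be sketched as follows. This is a minimal sketch, not the repository's implementation: it assumes the questions are available as a plain list of strings, and the function names and the normalization rule (lowercase, collapsed whitespace) are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Matches a {{bindings.*}} placeholder that was never substituted.
UNRESOLVED_BINDING = re.compile(r"\{\{\s*bindings\.[^}]*\}\}")


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(question.lower().split())


def find_duplicates(questions: List[str]) -> List[str]:
    """Return normalized questions that occur more than once, in first-seen order."""
    counts = Counter(normalize(q) for q in questions)
    return [text for text, n in counts.items() if n > 1]


def find_unresolved_bindings(questions: List[str]) -> List[str]:
    """Return every binding placeholder that still appears verbatim in a question."""
    return [m.group(0) for q in questions for m in UNRESOLVED_BINDING.finditer(q)]
```

Both functions are pure and operate on text only, so a check of this shape can run before any live replay is launched.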
For stage work, prefer the integrated command:

```powershell
python scripts/stage_agent_loop.py ingest-gui-run --manifest docs/orchestration/.json --run-id assistant-stage1-
```

It stores the GUI review under `artifacts/domain_runs/stage_agent_loops//gui_run_reviews//`, updates `stage_loop_summary.json`, and writes the next stage action:

- `continue_repair_from_gui_review_p0` when the GUI run exposes business-wrong or missing direct-answer defects;
- `continue_repair_from_gui_review_p1` when the run is semantically usable but still noisy, over-broad, or poorly layered;
- `manual_gui_confirmation_or_stage_close` when the GUI run is clean enough for final human confirmation.

`stage_loop_summary.json` also includes `next_step_guidance.command_templates`, so the next operator or agent pass can continue from machine-readable commands instead of re-inferring the workflow from prose.

Use `python scripts/stage_agent_loop.py status --manifest docs/orchestration/.json` as the cheap read-only checkpoint before continuing a stage. It prints the current next action, closing gate, latest GUI run, latest repair coder status, latest repair validation status, and cold-start continuation artifacts such as `domain_pack_loop.command.txt` without modifying artifacts.

Use `python scripts/stage_agent_loop.py continue --manifest docs/orchestration/.json` as the safe one-command continuation layer. From a cold start, it materializes `domain_pack_loop.command.txt` without launching the long live loop. After a GUI review, it can prepare a repair iteration and materialize `run-repair --dry-run` automatically. It will not run the real coder pass unless `--execute-repair` is passed, and it waits for a `--run-id assistant-stage1-` when the next required step is post-repair rerun/ingest validation. It also writes `stage_repair_handoff.md/json` next to the stage summary.
That handoff is the preferred input for the next coder pass: it lists primary repair targets and sample user-facing failures without forcing the coder to reread the entire GUI conversation first.

For live stage-pack failures, prefer `lead_coder_handoff.md` over immediately preparing a coder pass. The intent is: strong business audit first, Lead Codex code repair second, same replay/GUI validation third.

To prepare the next repair iteration from that handoff, run:

```powershell
python scripts/stage_agent_loop.py prepare-repair --manifest docs/orchestration/.json
```

This writes `repair_iterations//repair_iteration_plan.json`, `repair_prompt.md`, and `repair_checklist.md`. The plan enriches GUI repair targets with candidate runtime files and rerun instructions, so the next coder pass can start from a bounded business defect instead of a full transcript archaeology dig.

To materialize or execute the coder command for that repair iteration, run:

```powershell
python scripts/stage_agent_loop.py run-repair --manifest docs/orchestration/.json --dry-run
```

`--dry-run` writes `repair_coder.command.txt`, records `repair_execution_summary.json`, updates `stage_loop_summary.json`, and prints the exact non-interactive Codex command without changing code. Without `--dry-run`, it executes the coder command with the prepared `repair_prompt.md`, writes `repair_coder_result.json`, captures stdout/stderr, records `repair_execution_summary.json`, and updates the stage next action to rerun/ingest, inspect, or stop for a decision depending on the coder status.

After a real coder patch, rerun the same semantic pack or GUI session and ingest the new `assistant-stage1-`. When the coder result is `patched`, the next `ingest-gui-run` is treated as post-repair validation for that repair iteration.
`stage_loop_summary.json` records `latest_repair_validation` and `repair_validation_history`, including the validation run id, remaining P0/P1 findings, and whether the repair was actually accepted after replay. A patch without this rerun/ingest evidence is not a closed stage.

The stage closing gate enforces that rule even when the inner pack loop reports `accepted`: `loop_accepted_gate` preserves the raw loop verdict, but stage-level `accepted_gate` stays `false` with `stage_closing_gate.status = blocked_pending_repair_validation` until the latest patched repair has a matching successful validation run.

## Placeholder contract

Scenario questions can reference earlier step outputs with placeholders such as:

- `{{step_01_inventory.entries[0].item}}`
- `{{semantic_memory.active_result_set_id}}`

This keeps carryover explicit and machine-readable.

## Status contract

Scenario capture uses four operational statuses:

- `accepted`
- `partial`
- `blocked`
- `needs_exact_capability`

`partial` means the scenario executed, but one or more steps still need route hardening, evidence hardening, or presentation hardening.

`needs_exact_capability` means the scenario is valid for the project, but the current contour still lacks the exact route or capability needed to answer it.

In autonomous pack-loop mode, `partial` and `needs_exact_capability` are non-terminal by default. The loop should continue domain enablement work until one of these happens:

- analyst quality reaches the configured acceptance gate, normally `>= 80`;
- the analyst marks `requires_user_decision = true` because the next step would otherwise require guessing a missing required observation, making an architecture-risky change, accepting a hacky/brittle workaround, or choosing a business-critical tradeoff without enough evidence;
- the runtime is truly blocked;
- the loop reaches `max_iterations`.
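
The four stop conditions above can be expressed as a small decision function. The sketch below is illustrative only: `LoopState`, its field names other than `requires_user_decision`, the returned labels, the order in which the conditions are checked, and the default `max_iterations` value of 5 are all assumptions made for the example, not values taken from the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopState:
    """Hypothetical snapshot of one pack-loop iteration."""
    iteration: int
    analyst_score: int
    requires_user_decision: bool
    runtime_blocked: bool


def stop_reason(
    state: LoopState,
    acceptance_gate: int = 80,   # "normally >= 80" per the status contract
    max_iterations: int = 5,     # assumed default, for illustration only
) -> Optional[str]:
    """Return why the loop should stop, or None to keep doing enablement work."""
    if state.analyst_score >= acceptance_gate:
        return "accepted"
    if state.requires_user_decision:
        return "requires_user_decision"
    if state.runtime_blocked:
        return "blocked"
    if state.iteration >= max_iterations:
        return "max_iterations"
    # `partial` and `needs_exact_capability` fall through here: non-terminal.
    return None
```

A `None` result corresponds to the non-terminal case, where a `partial` or `needs_exact_capability` outcome sends the loop into another iteration.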