# Domain Scenario Loop - Repo Adapter

## Purpose

This repository now supports four outer-loop capture modes:

- `run-case` for one concrete domain question;
- `run-scenario` for a linked multi-step domain chain that should reuse one assistant session;
- `run-pack` for a whole domain question pool grouped into several scenarios;
- `run-pack-loop` for an autonomous analyst/coder loop over a whole domain pack.

`run-scenario` is the preferred capture mode for domains where the user's next question depends on the previous result set.

`run-pack` is the preferred capture mode when the user brings a full domain pool that should be kept in one aggregate backlog.

## Runtime contract

The scenario runner does not introduce a new product runtime. It reuses:

- `POST /api/assistant/message`
- `GET /api/assistant/session/:session_id`
- current backend LLM/profile configuration
- current address/deep routing inside the product

## Artifact contract

Scenario artifacts live under:

`artifacts/domain_runs//`

Top-level artifacts:

- `scenario_brief.md`
- `scenario_manifest.json`
- `scenario_state.json`
- `scenario_summary.md`
- `scenario_output.md`
- `final_status.md`

Per-step artifacts:

- `steps//output.md`
- `steps//debug.json`
- `steps//turn.json`
- `steps//session.json`
- `steps//assistant_response.json`
- `steps//step_state.json`

Pack artifacts live under:

`artifacts/domain_runs//`

- `pack_manifest.json`
- `pack_state.json`
- `pack_summary.md`
- `final_status.md`
- `scenarios//...`

## AGENT autorun save gate

`scripts/save_agent_semantic_run.py` is a post-validation persistence tool, not a replay executor.

The normal path is:

1. build/update the truth-harness spec;
2. run `python scripts/domain_truth_harness.py run-live --spec ... --output-dir artifacts/domain_runs/`;
3. inspect `truth_review.md`, `business_review.md`, `pack_state.json`, and `final_status.md`;
4. save to GUI autoruns only with `python scripts/save_agent_semantic_run.py --spec ... --validated-run-dir artifacts/domain_runs/`.

The save gate requires:

- `pack_state.final_status = accepted`;
- `pack_state.acceptance_gate_passed = true`;
- `truth_review.summary.overall_status = pass`;
- `business_review.overall_business_status = pass`;
- zero unresolved P0 and zero business-answer failures.

If a pack must be saved as a deliberate manual draft before live acceptance, use `--allow-unvalidated --unvalidated-reason ""`. That path is explicitly marked as unvalidated and must not be treated as semantic proof.

## Stage-level AGENT loop

`scripts/stage_agent_loop.py` wraps the domain pack loop into the development-stage workflow:

1. take the current global/local stage manifest;
2. run `scripts/domain_case_loop.py run-pack-loop` for that stage pack;
3. let the loop iterate through pack replay, business-first analyst verdict, coder patch, and rerun until the objective gate is accepted, blocked, or a real user decision is required;
4. if accepted, persist the validated AGENT pack into GUI autoruns through `scripts/save_agent_semantic_run.py --validated-run-dir`;
5. write `stage_loop_summary.json` and `stage_loop_handoff.md` for the final human visual confirmation.

The stage manifest schema is `docs/orchestration/schemas/stage_agent_loop_manifest.schema.json`.

The default stage gate is intentionally stricter than a narrow case gate: `target_score = 88`, no unresolved P0/P1 repair targets, an accepted analyst verdict, and clean business-usefulness, direct-answer, temporal-honesty, field-truth, and answer-layering flags.
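
The default stage gate described above can be read as a conjunction of independent checks. The following is a minimal sketch of that reading, not the code in `scripts/stage_agent_loop.py`. Every dictionary key (`score`, `unresolved_p0`, `unresolved_p1`, `analyst_verdict`, `clean_flags`) and every flag name is a hypothetical stand-in, since the source names the checks but not the field names. It also assumes that a score passes when it is greater than or equal to `target_score`.

```python
from typing import Any, Mapping

# Hypothetical flag names: the source lists the checks, not the actual keys.
REQUIRED_CLEAN_FLAGS = (
    "business_usefulness",
    "direct_answer",
    "temporal_honesty",
    "field_truth",
    "answer_layering",
)


def stage_gate_passes(review: Mapping[str, Any], target_score: int = 88) -> bool:
    """Return True only when every default stage-gate condition holds."""
    # Assumption: the score passes when it reaches target_score.
    if review.get("score", 0) < target_score:
        return False
    # No unresolved P0 or P1 repair targets may remain.
    if review.get("unresolved_p0", 0) or review.get("unresolved_p1", 0):
        return False
    # The analyst verdict must be an explicit acceptance.
    if review.get("analyst_verdict") != "accepted":
        return False
    # Every quality flag must be explicitly clean; a missing flag fails.
    clean = review.get("clean_flags", {})
    return all(clean.get(name) is True for name in REQUIRED_CLEAN_FLAGS)
```

Treating a missing flag as a failure keeps the sketch conservative: a review that omits a check cannot pass the gate by accident.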

Canonical commands:

```powershell
python scripts/stage_agent_loop.py plan --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py run --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py ingest-gui-run --manifest docs/orchestration/.json --run-id assistant-stage1-
python scripts/stage_agent_loop.py prepare-repair --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py run-repair --manifest docs/orchestration/.json --dry-run
python scripts/stage_agent_loop.py status --manifest docs/orchestration/.json
python scripts/stage_agent_loop.py summarize --manifest docs/orchestration/.json
```

This is the intended path for “implement the stage, generate/check stage questions, analyze business answers, patch code, rerun, then ask the user for final visual confirmation”.

## GUI run review bridge

When a manual or GUI autorun already exists, `scripts/review_assistant_stage1_run.py` turns the run id into the same machine-readable review surface.

Canonical command:

```powershell
python scripts/review_assistant_stage1_run.py assistant-stage1- --print-summary
```

The script resolves:

- `llm_normalizer/reports/assistant-stage1-.md`;
- `llm_normalizer/data/assistant_sessions/assistant-stage1--*.json`.

It writes:

- `artifacts/domain_runs/gui_run_reviews/assistant-stage1-/run_review.json`;
- `artifacts/domain_runs/gui_run_reviews/assistant-stage1-/run_review.md`;
- `conversation_pairs.json`;
- `question_quality_review.json`;
- `repair_targets.json`.

This bridge is intentionally business-first:

- the user's question and visible assistant answer are reviewed before route ids and debug fields;
- noisy direct answers, missing first-line answers, technical garbage, and over-broad business answers become findings;
- generated question packs get a deterministic quality review for follow-up density, direct questions, report-style analysis, domain diversity, duplicates, and weak business anchors.

Use this bridge when the operator would otherwise say “check run `assistant-stage1-...`”. The expected next step is no longer manual eyeballing first; it is: review by id, inspect `run_review.md`, map `repair_targets.json` into the current stage loop, patch, and rerun.

For stage work, prefer the integrated command:

```powershell
python scripts/stage_agent_loop.py ingest-gui-run --manifest docs/orchestration/.json --run-id assistant-stage1-
```

It stores the GUI review under `artifacts/domain_runs/stage_agent_loops//gui_run_reviews//`, updates `stage_loop_summary.json`, and writes the next stage action:

- `continue_repair_from_gui_review_p0` when the GUI run exposes business-wrong or missing direct-answer defects;
- `continue_repair_from_gui_review_p1` when the run is semantically usable but still noisy, over-broad, or poorly layered;
- `manual_gui_confirmation_or_stage_close` when the GUI run is clean enough for final human confirmation.

`stage_loop_summary.json` also includes `next_step_guidance.command_templates`, so the next operator or agent pass can continue from machine-readable commands instead of re-inferring the workflow from prose.

Use `python scripts/stage_agent_loop.py status --manifest docs/orchestration/.json` as the cheap read-only checkpoint before continuing a stage. It prints the current next action, closing gate, latest GUI run, latest repair coder status, and latest repair validation status without modifying artifacts.

The `ingest-gui-run` command also writes `stage_repair_handoff.md/json` next to the stage summary. That handoff is the preferred input for the next coder pass: it lists primary repair targets and sample user-facing failures without forcing the coder to reread the entire GUI conversation first.
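
The three next-action values written by `ingest-gui-run` can be read as a priority rule: P0-class defects win over P1-class noise, and only a clean run goes to human confirmation. The following is a minimal sketch of that rule, not the script's actual logic. The function name and the idea of driving it from two finding counts are assumptions made for illustration. Only the three returned strings come from this document.

```python
def next_stage_action(p0_findings: int, p1_findings: int) -> str:
    """Pick the next stage action from GUI review finding counts.

    Hypothetical helper: the counts stand in for whatever the review
    records about P0 (business-wrong or missing direct answer) and
    P1 (noisy, over-broad, or poorly layered) defects.
    """
    if p0_findings > 0:
        # Business-wrong or missing direct-answer defects block everything else.
        return "continue_repair_from_gui_review_p0"
    if p1_findings > 0:
        # Semantically usable, but still needs presentation repair.
        return "continue_repair_from_gui_review_p1"
    # Clean enough for the final human visual confirmation.
    return "manual_gui_confirmation_or_stage_close"


if __name__ == "__main__":
    print(next_stage_action(p0_findings=0, p1_findings=2))
```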

To prepare the next repair iteration from that handoff, run:

```powershell
python scripts/stage_agent_loop.py prepare-repair --manifest docs/orchestration/.json
```

This writes `repair_iterations//repair_iteration_plan.json`, `repair_prompt.md`, and `repair_checklist.md`. The plan enriches GUI repair targets with candidate runtime files and rerun instructions, so the next coder pass can start from a bounded business defect instead of a full transcript archaeology dig.

To materialize or execute the coder command for that repair iteration, run:

```powershell
python scripts/stage_agent_loop.py run-repair --manifest docs/orchestration/.json --dry-run
```

`--dry-run` writes `repair_coder.command.txt`, records `repair_execution_summary.json`, updates `stage_loop_summary.json`, and prints the exact non-interactive Codex command without changing code.

Without `--dry-run`, it executes the coder command with the prepared `repair_prompt.md`, writes `repair_coder_result.json`, captures stdout/stderr, records `repair_execution_summary.json`, and updates the stage next action to rerun/ingest, inspect, or stop for a decision depending on the coder status.

After a real coder patch, rerun the same semantic pack or GUI session and ingest the new `assistant-stage1-`. When the coder result is `patched`, the next `ingest-gui-run` is treated as post-repair validation for that repair iteration. `stage_loop_summary.json` records `latest_repair_validation` and `repair_validation_history`, including the validation run id, remaining P0/P1 findings, and whether the repair was actually accepted after replay. A patch without this rerun/ingest evidence is not a closed stage.

The stage closing gate enforces that rule even when the inner pack loop reports `accepted`: `loop_accepted_gate` preserves the raw loop verdict, but stage-level `accepted_gate` stays `false` with `stage_closing_gate.status = blocked_pending_repair_validation` until the latest patched repair has a matching successful validation run.

## Placeholder contract

Scenario questions can reference earlier step outputs with placeholders such as:

- `{{step_01_inventory.entries[0].item}}`
- `{{semantic_memory.active_result_set_id}}`

This keeps carryover explicit and machine-readable.

## Status contract

Scenario capture uses four operational statuses:

- `accepted`
- `partial`
- `blocked`
- `needs_exact_capability`

`partial` means the scenario executed, but one or more steps still need route hardening, evidence hardening, or presentation hardening.

`needs_exact_capability` means the scenario is valid for the project, but the current contour still lacks the exact route or capability needed to answer it.

In autonomous pack-loop mode, `partial` and `needs_exact_capability` are non-terminal by default. The loop should continue domain enablement work until one of these happens:

- analyst quality reaches the configured acceptance gate, normally `>= 80`;
- the analyst marks `requires_user_decision = true` because the next step would otherwise require guessing a missing required observation, making an architecture-risky change, accepting a hacky/brittle workaround, or choosing a business-critical tradeoff without enough evidence;
- the runtime is truly blocked;
- the loop reaches `max_iterations`.
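
The four stop conditions above can be sketched as a single check that the loop evaluates after each iteration. This is an illustrative sketch under stated assumptions, not the pack-loop implementation. The `LoopState` class, its field names other than `requires_user_decision`, the returned reason strings, the order of the checks, and the default `max_iterations` value of 5 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopState:
    """Hypothetical snapshot of one pack-loop iteration."""
    iteration: int                      # 1-based count of completed iterations
    analyst_score: float                # analyst quality score for this iteration
    requires_user_decision: bool = False
    runtime_blocked: bool = False


def stop_reason(
    state: LoopState,
    acceptance_gate: float = 80,        # "normally >= 80" per the text above
    max_iterations: int = 5,            # assumed default, not from the source
) -> Optional[str]:
    """Return why the loop should stop, or None if it should continue."""
    if state.analyst_score >= acceptance_gate:
        return "accepted"
    if state.requires_user_decision:
        return "requires_user_decision"
    if state.runtime_blocked:
        return "blocked"
    if state.iteration >= max_iterations:
        return "max_iterations"
    # partial and needs_exact_capability are non-terminal: keep iterating.
    return None


if __name__ == "__main__":
    print(stop_reason(LoopState(iteration=2, analyst_score=72.0)))
```

Returning `None` for the continue case reflects the rule that `partial` and `needs_exact_capability` are non-terminal by default.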