# Domain Scenario Loop - Repo Adapter

## Purpose

This repository now supports four outer-loop capture modes:
- `run-case` for one concrete domain question;
- `run-scenario` for a linked multi-step domain chain that should reuse one assistant session;
- `run-pack` for a whole domain question pool grouped into several scenarios;
- `run-pack-loop` for an autonomous analyst/coder loop over a whole domain pack.

`run-scenario` is the preferred capture mode for domains where the user's next question depends on the previous result set.
`run-pack` is the preferred capture mode when the user brings a full domain pool that should be kept in one aggregate backlog.

## Runtime contract

The scenario runner does not introduce a new product runtime.

It reuses:
- `POST /api/assistant/message`
- `GET /api/assistant/session/:session_id`
- current backend LLM/profile configuration
- current address/deep routing inside the product
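
As a minimal sketch of how a runner might address the two reused endpoints, the helpers below only build the HTTP requests and do not send them. The base URL and the `message`/`session_id` payload field names are assumptions made for illustration; they are not taken from the product API.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Assumption: illustrative backend address, not taken from the repository.
BASE_URL = "http://localhost:8787"


def build_message_request(text: str, session_id: Optional[str] = None) -> Request:
    """Build (but do not send) a POST /api/assistant/message request."""
    # Assumption: the payload field names below are hypothetical.
    payload = {"message": text}
    if session_id is not None:
        # Reusing one session id is what lets a scenario chain its steps.
        payload["session_id"] = session_id
    return Request(
        f"{BASE_URL}/api/assistant/message",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_session_request(session_id: str) -> Request:
    """Build (but do not send) a GET /api/assistant/session/:session_id request."""
    return Request(
        f"{BASE_URL}/api/assistant/session/{quote(session_id, safe='')}",
        method="GET",
    )
```

A real runner would pass these objects to `urllib.request.urlopen` and parse the JSON bodies; the response shape is not described here and is left out on purpose.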

## Artifact contract

Scenario artifacts live under:

`artifacts/domain_runs/<scenario_id>/`

Top-level artifacts:
- `scenario_brief.md`
- `scenario_manifest.json`
- `scenario_state.json`
- `scenario_summary.md`
- `scenario_output.md`
- `final_status.md`

Per-step artifacts:
- `steps/<step_id>/output.md`
- `steps/<step_id>/debug.json`
- `steps/<step_id>/turn.json`
- `steps/<step_id>/session.json`
- `steps/<step_id>/assistant_response.json`
- `steps/<step_id>/step_state.json`

Pack artifacts live under:

`artifacts/domain_runs/<pack_id>/`

- `pack_manifest.json`
- `pack_state.json`
- `pack_summary.md`
- `final_status.md`
- `scenarios/<scenario_id>/...`
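
The scenario layout above can be checked mechanically. The following is a minimal sketch, not the repository's own validator: the function name is hypothetical, and it only reports which of the listed scenario files are absent from a directory.

```python
from pathlib import Path
from typing import List

# File names copied from the artifact contract above.
SCENARIO_TOP_LEVEL = [
    "scenario_brief.md", "scenario_manifest.json", "scenario_state.json",
    "scenario_summary.md", "scenario_output.md", "final_status.md",
]
STEP_FILES = [
    "output.md", "debug.json", "turn.json", "session.json",
    "assistant_response.json", "step_state.json",
]


def missing_scenario_artifacts(scenario_dir: Path) -> List[str]:
    """Return relative paths of expected artifacts that are absent (hypothetical helper)."""
    missing = [name for name in SCENARIO_TOP_LEVEL if not (scenario_dir / name).is_file()]
    steps_root = scenario_dir / "steps"
    # Only step directories that actually exist are inspected.
    step_dirs = sorted(p for p in steps_root.iterdir() if p.is_dir()) if steps_root.is_dir() else []
    for step_dir in step_dirs:
        for name in STEP_FILES:
            if not (step_dir / name).is_file():
                missing.append(f"steps/{step_dir.name}/{name}")
    return missing
```

Calling `missing_scenario_artifacts(Path("artifacts/domain_runs/<scenario_id>"))` with a real scenario id returns an empty list when the layout is complete.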

## AGENT autorun save gate

`scripts/save_agent_semantic_run.py` is a post-validation persistence tool, not a replay executor.
The normal path is:

1. build/update the truth-harness spec;
2. run `python scripts/domain_truth_harness.py run-live --spec ... --output-dir artifacts/domain_runs/<run_id>`;
3. inspect `truth_review.md`, `business_review.md`, `pack_state.json`, and `final_status.md`;
4. save to GUI autoruns only with `python scripts/save_agent_semantic_run.py --spec ... --validated-run-dir artifacts/domain_runs/<run_id>`.

The save gate requires:
- `pack_state.final_status = accepted`;
- `pack_state.acceptance_gate_passed = true`;
- `truth_review.summary.overall_status = pass`;
- `business_review.overall_business_status = pass`;
- zero unresolved P0 and zero business-answer failures.

If a pack must be saved as a deliberate manual draft before live acceptance, use
`--allow-unvalidated --unvalidated-reason "<why this is intentionally not accepted>"`.
That path is explicitly marked as unvalidated and must not be treated as semantic proof.
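
The save-gate conditions in this section can be sketched as a single check. This is an illustration under stated assumptions, not the logic of `scripts/save_agent_semantic_run.py`: the three inputs are assumed to be already-parsed dictionaries, and the counter keys `unresolved_p0_count` and `business_answer_failures` are hypothetical names for the last condition.

```python
from typing import Any, Dict, List


def save_gate_failures(
    pack_state: Dict[str, Any],
    truth_review: Dict[str, Any],
    business_review: Dict[str, Any],
) -> List[str]:
    """Return the unmet save-gate conditions; an empty list means the gate passes."""
    failures: List[str] = []
    if pack_state.get("final_status") != "accepted":
        failures.append("pack_state.final_status is not accepted")
    if pack_state.get("acceptance_gate_passed") is not True:
        failures.append("pack_state.acceptance_gate_passed is not true")
    if truth_review.get("summary", {}).get("overall_status") != "pass":
        failures.append("truth_review.summary.overall_status is not pass")
    if business_review.get("overall_business_status") != "pass":
        failures.append("business_review.overall_business_status is not pass")
    # Assumption: hypothetical counter keys. A missing counter fails closed.
    if pack_state.get("unresolved_p0_count") != 0:
        failures.append("unresolved P0 findings remain or are unknown")
    if business_review.get("business_answer_failures") != 0:
        failures.append("business-answer failures remain or are unknown")
    return failures
```

Failing closed on missing counters is a design choice of this sketch: a run directory that lacks the evidence should not be saved as validated.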

## Stage-level AGENT loop

`scripts/stage_agent_loop.py` wraps the domain pack loop into the development-stage workflow:

1. take the current global/local stage manifest;
2. run `scripts/domain_case_loop.py run-pack-loop` for that stage pack;
3. let the loop iterate through pack replay, business-first analyst verdict, coder patch, and rerun until the objective gate is accepted, blocked, or a real user decision is required;
4. if accepted, persist the validated AGENT pack into GUI autoruns through `scripts/save_agent_semantic_run.py --validated-run-dir`;
5. write `stage_loop_summary.json` and `stage_loop_handoff.md` for the final human visual confirmation.

The stage manifest schema is `docs/orchestration/schemas/stage_agent_loop_manifest.schema.json`.
The default stage gate is intentionally stricter than a narrow case gate: `target_score = 88`, no unresolved P0/P1 repair targets, accepted analyst verdict, clean business usefulness, direct-answer, temporal-honesty, field-truth, and answer-layering flags.

Canonical commands:

```powershell
python scripts/stage_agent_loop.py plan --manifest docs/orchestration/<stage_loop>.json
python scripts/stage_agent_loop.py run --manifest docs/orchestration/<stage_loop>.json
python scripts/stage_agent_loop.py ingest-gui-run --manifest docs/orchestration/<stage_loop>.json --run-id assistant-stage1-<id>
python scripts/stage_agent_loop.py prepare-repair --manifest docs/orchestration/<stage_loop>.json
python scripts/stage_agent_loop.py run-repair --manifest docs/orchestration/<stage_loop>.json --dry-run
python scripts/stage_agent_loop.py status --manifest docs/orchestration/<stage_loop>.json
python scripts/stage_agent_loop.py summarize --manifest docs/orchestration/<stage_loop>.json
```

This is the intended path for “implement the stage, generate/check stage questions, analyze business answers, patch code, rerun, then ask the user for final visual confirmation”.

## GUI run review bridge

When a manual or GUI autorun already exists, `scripts/review_assistant_stage1_run.py` turns the run id into the same machine-readable review surface.

Canonical command:

```powershell
python scripts/review_assistant_stage1_run.py assistant-stage1-<id> --print-summary
```

The script resolves:
- `llm_normalizer/reports/assistant-stage1-<id>.md`;
- `llm_normalizer/data/assistant_sessions/assistant-stage1-<id>-*.json`.

It writes:
- `artifacts/domain_runs/gui_run_reviews/assistant-stage1-<id>/run_review.json`;
- `artifacts/domain_runs/gui_run_reviews/assistant-stage1-<id>/run_review.md`;
- `conversation_pairs.json`;
- `question_quality_review.json`;
- `repair_targets.json`.

This bridge is intentionally business-first:
- the user's question and visible assistant answer are reviewed before route ids and debug fields;
- noisy direct answers, missing first-line answers, technical garbage, and over-broad business answers become findings;
- generated question packs get a deterministic quality review for follow-up density, direct questions, report-style analysis, domain diversity, duplicates, and weak business anchors.

Use this bridge when the operator would otherwise say “check run `assistant-stage1-...`”. The expected first step is no longer manual eyeballing. Instead: review by id, inspect `run_review.md`, map `repair_targets.json` into the current stage loop, patch, and rerun.

For stage work, prefer the integrated command:

```powershell
python scripts/stage_agent_loop.py ingest-gui-run --manifest docs/orchestration/<stage_loop>.json --run-id assistant-stage1-<id>
```

It stores the GUI review under `artifacts/domain_runs/stage_agent_loops/<stage_id>/gui_run_reviews/<run_id>/`, updates `stage_loop_summary.json`, and writes the next stage action:
- `continue_repair_from_gui_review_p0` when the GUI run exposes business-wrong or missing direct-answer defects;
- `continue_repair_from_gui_review_p1` when the run is semantically usable but still noisy, over-broad, or poorly layered;
- `manual_gui_confirmation_or_stage_close` when the GUI run is clean enough for final human confirmation.
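
The three next actions above amount to a severity-ordered choice. A minimal sketch follows, assuming that business-wrong or missing direct-answer defects are counted as P0 findings and that noisy, over-broad, or poorly layered answers are counted as P1 findings; the function and its parameters are hypothetical.

```python
def next_stage_action(p0_findings: int, p1_findings: int) -> str:
    """Pick the next stage action from GUI review finding counts (hypothetical helper)."""
    if p0_findings > 0:
        # Business-wrong or missing direct answers take priority.
        return "continue_repair_from_gui_review_p0"
    if p1_findings > 0:
        # Semantically usable, but still noisy, over-broad, or poorly layered.
        return "continue_repair_from_gui_review_p1"
    return "manual_gui_confirmation_or_stage_close"
```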

`stage_loop_summary.json` also includes `next_step_guidance.command_templates`, so the next operator or agent pass can continue from machine-readable commands instead of re-inferring the workflow from prose.

Use `python scripts/stage_agent_loop.py status --manifest docs/orchestration/<stage_loop>.json` as the cheap read-only checkpoint before continuing a stage. It prints the current next action, closing gate, latest GUI run, latest repair coder status, and latest repair validation status without modifying artifacts.

The `ingest-gui-run` command also writes `stage_repair_handoff.md/json` next to the stage summary. That handoff is the preferred input for the next coder pass: it lists primary repair targets and sample user-facing failures without forcing the coder to reread the entire GUI conversation first.

To prepare the next repair iteration from that handoff, run:

```powershell
python scripts/stage_agent_loop.py prepare-repair --manifest docs/orchestration/<stage_loop>.json
```

This writes `repair_iterations/<iteration_id>/repair_iteration_plan.json`, `repair_prompt.md`, and `repair_checklist.md`. The plan enriches GUI repair targets with candidate runtime files and rerun instructions, so the next coder pass can start from a bounded business defect instead of a full transcript archaeology dig.

To materialize or execute the coder command for that repair iteration, run:

```powershell
python scripts/stage_agent_loop.py run-repair --manifest docs/orchestration/<stage_loop>.json --dry-run
```

With `--dry-run`, the command writes `repair_coder.command.txt`, records `repair_execution_summary.json`, updates `stage_loop_summary.json`, and prints the exact non-interactive Codex command without changing code.

Without `--dry-run`, it executes the coder command with the prepared `repair_prompt.md`, writes `repair_coder_result.json`, captures stdout/stderr, records `repair_execution_summary.json`, and updates the stage next action to rerun/ingest, inspect, or stop for a decision, depending on the coder status.

After a real coder patch, rerun the same semantic pack or GUI session and ingest the new `assistant-stage1-<id>`.

When the coder result is `patched`, the next `ingest-gui-run` is treated as post-repair validation for that repair iteration. `stage_loop_summary.json` records `latest_repair_validation` and `repair_validation_history`, including the validation run id, remaining P0/P1 findings, and whether the repair was actually accepted after replay. A patch without this rerun/ingest evidence is not a closed stage.

The stage closing gate enforces that rule even when the inner pack loop reports `accepted`: `loop_accepted_gate` preserves the raw loop verdict, but stage-level `accepted_gate` stays `false` with `stage_closing_gate.status = blocked_pending_repair_validation` until the latest patched repair has a matching successful validation run.
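
The closing-gate rule can be sketched as follows. This is an illustration, not the code in `scripts/stage_agent_loop.py`: the `iteration_id` and `accepted` keys on the repair and validation records are assumptions, and every status string other than `blocked_pending_repair_validation` is a hypothetical placeholder.

```python
from typing import Any, Dict, Optional


def stage_closing_gate(
    loop_accepted_gate: bool,
    latest_repair: Optional[Dict[str, Any]],
    latest_repair_validation: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Derive the stage-level gate from the raw loop verdict and repair evidence."""
    repair = latest_repair or {}
    validation = latest_repair_validation or {}
    patched = repair.get("status") == "patched"
    # Assumption: a validation "matches" when it names the same repair iteration.
    validated = (
        patched
        and repair.get("iteration_id") is not None
        and validation.get("iteration_id") == repair.get("iteration_id")
        and validation.get("accepted") is True
    )
    if patched and not validated:
        status = "blocked_pending_repair_validation"
    else:
        # Hypothetical status names for the non-blocked cases.
        status = "passed" if loop_accepted_gate else "loop_not_accepted"
    return {
        "loop_accepted_gate": loop_accepted_gate,  # raw loop verdict is preserved
        "accepted_gate": loop_accepted_gate and status == "passed",
        "stage_closing_gate": {"status": status},
    }
```

Keeping `loop_accepted_gate` separate from `accepted_gate` mirrors the rule above: the inner loop may report success while the stage stays open until the patch has replay evidence.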

## Placeholder contract

Scenario questions can reference earlier step outputs with placeholders such as:

- `{{step_01_inventory.entries[0].item}}`
- `{{semantic_memory.active_result_set_id}}`

This keeps carryover explicit and machine-readable.
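
A minimal sketch of placeholder substitution, assuming earlier step outputs are available as a nested dictionary keyed by step id. The context shape and both function names are hypothetical; only the `{{...}}` syntax with dotted keys and `[index]` access comes from the examples above.

```python
import re
from typing import Any, Dict

# Matches {{ path }} and captures the path without surrounding whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")
# A path token is either an identifier or a bracketed list index.
TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve_path(context: Dict[str, Any], path: str) -> Any:
    """Walk a path such as 'step_01_inventory.entries[0].item' through the context."""
    value: Any = context
    for key, index in TOKEN.findall(path):
        value = value[int(index)] if index else value[key]
    return value


def render_question(question: str, context: Dict[str, Any]) -> str:
    """Replace every placeholder; a missing key or index raises instead of guessing."""
    return PLACEHOLDER.sub(lambda m: str(resolve_path(context, m.group(1))), question)
```

For example, with a context of `{"step_01_inventory": {"entries": [{"item": "Widget A"}]}}`, the question `"Show movements for {{step_01_inventory.entries[0].item}}"` renders as `"Show movements for Widget A"`. Raising on an unresolved placeholder keeps a broken carryover visible rather than sending a half-filled question to the assistant.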

## Status contract

Scenario capture uses four operational statuses:
- `accepted`
- `partial`
- `blocked`
- `needs_exact_capability`

`partial` means the scenario executed, but one or more steps still need route hardening, evidence hardening, or presentation hardening.
`needs_exact_capability` means the scenario is valid for the project, but the current contour still lacks the exact route or capability needed to answer it.

In autonomous pack-loop mode, `partial` and `needs_exact_capability` are non-terminal by default. The loop should continue domain enablement work until one of these happens:
- analyst quality reaches the configured acceptance gate, normally `>= 80`;
- the analyst marks `requires_user_decision = true` because the next step would otherwise require guessing a missing required observation, making an architecture-risky change, accepting a hacky/brittle workaround, or choosing a business-critical tradeoff without enough evidence;
- the runtime is truly blocked;
- the loop reaches `max_iterations`.
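
The stop conditions above can be sketched as one decision function. The function, its parameters, and the returned reason strings are hypothetical; only the conditions themselves and the usual gate of 80 come from this section.

```python
from typing import Optional


def stop_reason(
    status: str,
    analyst_score: float,
    requires_user_decision: bool,
    iteration: int,
    max_iterations: int,
    acceptance_gate: float = 80,
) -> Optional[str]:
    """Return why the pack loop should stop, or None if it should keep iterating."""
    if analyst_score >= acceptance_gate:
        return "accepted"
    if requires_user_decision:
        return "requires_user_decision"
    if status == "blocked":
        return "blocked"
    if iteration >= max_iterations:
        return "max_iterations"
    # `partial` and `needs_exact_capability` fall through: they are non-terminal.
    return None
```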