Domain Scenario Loop - Repo Adapter
Purpose
This repository now supports four outer-loop capture modes:
- run-case for one concrete domain question;
- run-scenario for a linked multi-step domain chain that should reuse one assistant session;
- run-pack for a whole domain question pool grouped into several scenarios;
- run-pack-loop for an autonomous analyst/coder loop over a whole domain pack.
run-scenario is the preferred capture mode for domains where the user's next question depends on the previous result set.
run-pack is the preferred capture mode when the user brings a full domain pool that should be kept in one aggregate backlog.
Runtime contract
The scenario runner does not introduce a new product runtime.
It reuses:
- POST /api/assistant/message
- GET /api/assistant/session/:session_id
- current backend LLM/profile configuration
- current address/deep routing inside the product
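As a rough illustration of how a runner could address the two endpoints listed above, here is a minimal sketch that only builds the requests and does not send them. The base URL and the payload field names (`message`, `session_id`) are assumptions made for this example; this document does not specify them.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Assumption: the backend address is not stated in this document.
BASE_URL = "http://localhost:8000"


def build_message_request(text: str, session_id: Optional[str] = None) -> Request:
    """Build (but do not send) a POST /api/assistant/message request."""
    # Hypothetical payload shape: the real field names may differ.
    payload = {"message": text}
    if session_id is not None:
        payload["session_id"] = session_id
    return Request(
        f"{BASE_URL}/api/assistant/message",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_session_request(session_id: str) -> Request:
    """Build (but do not send) a GET /api/assistant/session/:session_id request."""
    safe_id = quote(session_id, safe="")  # escape characters that would break the path
    return Request(f"{BASE_URL}/api/assistant/session/{safe_id}", method="GET")
```

Passing the same `session_id` on every step is what would let a linked scenario reuse one assistant session.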
Artifact contract
Scenario artifacts live under:
artifacts/domain_runs/<scenario_id>/
Top-level artifacts:
- scenario_brief.md
- scenario_manifest.json
- scenario_state.json
- scenario_summary.md
- scenario_output.md
- final_status.md
Per-step artifacts:
- steps/<step_id>/output.md
- steps/<step_id>/debug.json
- steps/<step_id>/turn.json
- steps/<step_id>/session.json
- steps/<step_id>/assistant_response.json
- steps/<step_id>/step_state.json
Pack artifacts live under:
artifacts/domain_runs/<pack_id>/
- pack_manifest.json
- pack_state.json
- pack_summary.md
- final_status.md
- scenarios/<scenario_id>/...
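The scenario artifact layout above can be checked mechanically. The following is a minimal sketch, not the repository's own validator: the helper names `expected_scenario_artifacts` and `missing_artifacts` are hypothetical, and the file names are copied from the lists in this section.

```python
from pathlib import Path
from typing import List

# File names taken from the artifact contract in this document.
SCENARIO_TOP_LEVEL = [
    "scenario_brief.md", "scenario_manifest.json", "scenario_state.json",
    "scenario_summary.md", "scenario_output.md", "final_status.md",
]
STEP_FILES = [
    "output.md", "debug.json", "turn.json", "session.json",
    "assistant_response.json", "step_state.json",
]


def expected_scenario_artifacts(root: Path, scenario_id: str, step_ids: List[str]) -> List[Path]:
    """List every path the scenario contract expects under artifacts/domain_runs/."""
    base = root / "artifacts" / "domain_runs" / scenario_id
    paths = [base / name for name in SCENARIO_TOP_LEVEL]
    for step_id in step_ids:
        paths.extend(base / "steps" / step_id / name for name in STEP_FILES)
    return paths


def missing_artifacts(paths: List[Path]) -> List[Path]:
    """Return the expected paths that do not exist on disk."""
    return [p for p in paths if not p.exists()]
```

A caller would pass the repository root and the step ids from the scenario manifest, then treat a non-empty result as an incomplete capture.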
AGENT autorun save gate
scripts/save_agent_semantic_run.py is a post-validation persistence tool, not a replay executor.
The normal path is:
- build/update the truth-harness spec;
- run python scripts/domain_truth_harness.py run-live --spec ... --output-dir artifacts/domain_runs/<run_id>;
- inspect truth_review.md, business_review.md, pack_state.json, and final_status.md;
- save to GUI autoruns only with python scripts/save_agent_semantic_run.py --spec ... --validated-run-dir artifacts/domain_runs/<run_id>.
The save gate requires:
- pack_state.final_status = accepted;
- pack_state.acceptance_gate_passed = true;
- truth_review.summary.overall_status = pass;
- business_review.overall_business_status = pass;
- zero unresolved P0 and zero business-answer failures.
If a pack must be saved as a deliberate manual draft before live acceptance, use
--allow-unvalidated --unvalidated-reason "<why this is intentionally not accepted>".
That path is explicitly marked as unvalidated and must not be treated as semantic proof.
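The save gate conditions above can be read as a single predicate over the review files. This is a minimal sketch, not the logic of scripts/save_agent_semantic_run.py: the function name is hypothetical, and because this document does not say where the unresolved-P0 and business-answer-failure counts are stored, they are passed in as plain integers.

```python
from typing import Any, Dict, List


def save_gate_failures(
    pack_state: Dict[str, Any],
    truth_review: Dict[str, Any],
    business_review: Dict[str, Any],
    unresolved_p0: int,             # assumption: counted by the caller
    business_answer_failures: int,  # assumption: counted by the caller
) -> List[str]:
    """Return the unmet save-gate conditions; an empty list means the gate passes."""
    failures: List[str] = []
    if pack_state.get("final_status") != "accepted":
        failures.append("pack_state.final_status is not accepted")
    if pack_state.get("acceptance_gate_passed") is not True:
        failures.append("pack_state.acceptance_gate_passed is not true")
    if truth_review.get("summary", {}).get("overall_status") != "pass":
        failures.append("truth_review.summary.overall_status is not pass")
    if business_review.get("overall_business_status") != "pass":
        failures.append("business_review.overall_business_status is not pass")
    if unresolved_p0 != 0:
        failures.append("unresolved P0 findings remain")
    if business_answer_failures != 0:
        failures.append("business-answer failures remain")
    return failures
```

Returning the list of unmet conditions, rather than a bare boolean, makes it easy to quote the exact reason in an --unvalidated-reason string when a draft is saved deliberately.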
Stage-level AGENT loop
scripts/stage_agent_loop.py wraps the domain pack loop into the development-stage workflow:
- take the current global/local stage manifest;
- run scripts/domain_case_loop.py run-pack-loop for that stage pack;
- let the loop iterate through pack replay, business-first analyst verdict, coder patch, and rerun until the objective gate is accepted, blocked, or a real user decision is required;
- if accepted, persist the validated AGENT pack into GUI autoruns through scripts/save_agent_semantic_run.py --validated-run-dir;
- write stage_loop_summary.json and stage_loop_handoff.md for the final human visual confirmation.
The stage manifest schema is docs/orchestration/schemas/stage_agent_loop_manifest.schema.json.
The default stage gate is intentionally stricter than a narrow case gate: target_score = 88, no unresolved P0/P1 repair targets, accepted analyst verdict, clean business usefulness, direct-answer, temporal-honesty, field-truth, and answer-layering flags.
Canonical commands:
python scripts/stage_agent_loop.py plan --manifest docs/orchestration/<stage_loop>.json
python scripts/stage_agent_loop.py run --manifest docs/orchestration/<stage_loop>.json
python scripts/stage_agent_loop.py ingest-gui-run --manifest docs/orchestration/<stage_loop>.json --run-id assistant-stage1-<id>
python scripts/stage_agent_loop.py prepare-repair --manifest docs/orchestration/<stage_loop>.json
python scripts/stage_agent_loop.py summarize --manifest docs/orchestration/<stage_loop>.json
This is the intended path for “implement the stage, generate/check stage questions, analyze business answers, patch code, rerun, then ask the user for final visual confirmation”.
GUI run review bridge
When a manual or GUI autorun already exists, scripts/review_assistant_stage1_run.py turns the run id into the same machine-readable review surface.
Canonical command:
python scripts/review_assistant_stage1_run.py assistant-stage1-<id> --print-summary
The script resolves:
- llm_normalizer/reports/assistant-stage1-<id>.md;
- llm_normalizer/data/assistant_sessions/assistant-stage1-<id>-*.json.
It writes:
- artifacts/domain_runs/gui_run_reviews/assistant-stage1-<id>/run_review.json;
- artifacts/domain_runs/gui_run_reviews/assistant-stage1-<id>/run_review.md;
- conversation_pairs.json;
- question_quality_review.json;
- repair_targets.json.
This bridge is intentionally business-first:
- the user's question and visible assistant answer are reviewed before route ids and debug fields;
- noisy direct answers, missing first-line answers, technical garbage, and over-broad business answers become findings;
- generated question packs get a deterministic quality review for follow-up density, direct questions, report-style analysis, domain diversity, duplicates, and weak business anchors.
Use this bridge when the operator would otherwise say “check the run assistant-stage1-...”. The expected next step is no longer manual eyeballing first; it is: review by id, inspect run_review.md, map repair_targets.json into the current stage loop, patch, and rerun.
For stage work, prefer the integrated command:
python scripts/stage_agent_loop.py ingest-gui-run --manifest docs/orchestration/<stage_loop>.json --run-id assistant-stage1-<id>
It stores the GUI review under artifacts/domain_runs/stage_agent_loops/<stage_id>/gui_run_reviews/<run_id>/, updates stage_loop_summary.json, and writes the next stage action:
- continue_repair_from_gui_review_p0 when the GUI run exposes business-wrong or missing direct-answer defects;
- continue_repair_from_gui_review_p1 when the run is semantically usable but still noisy, over-broad, or poorly layered;
- manual_gui_confirmation_or_stage_close when the GUI run is clean enough for final human confirmation.
It also writes stage_repair_handoff.md/json next to the stage summary. That handoff is the preferred input for the next coder pass: it lists primary repair targets and sample user-facing failures without forcing the coder to reread the entire GUI conversation first.
To prepare the next repair iteration from that handoff, run:
python scripts/stage_agent_loop.py prepare-repair --manifest docs/orchestration/<stage_loop>.json
This writes repair_iterations/<iteration_id>/repair_iteration_plan.json, repair_prompt.md, and repair_checklist.md. The plan enriches GUI repair targets with candidate runtime files and rerun instructions, so the next coder pass can start from a bounded business defect instead of a full transcript archaeology dig.
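The choice of next stage action described earlier in this section amounts to a small priority rule. A minimal sketch follows, assuming the review has already been reduced to two booleans; the function name, its inputs, and the rule that P0 defects take precedence over P1 noise are assumptions of this example, not the script's documented behavior.

```python
def next_stage_action(has_p0_defects: bool, has_p1_defects: bool) -> str:
    """Map GUI review findings to one of the three documented next stage actions."""
    # Assumption: business-wrong or missing direct-answer defects (P0) win over P1 noise.
    if has_p0_defects:
        return "continue_repair_from_gui_review_p0"
    # Semantically usable but noisy, over-broad, or poorly layered answers.
    if has_p1_defects:
        return "continue_repair_from_gui_review_p1"
    # Clean enough for final human confirmation.
    return "manual_gui_confirmation_or_stage_close"
```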
Placeholder contract
Scenario questions can reference earlier step outputs with placeholders such as:
- {{step_01_inventory.entries[0].item}}
- {{semantic_memory.active_result_set_id}}
This keeps carryover explicit and machine-readable.
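To make the placeholder syntax concrete, here is a minimal resolver sketch. It assumes that earlier step outputs and semantic memory are available as one nested dict keyed by step id; that context shape and the names `lookup` and `render` are hypothetical, and the repository's actual resolver may behave differently.

```python
import re
from typing import Any, Dict

# Matches {{ path }} where path is a dotted name with optional [index] parts.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")
# One path segment: either an identifier or a bracketed list index.
TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def lookup(context: Dict[str, Any], path: str) -> Any:
    """Walk a path such as 'step_01_inventory.entries[0].item' through nested data."""
    value: Any = context
    pos = 0
    while pos < len(path):
        if path[pos] == ".":
            pos += 1
            continue
        match = TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse placeholder path: {path!r}")
        key, index = match.group(1), match.group(2)
        # Dict access for names, list access for [n]; missing data raises KeyError/IndexError.
        value = value[key] if key is not None else value[int(index)]
        pos = match.end()
    return value


def render(question: str, context: Dict[str, Any]) -> str:
    """Replace every {{...}} placeholder in a question with its resolved value."""
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), question)
```

In this sketch an unresolved placeholder raises instead of being left in the text, so a broken carryover stops the step rather than sending a malformed question to the assistant.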
Status contract
Scenario capture uses four operational statuses:
- accepted
- partial
- blocked
- needs_exact_capability
partial means the scenario executed, but one or more steps still need route hardening, evidence hardening, or presentation hardening.
needs_exact_capability means the scenario is valid for the project, but the current contour still lacks the exact route or capability needed to answer it.
In autonomous pack-loop mode, partial and needs_exact_capability are non-terminal by default. The loop should continue domain enablement work until one of these happens:
- analyst quality reaches the configured acceptance gate, normally >= 80;
- the analyst marks requires_user_decision = true because the next step would otherwise require guessing a missing required observation, making an architecture-risky change, accepting a hacky/brittle workaround, or choosing a business-critical tradeoff without enough evidence;
- the runtime is truly blocked;
- the loop reaches max_iterations.
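The stop conditions above can be sketched as one function that the loop calls after each iteration. This is an illustration under assumptions: the function name, the order in which the conditions are checked, and the returned labels other than accepted and blocked are hypothetical; the default gate of 80 comes from the text above.

```python
from typing import Optional


def stop_reason(
    analyst_score: float,
    requires_user_decision: bool,
    runtime_blocked: bool,
    iteration: int,
    max_iterations: int,
    acceptance_gate: float = 80,
) -> Optional[str]:
    """Return why the pack loop should stop, or None if it should keep iterating."""
    if analyst_score >= acceptance_gate:
        return "accepted"
    if requires_user_decision:
        return "requires_user_decision"
    if runtime_blocked:
        return "blocked"
    if iteration >= max_iterations:
        return "max_iterations"
    # partial and needs_exact_capability are non-terminal: keep working.
    return None
```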