Update documents on the agentic loop

dctouch 2026-05-10 08:45:37 +03:00
parent 3be06b5f93
commit b625f9af5b
5 changed files with 206 additions and 11 deletions


@ -23,6 +23,27 @@ From this point forward:
For the current execution spine, read `23 - current_execution_spine_and_semantic_control_gate_2026-05-05.md`.
## 2026-05-10 Overlay - Agentic Loop And Autorun Hygiene
The next active operating layer is now the repo-native `Agentic Semantic Development Loop`, not another isolated route patch.
Current interpretation:
- the Open-World Semantic Control Gate remains the semantic pressure surface;
- the stage-loop is the development operating system around that surface: generate/review/replay/audit/repair/rerun, then save accepted AGENT autoruns only after reviewed acceptance;
- Lead Codex remains the repair brain, while the loop produces strong business-audit artifacts and lead-coder handoff instead of relying on a weak autonomous coder;
- the first dogfood loop artifact is accepted at `artifacts/domain_runs/stage_agent_loops/agentic_semantic_development_loop/domain_loops/asl/final_status.md`;
- manual GUI confirmation remains required after accepted replay artifacts;
- autorun/runtime Cyrillic hygiene is now part of the acceptance surface, because broken saved-session text can invalidate the GUI review even when the backend route is correct.
Fresh validation cut:
- commit `3be06b5 Починить восстановление кириллицы в автопрогонах`;
- targeted mojibake/autorun/runtime tests passed `20/20`;
- targeted organization-clarification carryover tests passed `2/2`;
- `npm.cmd run build` passed;
- graphify rebuilt to `6371` nodes, `14048` edges, `141` communities.
## Current Module Map
- `Post-F Semantic Integrity Hardening`: `99%`, operationally closed as a hardening slice and now used as a regression gate.
@ -55,9 +76,11 @@ For the current execution spine, read `23 - current_execution_spine_and_semantic
- Completed active slice: `Business Overview Counterparty/Contract Profile Bridge`: business overview now executes reviewed `counterparty_population_and_roles` and `contract_usage_overview` recipes, surfacing active counterparty role split and contract usage without claiming CRM quality, counterparty due diligence, legal completeness, or contract-risk.
- Completed active slice: `Business Overview Missing Proof Ledger`: business overview now records machine-readable hard proof gaps for accounting profit/margin, due-date debt aging, inventory reserve/liquidation quality, and vendor/procurement quality, distinguishing proxy-only evidence from reviewed routes that are not wired yet.
- Completed semantic-control slice: `W5/W7 Counterparty Value-Flow And Money-Breakdown Integrity`: bank-document/value-flow recipes now materialize explicit counterparty predicates, zero-row supplier-payment checks answer as checked negative evidence, compound money-breakdown wording stays in `business_overview`, and MCP discovery receives active organization scope only when the current turn has no explicit organization.
- Completed operating-system slice: `Agentic Semantic Development Loop Dogfood Gate`: stage manifest, stage pack, stage-loop wrapper, review/status/continue safety, lead-coder handoff, and save-after-acceptance gating are wired and accepted by the `asl` dogfood loop artifact.
- Completed hygiene slice: `Autorun Cyrillic C1 Repair`: old autorun cards/questions/runtime materialization now repair C1-control mojibake before UI or assistant-lane use, including the historical `БОЛЬШОЙ ОБЩИЙ` / `АЛЬТЕРНАТИВА` failure class.
- Implementation breadth: `~99% (Open-World Bounded Autonomy Breadth through Slice 25)`.
- Next active slice: `Open-World Semantic Control Gate`, covering garbage-anchor protection, business-overview continuation, intent dominance, frame hygiene, counterparty/organization arbitration, and final-summary answer shape.
- Active module progress: `~99% (Open-World Bounded Autonomy Breadth, active slice: Semantic Control Gate)`.
- Next active slice: continue `Agentic Semantic Development Loop` dogfood over real Open-World/Semantic-Control packs and confirm the latest autorun hygiene fix in the GUI.
- Active module progress: `~99% (Agentic Semantic Development Loop, accepted dogfood loop + autorun hygiene; manual GUI confirmation still required)`.
## Reporting Rule
@ -66,6 +89,7 @@ Use these labels when reporting progress:
- `Прогресс модуля: 99% (Post-F Semantic Integrity Hardening, operationally closed/regression gate)` when discussing the Post-F slice itself.
- `Прогресс модуля: 100% (Planner Autonomy Consolidation, declared phase83 slice closed)` when discussing the planner-autonomy slice that was just completed.
- `Прогресс модуля: 99% (Open-World Bounded Autonomy Breadth, active slice: Semantic Control Gate)` while discussing current module closure after the EHMO-derived critical subset accepted live again with W5/W7 hardening.
- `Прогресс модуля: 99% (Agentic Semantic Development Loop, accepted dogfood loop + autorun hygiene; manual GUI confirmation still required)` when discussing the current development-loop operating layer.
- `Open-World Business Overview implementation breadth: ~99%, Semantic Control Gate critical subset accepted, fat GUI pack still pending` when discussing only the already wired Slice 25 breadth.
- `Прогресс модуля: X% (Open-World Bounded Autonomy Breadth, active slice: <name>)` for later breadth work after the Semantic Control Gate is accepted.
@ -98,6 +122,8 @@ The project is not yet a universal arbitrary-1C agent.
Remaining work belongs to the next breadth module:
- confirm the latest autorun Cyrillic hygiene cut in the GUI after backend refresh and inspect frontend/API payloads if old replacement characters remain visible;
- continue dogfooding the `Agentic Semantic Development Loop` on real stage packs, especially generated-question quality, semantic business audit, repair handoff, and rerun acceptance;
- finish closure of the `Open-World Semantic Control Gate` opened by `assistant-stage1-EHMOy3lNFt`; the EHMO-derived critical subset is accepted live after W5/W7 hardening, but the fat GUI pack and residual answer-shape roughness still need final review;
- extend `business_overview` beyond money-flow/activity, customer and supplier concentration, document/account-section activity mix, counterparty role split, contract usage, yearly operating-flow dynamics, explicit profit/margin wording boundaries, explicit debt due-date wording boundaries, explicit inventory reserve/liquidation wording boundaries, explicit supplier/procurement-quality wording boundaries, explicit-period VAT/tax, as-of-date debt position, open-settlement concentration, contract-date debt age, debt staleness-risk proxy, as-of-date inventory position, trading-margin proxy, sales-to-stock inventory proxy, warehouse staleness-risk proxy, and the missing-proof ledger into separately proven exact accounting profit/margin, due-date debt aging/overdue, confirmed vendor-risk/procurement-quality analysis, and confirmed reserve/write-off/liquidation inventory evidence families;
- broader dynamic schema traversal for unfamiliar 1C asks;
@ -120,11 +146,12 @@ For current planning, read:
1. `README.md`
2. this document
3. `23 - current_execution_spine_and_semantic_control_gate_2026-05-05.md`
4. `22 - open_world_bounded_autonomy_breadth_2026-05-01.md`
5. `20 - planner_autonomy_consolidation_2026-05-01.md`
6. `19 - inventory_stock_open_world_breadth_proof_2026-05-01.md`
7. `17 - post_f_semantic_integrity_hardening_2026-04-23.md`
8. `16 - data_need_graph_and_open_world_mcp_plan_2026-04-22.md`
3. `24 - agentic_semantic_development_loop_and_autorun_hygiene_2026-05-10.md`
4. `23 - current_execution_spine_and_semantic_control_gate_2026-05-05.md`
5. `22 - open_world_bounded_autonomy_breadth_2026-05-01.md`
6. `20 - planner_autonomy_consolidation_2026-05-01.md`
7. `19 - inventory_stock_open_world_breadth_proof_2026-05-01.md`
8. `17 - post_f_semantic_integrity_hardening_2026-04-23.md`
9. `16 - data_need_graph_and_open_world_mcp_plan_2026-04-22.md`
Documents `01` through `15` remain valuable, but mostly as the historical architecture trail.


@ -73,6 +73,22 @@ This is not a regression from `99%` to `96%`. It is a metric split:
- `99%` describes wired breadth;
- `99%` describes closure confidence after the EHMO-derived critical subset passed live replay again with W5/W7 hardening; the gate is still not full module closure until the fat manual GUI pack and remaining answer-shape residuals are reviewed.
## 2026-05-10 Status Overlay
This document remains the Semantic Control Gate spine, but it is no longer the latest operating overlay.
Read `24 - agentic_semantic_development_loop_and_autorun_hygiene_2026-05-10.md` after this file.
Newer status:
- the current development operating layer is `Agentic Semantic Development Loop`;
- the first dogfood loop artifact for `agentic_semantic_development_loop` is accepted under `artifacts/domain_runs/stage_agent_loops/.../asl/final_status.md`;
- the loop uses Lead Codex handoff for repairs, not a weak autonomous coder as the primary repair actor;
- accepted AGENT autoruns should be saved only after reviewed replay/loop acceptance;
- latest autorun/runtime hygiene cut repairs old C1-control Cyrillic mojibake before GUI cards, runtime questions, or assistant-lane continuation.
The Semantic Control Gate work units below remain valid regression classes, but current execution should go through the stage-loop machinery when a substantial pack is being validated.
## Current Local Cut
Local cut 1 is implemented:


@ -0,0 +1,138 @@
# 24 - Agentic Semantic Development Loop And Autorun Hygiene (2026-05-10)
## Purpose
This note is the current status overlay after the Open-World Semantic Control Gate work moved from direct manual Codex operation into a repo-native agentic development loop.
It exists to keep the project spine clear:
- the target product is still a bounded MCP-first 1C analyst assistant;
- Post-F, planner autonomy, inventory breadth, and business overview remain regression gates;
- the active operational work is now the development loop that generates, reviews, replays, audits, repairs, reruns, and only then saves AGENT autoruns;
- autorun/runtime hygiene is part of that loop, because broken saved-session text can invalidate the human GUI review even when the backend route is correct.
## What Changed Since Document 23
The previous execution-spine document stopped at the EHMO-derived Semantic Control Gate.
Since then, the project added a repo-native stage-loop layer:
- `scripts/stage_agent_loop.py` is the stage-level wrapper for pack-loop execution, review, status, safe continuation, and optional save-to-autoruns after acceptance.
- `docs/orchestration/stage_agent_loop_agentic_semantic_development_loop.json` is the active stage manifest.
- `docs/orchestration/agentic_semantic_development_loop_stage_pack.json` is the dogfood pack for business overview, VAT, stale scope, counterparty pivots, legacy route canaries, and answer-shape quality.
- `docs/orchestration/schemas/stage_agent_loop_manifest.schema.json` defines the manifest contract.
- `scripts/save_agent_semantic_run.py` refuses to save AGENT autoruns before reviewed live replay/loop acceptance unless explicitly forced as a draft.
- `docs/orchestration/agent_semantic_source_catalog.*` remains the reusable source catalog for mixed AGENT pack construction.
The accepted dogfood status is recorded at:
- `artifacts/domain_runs/stage_agent_loops/agentic_semantic_development_loop/domain_loops/asl/final_status.md`
Current recorded result:
- status: `accepted`
- loop id: `asl`
- repair mode: `lead-handoff`
- target score: `88`
- iterations ran: `1`
- stop reason: analyst accepted plus deterministic gate passed at `iteration_00`
- manual GUI confirmation is still required after acceptance
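The save-after-acceptance rule above can be read as a small decision function. The following is a minimal Python sketch, not the actual logic of `scripts/save_agent_semantic_run.py`: the `LoopStatus` fields, the `force_draft` flag, and the `"save"` / `"draft"` / `"refuse"` labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopStatus:
    """Hypothetical summary of a loop's final status (field names are assumptions)."""
    loop_id: str
    status: str                      # e.g. "accepted"
    analyst_accepted: bool
    deterministic_gate_passed: bool


def save_decision(status: LoopStatus, force_draft: bool = False) -> str:
    """Return "save", "draft", or "refuse" for an AGENT autorun."""
    accepted = (
        status.status == "accepted"
        and status.analyst_accepted
        and status.deterministic_gate_passed
    )
    if accepted:
        return "save"
    # A run without reviewed acceptance is stored only when explicitly forced as a draft.
    return "draft" if force_draft else "refuse"


asl = LoopStatus("asl", "accepted", True, True)
pending = LoopStatus("next_pack", "in_review", True, False)
print(save_decision(asl), save_decision(pending), save_decision(pending, force_draft=True))
```

Even a `"save"` decision in this sketch only covers the artifact gate; manual GUI confirmation stays a separate human step.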
## Design Decision
The project should not rely on a weak autonomous coder as the primary repair actor.
The chosen model is:
- Lead Codex remains the responsible repair brain.
- A strong independent semantic/business audit layer reviews the replay from the user's business meaning first.
- The stage-loop produces machine-readable artifacts and a lead-coder handoff instead of silently patching production code.
- The loop can continue safely, but real code repair requires explicit execution mode and must be validated by rerun artifacts.
- Human GUI confirmation remains the final high-signal reality check for accepted AGENT packs.
This preserves the user's desired automation pattern without delegating high-risk architecture repair to a low-context worker.
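As a rough illustration of this decision, the loop below stops on acceptance and otherwise records a lead-coder handoff instead of patching code itself. It is a sketch under stated assumptions: `Review`, `LoopOutcome`, `run_stage_loop`, the `max_iterations` limit, and the rule that acceptance needs a score at or above the target are all hypothetical and are not taken from `scripts/stage_agent_loop.py`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Hypothetical result of one replay + business-audit pass."""
    score: int
    analyst_accepted: bool
    gate_passed: bool


@dataclass
class LoopOutcome:
    status: str
    stopped_at: str
    handoffs: List[str] = field(default_factory=list)


def run_stage_loop(
    replay_and_audit: Callable[[str], Review],
    target_score: int = 88,
    max_iterations: int = 3,
) -> LoopOutcome:
    """Iterate until a review is accepted; never repair code inside the loop."""
    handoffs: List[str] = []
    name = "iteration_00"
    for index in range(max_iterations):
        name = f"iteration_{index:02d}"
        review = replay_and_audit(name)
        if review.analyst_accepted and review.gate_passed and review.score >= target_score:
            return LoopOutcome("accepted", name, handoffs)
        # lead-handoff mode: emit an artifact for the lead coder, do not patch silently.
        handoffs.append(f"{name}: score {review.score}, handoff to lead coder")
    return LoopOutcome("needs_lead_repair", name, handoffs)


first_try = run_stage_loop(lambda name: Review(90, True, True))
scores = iter([70, 91])
second_try = run_stage_loop(lambda name: (lambda s: Review(s, s >= 88, True))(next(scores)))
```

In the first call the loop stops at `iteration_00` with no handoffs, which mirrors the shape of the recorded `asl` result; the second call needs one handoff before acceptance.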
## Autorun Cyrillic Hygiene Cut
The GUI exposed an old saved-session failure where Cyrillic in autorun cards/questions displayed as replacement-character text.
Control examples after repair:
- `БОЛЬШОЙ ОБЩИЙ`
- `АЛЬТЕРНАТИВА`
Root cause:
- old autorun history/runtime payloads contained double-decoded Cyrillic plus C1 controls such as `U+0098`;
- the previous repair path encoded those controls as normal Windows-1251 text and lost the raw byte needed to reconstruct UTF-8;
- the UI then faithfully displayed the resulting replacement character.
Current fix:
- `addressTextRepair.ts` preserves C1 controls as raw bytes during UTF-8 reconstruction;
- `autoRuns.ts` repairs autorun titles/questions before exposing cards or runtime materialization;
- `eval.ts` and `assistantService.ts` now receive repaired scenario/runtime question text;
- known already-lossy fragments with `U+FFFD` inside the organization name are repaired before they can poison organization clarification;
- tests now cover the historical `БОЛЬШОЙ ОБЩИЙ` and `АЛЬТЕРНАТИВА` autorun cases.
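The failure class and the fix can be reproduced in a few lines. The sketch below is in Python for brevity and is not the code in `addressTextRepair.ts`; all three function names are hypothetical. It assumes the mojibake came from UTF-8 bytes being read as Windows-1251, where byte `0x98` (the second byte of `И`) has no mapping and surfaces as the C1 control `U+0098`.

```python
def simulate_mojibake(text: str) -> str:
    """Misread UTF-8 bytes as Windows-1251; unmapped bytes pass through as C1 controls."""
    out = []
    for byte in text.encode("utf-8"):
        try:
            out.append(bytes([byte]).decode("cp1251"))
        except UnicodeDecodeError:
            out.append(chr(byte))  # 0x98 has no cp1251 mapping -> U+0098
    return "".join(out)


def repair_lossy(text: str) -> str:
    """Failure class: the C1 control is encoded as ordinary text and its byte is lost."""
    raw = text.encode("cp1251", errors="replace")  # U+0098 becomes b"?"
    return raw.decode("utf-8", errors="replace")   # the orphaned lead byte becomes U+FFFD


def repair_preserving_c1(text: str) -> str:
    """Rebuild the original bytes, keeping C1 controls as raw bytes, then decode as UTF-8."""
    raw = bytearray()
    for ch in text:
        code = ord(ch)
        if 0x80 <= code <= 0x9F:
            raw.append(code)  # C1 control: restore the raw byte unchanged
        else:
            try:
                raw.extend(ch.encode("cp1251"))
            except UnicodeEncodeError:
                return text  # not Windows-1251 mojibake, leave untouched
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return text  # bytes are not valid UTF-8, so the text was probably fine already


for sample in ("БОЛЬШОЙ ОБЩИЙ", "АЛЬТЕРНАТИВА"):
    broken = simulate_mojibake(sample)
    print(repr(repair_lossy(broken)), "->", repair_preserving_c1(broken))
```

Falling back to the input when the rebuilt bytes are not valid UTF-8 keeps already-correct Cyrillic untouched, although a production repair would likely need stricter heuristics than this sketch.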
Committed cut:
- `3be06b5 Починить восстановление кириллицы в автопрогонах`
Validation recorded during this cut:
- `npm.cmd test -- assistantOrganizationMatcher.test.ts addressTextRepair.test.ts autoRunsQuestionSplit.test.ts evalRuntimeQuestionSplit.test.ts` passed `20/20`;
- `npm.cmd test -- assistantAddressFollowupContext.test.ts -t "continues the original inventory query after"` passed `2/2`;
- `npm.cmd run build` passed;
- graphify rebuilt to `6371` nodes, `14048` edges, `141` communities.
## Current Status
The current large module should be described as:
`Agentic Semantic Development Loop / Open-World Semantic Control operating system`
Status:
- implementation state: operational dogfood loop exists and has an accepted first loop artifact;
- semantic status: accepted loop artifact is useful, but manual GUI confirmation remains required;
- hygiene status: saved autorun/runtime Cyrillic repair is now covered by code and tests;
- risk: medium, because the loop is now infrastructure for future acceptance decisions, not just a local route fix.
Recommended reporting line:
`Прогресс модуля: 99% (Agentic Semantic Development Loop, accepted dogfood loop + autorun hygiene; manual GUI confirmation still required)`
## What Is Not Closed
Do not treat the stage-loop as a replacement for business-answer review.
Still open:
- frontend/browser cache can still show old broken autorun text until the backend is restarted and the UI is refreshed;
- the first accepted dogfood loop proves the mechanism, not all future stage packs;
- generated question quality still needs pressure from real GUI runs and user feedback;
- broad arbitrary 1C autonomy is still bounded by reviewed routes, truth gates, and replay evidence;
- manual GUI confirmation remains required before declaring a fat AGENT pack fully accepted.
## Next Work
Next operational pass:
1. Restart/reload the backend and reopen the affected autorun card from the GUI.
2. Confirm old saved-session text now displays as normal Cyrillic.
3. If the GUI still shows replacement characters, inspect frontend state/cache and API payload side by side.
4. Continue dogfooding the stage-loop on the next real Open-World/agentic pack.
5. Keep Post-F, phase83, inventory, business-overview, and mojibake autorun cases as regression canaries.
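For step 3, a small scanner can make the side-by-side comparison concrete. This is a hedged Python sketch: `find_suspicious_text` and the sample payload shape (`runs`, `title`, `questions`) are hypothetical and do not describe the real autorun API response.

```python
from typing import Any, Iterator, Tuple


def is_suspicious(ch: str) -> bool:
    """True for the replacement character or any C1 control (U+0080..U+009F)."""
    return ch == "\ufffd" or 0x80 <= ord(ch) <= 0x9F


def find_suspicious_text(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Walk decoded JSON and yield (path, value) for strings that still look broken."""
    if isinstance(node, str):
        if any(is_suspicious(ch) for ch in node):
            yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_suspicious_text(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_suspicious_text(value, f"{path}[{index}]")


# Hypothetical payload shape, for illustration only.
payload = {
    "runs": [
        {"title": "БОЛЬШОЙ ОБЩИЙ", "questions": ["Р\u0098РќРќ", "ok \ufffd"]},
    ]
}
hits = list(find_suspicious_text(payload))
for path, value in hits:
    print(path, repr(value))
```

Running the same function over the decoded API response and over a dump of the frontend state shows which layer still holds stale text: hits only on the frontend side point to cache, hits on both sides point back to the backend repair path.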
## Canonical Reading Order Update
For current planning, read:
1. `README.md`
2. `21 - current_status_canon_2026-05-01.md`
3. this document
4. `23 - current_execution_spine_and_semantic_control_gate_2026-05-05.md`
5. `22 - open_world_bounded_autonomy_breadth_2026-05-01.md`
6. `20 - planner_autonomy_consolidation_2026-05-01.md`
7. `17 - post_f_semantic_integrity_hardening_2026-04-23.md`


@ -41,13 +41,18 @@ This package answers the next question:
21. [21 - current_status_canon_2026-05-01.md](./21%20-%20current_status_canon_2026-05-01.md)
22. [22 - open_world_bounded_autonomy_breadth_2026-05-01.md](./22%20-%20open_world_bounded_autonomy_breadth_2026-05-01.md)
23. [23 - current_execution_spine_and_semantic_control_gate_2026-05-05.md](./23%20-%20current_execution_spine_and_semantic_control_gate_2026-05-05.md)
24. [24 - agentic_semantic_development_loop_and_autorun_hygiene_2026-05-10.md](./24%20-%20agentic_semantic_development_loop_and_autorun_hygiene_2026-05-10.md)
## Current Status Snapshot (2026-05-05)
## Current Status Snapshot (2026-05-10)
This package is no longer planning-only.
Status canon for planning:
- The current operational overlay is now [24 - agentic_semantic_development_loop_and_autorun_hygiene_2026-05-10.md](./24%20-%20agentic_semantic_development_loop_and_autorun_hygiene_2026-05-10.md).
- The active engineering surface is no longer only individual route hardening; it is the repo-native AGENT/stage-loop operating system that should generate/review/replay/audit/repair/rerun current-stage packs before saving accepted autoruns.
- The first dogfood stage loop for `agentic_semantic_development_loop` is accepted in artifacts, but manual GUI confirmation remains required before treating a fat AGENT pack as fully human-accepted.
- Autorun/runtime Cyrillic hygiene is now a current regression gate: old saved-session mojibake with C1 controls must be repaired before cards, questions, and runtime jobs reach the GUI or assistant lane.
- Post-F Semantic Integrity Hardening is operationally closed at `99%` and should now be used as a regression gate, not as the active module denominator.
- Planner Autonomy Consolidation is closed at `100%` for the declared phase83 planner-brain slice.
- The active next module is now `Open-World Bounded Autonomy Breadth` over unfamiliar 1C asks, with Post-F and phase83 retained as semantic canaries.
@ -78,8 +83,11 @@ Status canon for planning:
- The `assistant-stage1-EHMOy3lNFt` manual GUI replay opened the next acceptance gate: `Open-World Semantic Control Gate`.
- The `~99%` Open-World number now means implementation breadth through Slice 25, not accepted semantic closure under broad human dialogue pressure.
- The active breadth slice is semantic control rather than new proof-family expansion: garbage-anchor protection, business-overview continuation, intent dominance, frame hygiene, counterparty/organization arbitration, and final-summary answer shape.
- The current accepted dogfood infrastructure slice is `Agentic Semantic Development Loop`: stage manifest, stage pack, loop wrapper, status/continue safety, strong business-audit handoff, and save-after-acceptance gating are wired and validated by the `asl` accepted loop artifact.
- The latest hygiene slice is `Autorun Cyrillic C1 Repair`: `addressTextRepair`, `autoRuns`, `eval`, and `assistantService` now preserve C1 bytes while repairing old saved-session Russian text, preventing replacement-character autorun cards or runtime turns from leaking into the user path after backend refresh.
- The short source of truth for status wording is [21 - current_status_canon_2026-05-01.md](./21%20-%20current_status_canon_2026-05-01.md).
- The current execution spine after EHMO is [23 - current_execution_spine_and_semantic_control_gate_2026-05-05.md](./23%20-%20current_execution_spine_and_semantic_control_gate_2026-05-05.md).
- The current stage-loop/hygiene overlay after the AGENT dogfood cut is [24 - agentic_semantic_development_loop_and_autorun_hygiene_2026-05-10.md](./24%20-%20agentic_semantic_development_loop_and_autorun_hygiene_2026-05-10.md).
It now documents a turnaround that is already operational in code, already materially past the acute regression breakpoint, and already moved through bounded MCP autonomy, Post-F hardening, inventory breadth proof, and the declared Planner Autonomy slice:


@ -3,7 +3,7 @@
This document describes the practical workflow the operator uses in the
`История автопрогонов` interface: question generation, launching runs, labeling answers, closing cases, and post-analysis.
Last updated: `2026-04-09`
Last updated: `2026-05-10`
---
@ -213,6 +213,13 @@ Queue mapping:
6. The "hide completed" filter correctly excludes `resolved=true`.
7. Post-analysis shows queues and candidates.
8. Interface text reads without mojibake.
9. Old saved autoruns with C1-control mojibake in `history.json` and in the runtime job payload must be served by the backend with Cyrillic already restored; control examples: `БОЛЬШОЙ ОБЩИЙ`, `АЛЬТЕРНАТИВА`.
Important after encoding/autorun changes:
- restart the backend so the UI picks up the new repair layer;
- refresh the autorun list in the browser;
- if the replacement character `U+FFFD` remains visible, compare the API payload of `GET /api/autoruns/autogen/history` with the frontend/browser cache state.
---
@ -221,4 +228,3 @@ Queue mapping:
1. Async run is limited to `assistant_stage1`.
2. Live-data quality depends on how session files are populated on the runtime side.
3. Post-analysis is based on actual manual labeling; without it the queues are empty.