NODEDC_1C/AGENTS.md

## encoding_rule
- All source/code/config/docs files must be saved and edited in UTF-8 without BOM; never write mojibake placeholders or replacement characters.
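
A minimal sketch of how the encoding rule above could be checked before a file is written back. The function name `encoding_problems` and the message strings are hypothetical, not part of this project. It detects only a BOM, invalid UTF-8, and the U+FFFD replacement character; it does not try to detect mojibake in general.

```python
UTF8_BOM = b"\xef\xbb\xbf"
REPLACEMENT_CHAR = "\ufffd"


def encoding_problems(data: bytes) -> list[str]:
    """Return the encoding-rule violations found in raw file bytes (hypothetical helper)."""
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("starts with a UTF-8 BOM")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Invalid UTF-8: report where decoding failed and stop checking.
        problems.append(f"not valid UTF-8 at byte {exc.start}")
        return problems
    if REPLACEMENT_CHAR in text:
        problems.append("contains U+FFFD replacement character")
    return problems
```

An empty list means the bytes passed these three checks, for example `encoding_problems("Привет".encode("utf-8"))`.
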
## commit_message_rule
- After applying fixes, always provide the user with a ready commit title in Russian.
## change_risk_rule
- After applying fixes, always provide `Степень опасности правки: X/10` immediately above the ready commit title.
- The score must use an integer scale from `1` to `10`, where:
  - `1` = low-risk local change with narrow blast radius;
  - `10` = high-risk architecture/runtime change with broad blast radius and mandatory close validation.
- The score must reflect real project risk, not optimism, and should help the user decide how much manual attention and replay validation the change deserves.
## closeout_risk_reporting_rule
- After applying fixes, always provide `Потенциал регресса на текущем этапе: X%`.
- After applying fixes, always provide `Необходимость жирного ручного прогона: X%`.
- These two lines must be emitted together with the change-risk score and the ready commit title in every close-out.
- Both percentages must use an integer scale from `0%` to `100%`.
- `Потенциал регресса на текущем этапе` must reflect the real probability that nearby or not-yet-covered contours can regress at the current stabilization stage.
- `Необходимость жирного ручного прогона` must reflect how strongly the current change still needs a broad manual reality-check beyond unit tests, narrow replay, and build verification.
- The percentages must be honest, architecture-aware, and useful for deciding whether the current pass is safe enough to trust without additional human validation.
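
The close-out block described by the risk rules above could be assembled roughly as follows. This is a sketch under stated assumptions: `format_closeout` is a hypothetical helper, and placing the two percentage lines before the risk score is an assumption. The rules only require that the risk score sits immediately above the commit title and that all four lines are emitted together.

```python
def format_closeout(risk: int, regress_pct: int, manual_pct: int, commit_title: str) -> str:
    """Build the close-out block (hypothetical helper; line order is partly an assumption)."""
    if not 1 <= risk <= 10:
        raise ValueError("risk score must be an integer from 1 to 10")
    for pct in (regress_pct, manual_pct):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be integers from 0 to 100")
    if not commit_title.strip():
        raise ValueError("a ready commit title is required")
    lines = [
        f"Потенциал регресса на текущем этапе: {regress_pct}%",
        f"Необходимость жирного ручного прогона: {manual_pct}%",
        # The risk score goes immediately above the ready commit title.
        f"Степень опасности правки: {risk}/10",
        commit_title,
    ]
    return "\n".join(lines)
```

For example, `format_closeout(3, 20, 35, "Исправить перенос выбранного объекта")` yields four lines ending with the commit title; the title text here is an invented illustration.
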
## development_stage_progress_rule
- After every completed development stage, always report `Прогресс модуля: X%`.
- The percentage must describe progress inside the current large module or plan block, not the whole project.
- If the stage belongs to a named large block, include that block name next to the percentage.
- Use an integer `0%` to `100%` scale and keep the estimate architecture-aware, based on implemented runtime wiring, tests, replay coverage, and remaining integration risk.
- Do not inflate progress because unit tests are green; semantic replay and real runtime wiring still count as unfinished work when they are pending.
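
A small sketch of the progress line from the rule above. The helper name `format_module_progress` is hypothetical, and putting the block name in parentheses after the percentage is an assumption; the rule only says the name goes next to the percentage.

```python
from typing import Optional


def format_module_progress(percent: int, block_name: Optional[str] = None) -> str:
    """Format the stage progress line (hypothetical helper; block-name placement is assumed)."""
    if not 0 <= percent <= 100:
        raise ValueError("progress must be an integer from 0 to 100")
    line = f"Прогресс модуля: {percent}%"
    if block_name:
        # Assumed layout: block name in parentheses right after the percentage.
        line += f" ({block_name})"
    return line
```
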
## graphify
This project has a graphify knowledge graph at `graphify-out/`.
Rules:
- Before answering architecture or codebase questions, read `graphify-out/GRAPH_REPORT.md` for god nodes and community structure.
- If `graphify-out/wiki/index.md` exists, navigate it instead of reading raw files.
- After modifying code files in this session, run `python -c "from graphify.watch import _rebuild_code; from pathlib import Path; _rebuild_code(Path('.'))"` to keep the graph current
## codex_domain_loop
- Project-scoped Codex orchestration lives under `.codex/`.
- Use `.codex/skills/domain-case-loop` for repeatable domain hardening loops on one concrete case.
- Prefer `docs/orchestration/active_domain_contract.json` as the single mutable source of truth for the current domain/scenario pack; keep the agent canon stable and swap only this file when the active domain changes.
- The same skill/launcher also supports multi-step domain scenarios with shared assistant session state under `artifacts/domain_runs/<scenario_id>/steps/`.
- For full domain question pools, use pack mode and aggregate artifacts under `artifacts/domain_runs/<pack_id>/scenarios/`.
- Preserve current architecture: domain loop may automate capture, review, rerun, and artifact storage, but must not rewrite runtime foundations.
- Prefer machine-readable case artifacts in `artifacts/domain_runs/<case_id>/`, especially `baseline_turn.json` / `rerun_turn.json`, over ad hoc prose-only summaries.
- For cascading user questions in one domain, prefer scenario artifacts (`scenario_manifest.json`, `scenario_state.json`, per-step `turn.json`) over separate unlinked case folders.
- For follow-up-heavy domains, treat acceptance as scenario-tree coverage: root node, critical child nodes, critical edges, and the primary user path must be validated explicitly.
- Do not accept a domain when only the root snapshot works but selected-object or drilldown follow-up edges still fail.
- For critical branches, validate at least canonical wording, colloquial wording, and UI-generated selected-object wording when that UX exists.
- Treat temporal carryover, selected-object carryover, answer-shape match, and ordering semantics as first-class acceptance invariants rather than optional polish.
- Treat direct-answer-first behavior, business usefulness, selected-object memory, and field truthfulness as first-class analyst criteria rather than optional presentation polish.
- Treat stable `focus_object`, reusable bundles such as `provenance_bundle`, and pronoun-style follow-up resolution (`по ней`, "for it"; `по этой позиции`, "for this item") as first-class analyst criteria in follow-up-heavy domains.
- Treat action-first selected-object follow-ups, layered answer shape, stable `answer_object`, and temporal honesty about out-of-window evidence as first-class analyst criteria rather than optional polish.
- If a case falls outside the current routed contour because the route/intent/capability is not wired yet, treat it as domain enablement work for this project, not as automatic out-of-scope rejection.
- For new unmarked domains, `needs_exact_capability` means "bootstrap or extend the contour" rather than "close the case as unsupported".
- A case can be marked `accepted` only when analyst verdict is at least `80/100`, no unresolved `P0` remains, and the rerun does not mask heuristic output as confirmed.
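
The `accepted` gate in the last rule above can be sketched as a simple check. The names `CaseReview`, `acceptance_blockers`, and `can_accept` are hypothetical and do not describe the project's real artifact schema. The three conditions are necessary ones taken from that rule; the scenario-tree coverage requirements elsewhere in this section are not modeled here.

```python
from dataclasses import dataclass

MIN_ANALYST_SCORE = 80  # "at least 80/100" from the rule above


@dataclass
class CaseReview:
    """Hypothetical summary of one reviewed case."""
    analyst_score: int                   # analyst verdict on a 0..100 scale
    unresolved_p0: int                   # number of P0 findings still open
    heuristic_masked_as_confirmed: bool  # rerun presents heuristic output as confirmed


def acceptance_blockers(review: CaseReview) -> list[str]:
    """List the reasons the case cannot be marked `accepted` yet."""
    blockers = []
    if review.analyst_score < MIN_ANALYST_SCORE:
        blockers.append("analyst verdict below 80/100")
    if review.unresolved_p0 > 0:
        blockers.append("unresolved P0 remains")
    if review.heuristic_masked_as_confirmed:
        blockers.append("rerun masks heuristic output as confirmed")
    return blockers


def can_accept(review: CaseReview) -> bool:
    """True only when none of the blockers apply."""
    return not acceptance_blockers(review)
```

Returning the list of blockers rather than a bare boolean keeps the reason for a rejection visible in the review.
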
## agent_semantic_runs
- `АГЕНТНЫЙ ПРОГОН` (the AGENT run) is a targeted full semantic replay for the current architecture fix, not a generic smoke test.
- Use it to validate human user questions, human model answers, technical chats, business logic, and system routing together.
- Semantic meaning analysis is mandatory and has priority over green local tests.
- After every full-system replay or saved-session run, the agent must first analyze:
  - what the user actually meant and what business result the user expected;
  - what the assistant actually answered in human-readable form;
  - whether the answer is semantically correct, context-aware, direct, and useful for the user.
- Only after that semantic review may the agent inspect technical chats, debug payloads, route ids, capability ids, filters, and internal orchestration metadata.
- Unit tests, narrow regressions, and green builds are secondary evidence only. They must never be presented as the primary proof that a replay is healthy when the user-facing semantic answer was not reviewed carefully.
- If a full-system replay contains a business-wrong, context-wrong, misleading, over-broad, or semantically off-target answer, the agent must treat it as a real failure even when route ids, tests, and low-level checks look green.
- The default review order for every substantial replay is:
  1. read the user question chain as a human conversation;
  2. read the assistant answers as a human user would see them;
  3. judge semantic correctness, continuity, directness, and business usefulness;
  4. only then inspect technical chats and machine artifacts to explain why the semantic defect happened;
  5. only then use unit tests or narrow regressions as supporting verification after the fix.
- Do not hide behind `tests are green` or `route matched` when the semantic answer is still wrong. In this project, the meaning of the user question and the meaning of the assistant answer are the primary acceptance surface.
- Build question lists around the active fix: mix direct domain questions with contextual chains, meta interruptions, cross-domain pivots, and follow-up edges that specifically hit the architecture change under validation.
- Do not run or save an `АГЕНТНЫЙ ПРОГОН` on every turn by default.
- Run it when the user explicitly asks for it, or when a substantial architecture/domain fix needs critical semantic proof beyond unit tests and narrow synthetic checks.
- `АГЕНТНЫЙ ПРОГОН` has a mandatory execution order. The correct order is:
  1. prepare or update the replay spec;
  2. run the replay live against the real assistant runtime;
  3. inspect machine artifacts and judge business/logic/technical quality;
  4. patch architecture/domain code if needed;
  5. rerun the same replay until the scenario is semantically clean;
  6. only after that, save the question pack into autoruns as legacy.
- Do not treat "questions were saved into autoruns" as "the AGENT run was executed". Saving questions is not the run. It is only a post-run persistence step.
- Preferred repo-native system tools for `АГЕНТНЫЙ ПРОГОН` are:
  - build/update a mixed pack from reusable sources: `python scripts/agent_semantic_pack_builder.py build-pack --recipe <recipe> --output-spec docs/orchestration/<spec>.json`
  - bootstrap a spec from a technical export: `python scripts/domain_truth_harness.py bootstrap --export <export.md> --output docs/orchestration/<spec>.json --scenario-id <scenario_id> --domain <domain>`
  - execute the real replay: `python scripts/domain_truth_harness.py run-live --spec docs/orchestration/<spec>.json --output-dir artifacts/domain_runs/<run_id>`
  - save the already-validated replay into autoruns: `python scripts/save_agent_semantic_run.py --spec docs/orchestration/<spec>.json`
- The default artifact-reading order after `run-live` is:
  - `artifacts/domain_runs/<run_id>/final_status.md`
  - `artifacts/domain_runs/<run_id>/truth_review.md`
  - `artifacts/domain_runs/<run_id>/pack_state.json`
  - `artifacts/domain_runs/<run_id>/steps/<step_id>/turn.json`
  - `artifacts/domain_runs/<run_id>/steps/<step_id>/output.md`
- When reviewing a replay, do not trust only the top-level `accepted/pass` flag. A run can still hide a semantic bug if the step-level answer is business-wrong, logically wrong, context-leaking, or routed through the wrong lane.
- Do not mislabel a valid clarification as a bug. If the assistant correctly asks the user to choose an organization/company because the active contour is ambiguous, that is normal behavior, not a regression.
- For multi-company contours, the AGENT run must continue the same session after the clarification and explicitly choose the company needed for the scenario. Do not stop the analysis at "уточните организацию" ("please specify the organization"); extend the replay with the natural next user turn that selects the company and then continue hardening the real business path.
- If the replay reveals business-answer defects, logic defects, stale carryover, answer-shape mismatch, or technical routing bugs, fix the architecture/domain code first and rerun the same spec before saving anything to autoruns.
- If the replay reveals a capability gap rather than a regression, do not frame it as "the system is buggy". Frame it as unfinished contour/domain enablement work and keep iterating until the missing path is either implemented or honestly bounded.
- A blocked answer inside the replay is not the end of the analysis. The agent must ask why the system could not answer, inspect reachable MCP/1C evidence, and decide whether the missing business answer can be recovered by a new route, a new capability, or an evidence-based derived answer.
- When the direct fact is unavailable in the current contour but recoverable from 1C activity evidence, prefer domain enablement work: fetch the supporting evidence via MCP/1C, derive the business-useful answer carefully, and state the derivation basis honestly. Example: if legal registration age is unavailable, the system may answer with age/activity duration inferred from the first and latest confirmed 1C activity, explicitly marked as an inference rather than a legal registration fact.
- When a fact cannot be proven exactly, the user-facing answer must say what is confirmed, what is inferred, and what remains unknown. Do not present an inferred business estimate as a legal or formally confirmed fact.
- Save agent-built question packs into autoruns under `Пользовательские сессии` with title prefix `AGENT | ...` only after the live replay has been executed and reviewed.
- Agent semantic runs saved into autoruns must remain runnable by the user from the UI like any other saved user session.
- If a pack was saved too early by mistake, treat it as an invalid intermediate artifact: remove its files from `llm_normalizer/data/autorun_generators/saved_sessions/`, `llm_normalizer/data/eval_cases/`, and its record from `llm_normalizer/data/autorun_generators/history.json`, then regenerate it only after the successful replay.
- The goal of an AGENT run is not only to confirm routes but to actively improve the assistant until the problematic questions are handled acceptably. Run, inspect, fix, rerun, and repeat until the critical business questions in the scenario are no longer broken, misleading, or underpowered.
- Evaluate the replay primarily through the user-facing business answer. Internal labels, raw route ids, capability ids, debug enums, `snapshot_items`, `bank_operations_by_*`, `answer_object`, and other service metadata are for diagnosis only; they must not leak into the user-facing answer and must not dominate the analyst verdict.
- Treat "technical garbage in the final answer" as a real quality defect even when the underlying route is correct. The hardened assistant should surface business meaning first and keep internal mechanics out of the user's head unless the user explicitly asks for technical detail.
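
The default artifact-reading order after `run-live`, listed earlier in this section, could be generated with a small helper like the one below. This is a sketch: `artifact_reading_order` is a hypothetical name, and for runs with several steps it assumes each step's `turn.json` is read immediately before that step's `output.md`, which this section does not spell out.

```python
from pathlib import PurePosixPath


def artifact_reading_order(run_id: str, step_ids: list[str]) -> list[str]:
    """Return artifact paths in the default reading order (hypothetical helper)."""
    run_dir = PurePosixPath("artifacts/domain_runs") / run_id
    # Run-level summaries come first.
    paths = [
        run_dir / "final_status.md",
        run_dir / "truth_review.md",
        run_dir / "pack_state.json",
    ]
    # Assumed per-step order: turn.json, then output.md, step by step.
    for step_id in step_ids:
        step_dir = run_dir / "steps" / step_id
        paths.append(step_dir / "turn.json")
        paths.append(step_dir / "output.md")
    return [str(p) for p in paths]
```

The helper only builds paths; it does not replace the semantic review of the user-facing answers, which stays the primary acceptance surface.
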