encoding_rule
- All source/code/config/docs files must be saved and edited in UTF-8 without BOM; never write mojibake placeholders or replacement characters.
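The encoding rule above can be checked mechanically before a file is saved or committed. The following is a minimal sketch, not part of the project: the names `encoding_problems` and `check_file` are hypothetical, and the check only covers what is detectable from bytes (a BOM, invalid UTF-8, and U+FFFD replacement characters). Mojibake made of otherwise valid characters still needs human review.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"
REPLACEMENT_CHAR = "\ufffd"


def encoding_problems(data: bytes) -> list[str]:
    """Return the encoding-rule violations found in raw file bytes."""
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("starts with a UTF-8 BOM")
        data = data[len(UTF8_BOM):]
    try:
        # Strict decoding: any invalid byte sequence raises.
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc.reason}")
        return problems
    if REPLACEMENT_CHAR in text:
        problems.append("contains U+FFFD replacement characters")
    return problems


def check_file(path: Path) -> list[str]:
    """Hypothetical convenience wrapper: check one file on disk."""
    return encoding_problems(path.read_bytes())
```

An empty list means the bytes pass these three checks; it does not prove the text is free of mojibake.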
commit_message_rule
- After applying fixes, always provide the user with a ready commit title in Russian.
change_risk_rule
- After applying fixes, always provide `Степень опасности правки: X/10` immediately above the ready commit title.
- The score must use an integer scale from `1` to `10`, where:
  - `1` = low-risk local change with narrow blast radius;
  - `10` = high-risk architecture/runtime change with broad blast radius and mandatory close validation.
- The score must reflect real project risk, not optimism, and should help the user decide how much manual attention and replay validation the change deserves.
closeout_risk_reporting_rule
- After applying fixes, always provide `Потенциал регресса на текущем этапе: X%`.
- After applying fixes, always provide `Необходимость жирного ручного прогона: X%`.
- These two lines must be emitted together with the change-risk score and the ready commit title in every close-out.
- Both percentages must use an integer scale from `0%` to `100%`.
- `Потенциал регресса на текущем этапе` must reflect the real probability that nearby or not-yet-covered contours can regress at the current stabilization stage.
- `Необходимость жирного ручного прогона` must reflect how strongly the current change still needs a broad manual reality-check beyond unit tests, narrow replay, and build verification.
- The percentages must be honest, architecture-aware, and useful for deciding whether the current pass is safe enough to trust without additional human validation.
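The close-out block described by the risk and reporting rules can be sketched as a small formatter. This is an illustration only: the function name `format_closeout` and its parameters are hypothetical. The rules fix the risk line immediately above the commit title but do not say where the two percentage lines go, so placing them first is an assumption.

```python
def format_closeout(risk: int, regress_pct: int, manual_run_pct: int,
                    commit_title: str) -> str:
    """Render the close-out block; raise if a value is off its integer scale."""
    for name, value, low, high in (
        ("risk", risk, 1, 10),
        ("regress_pct", regress_pct, 0, 100),
        ("manual_run_pct", manual_run_pct, 0, 100),
    ):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be in {low}..{high}")
    title = commit_title.strip()
    if not title:
        raise ValueError("commit_title must not be empty")
    return "\n".join([
        # Assumption: percentages come first; the rules only fix the
        # position of the risk line relative to the commit title.
        f"Потенциал регресса на текущем этапе: {regress_pct}%",
        f"Необходимость жирного ручного прогона: {manual_run_pct}%",
        f"Степень опасности правки: {risk}/10",
        title,
    ])
```

The formatter only enforces the scales. Choosing honest values remains the author's judgment.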
graphify
This project has a graphify knowledge graph at graphify-out/.
Rules:
- Before answering architecture or codebase questions, read graphify-out/GRAPH_REPORT.md for god nodes and community structure
- If graphify-out/wiki/index.md exists, navigate it instead of reading raw files
- After modifying code files in this session, run `python -c "from graphify.watch import _rebuild_code; from pathlib import Path; _rebuild_code(Path('.'))"` to keep the graph current
codex_domain_loop
- Project-scoped Codex orchestration lives under `.codex/`.
- Use `.codex/skills/domain-case-loop` for repeatable domain hardening loops on one concrete case.
- Prefer `docs/orchestration/active_domain_contract.json` as the single mutable source of truth for the current domain/scenario pack; keep the agent canon stable and swap only this file when the active domain changes.
- The same skill/launcher also supports multi-step domain scenarios with shared assistant session state under `artifacts/domain_runs/<scenario_id>/steps/`.
- For full domain question pools, use pack mode and aggregate artifacts under `artifacts/domain_runs/<pack_id>/scenarios/`.
- Preserve current architecture: domain loop may automate capture, review, rerun, and artifact storage, but must not rewrite runtime foundations.
- Prefer machine-readable case artifacts in `artifacts/domain_runs/<case_id>/`, especially `baseline_turn.json` / `rerun_turn.json`, over ad hoc prose-only summaries.
- For cascading user questions in one domain, prefer scenario artifacts (`scenario_manifest.json`, `scenario_state.json`, per-step `turn.json`) over separate unlinked case folders.
- For follow-up-heavy domains, treat acceptance as scenario-tree coverage: root node, critical child nodes, critical edges, and the primary user path must be validated explicitly.
- Do not accept a domain when only the root snapshot works but selected-object or drilldown follow-up edges still fail.
- For critical branches, validate at least canonical wording, colloquial wording, and UI-generated selected-object wording when that UX exists.
- Treat temporal carryover, selected-object carryover, answer-shape match, and ordering semantics as first-class acceptance invariants rather than optional polish.
- Treat direct-answer-first behavior, business usefulness, selected-object memory, and field truthfulness as first-class analyst criteria rather than optional presentation polish.
- Treat stable `focus_object`, reusable bundles such as `provenance_bundle`, and pronoun-style follow-up resolution (`по ней`, `по этой позиции`) as first-class analyst criteria in follow-up-heavy domains.
- Treat action-first selected-object follow-ups, layered answer shape, stable `answer_object`, and temporal honesty about out-of-window evidence as first-class analyst criteria rather than optional polish.
- If a case falls outside the current routed contour because the route/intent/capability is not wired yet, treat it as domain enablement work for this project, not as automatic out-of-scope rejection.
- For new unmarked domains, `needs_exact_capability` means "bootstrap or extend the contour" rather than "close the case as unsupported".
- A case can be marked `accepted` only when analyst verdict is at least `80/100`, no unresolved `P0` remains, and the rerun does not mask heuristic output as confirmed.
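The acceptance condition in the last rule above is a conjunction of three checks, which can be sketched as a predicate. The `CaseReview` type and its field names are hypothetical and are not taken from the project's artifact schema. Only the thresholds (`80/100`, no unresolved `P0`, no heuristic output masked as confirmed) come from the rule itself.

```python
from dataclasses import dataclass


@dataclass
class CaseReview:
    """Hypothetical summary of one reviewed case (field names are assumptions)."""
    analyst_verdict: int                  # analyst score on a 0..100 scale
    unresolved_p0: int                    # count of P0 issues still open
    heuristic_masked_as_confirmed: bool   # rerun presents heuristics as confirmed


def can_mark_accepted(review: CaseReview) -> bool:
    """Apply the three acceptance conditions; all of them must hold."""
    return (
        review.analyst_verdict >= 80
        and review.unresolved_p0 == 0
        and not review.heuristic_masked_as_confirmed
    )
```

A high verdict alone is not enough: a case scoring 95 with one open `P0` still fails the predicate.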
agent_semantic_runs
- `АГЕНТНЫЙ ПРОГОН` is a targeted full semantic replay for the current architecture fix, not a generic smoke test.
- Use it to validate human user questions, human model answers, technical chats, business logic, and system routing together.
- Build question lists around the active fix: mix direct domain questions with contextual chains, meta interruptions, cross-domain pivots, and follow-up edges that specifically hit the architecture change under validation.
- Do not run or save an `АГЕНТНЫЙ ПРОГОН` on every turn by default.
- Run it when the user explicitly asks for it, or when a substantial architecture/domain fix needs critical semantic proof beyond unit tests and narrow synthetic checks.
- `АГЕНТНЫЙ ПРОГОН` has a mandatory execution order. The correct order is:
  - prepare or update the replay spec;
  - run the replay live against the real assistant runtime;
  - inspect machine artifacts and judge business/logic/technical quality;
  - patch architecture/domain code if needed;
  - rerun the same replay until the scenario is semantically clean;
  - only after that, save the question pack into autoruns as legacy.
- Do not treat "questions were saved into autoruns" as "the AGENT run was executed". Saving questions is not the run. It is only a post-run persistence step.
- Preferred repo-native system tools for `АГЕНТНЫЙ ПРОГОН` are:
  - build/update a mixed pack from reusable sources: `python scripts/agent_semantic_pack_builder.py build-pack --recipe <recipe> --output-spec docs/orchestration/<spec>.json`
  - bootstrap a spec from a technical export: `python scripts/domain_truth_harness.py bootstrap --export <export.md> --output docs/orchestration/<spec>.json --scenario-id <scenario_id> --domain <domain>`
  - execute the real replay: `python scripts/domain_truth_harness.py run-live --spec docs/orchestration/<spec>.json --output-dir artifacts/domain_runs/<run_id>`
  - save the already-validated replay into autoruns: `python scripts/save_agent_semantic_run.py --spec docs/orchestration/<spec>.json`
- The default artifact-reading order after `run-live` is:
  - `artifacts/domain_runs/<run_id>/final_status.md`
  - `artifacts/domain_runs/<run_id>/truth_review.md`
  - `artifacts/domain_runs/<run_id>/pack_state.json`
  - `artifacts/domain_runs/<run_id>/steps/<step_id>/turn.json`
  - `artifacts/domain_runs/<run_id>/steps/<step_id>/output.md`
- When reviewing a replay, do not trust only the top-level `accepted/pass` flag. A run can still hide a semantic bug if the step-level answer is business-wrong, logically wrong, context-leaking, or routed through the wrong lane.
- Do not mislabel a valid clarification as a bug. If the assistant correctly asks the user to choose an organization/company because the active contour is ambiguous, that is normal behavior, not a regression.
- For multi-company contours, the AGENT run must continue the same session after the clarification and explicitly choose the company needed for the scenario. Do not stop the analysis at "уточните организацию"; extend the replay with the natural next user turn that selects the company and then continue hardening the real business path.
- If the replay reveals business-answer defects, logic defects, stale carryover, answer-shape mismatch, or technical routing bugs, fix the architecture/domain code first and rerun the same spec before saving anything to autoruns.
- If the replay reveals a capability gap rather than a regression, do not frame it as "the system is buggy". Frame it as unfinished contour/domain enablement work and keep iterating until the missing path is either implemented or honestly bounded.
- A blocked answer inside the replay is not the end of the analysis. The agent must ask why the system could not answer, inspect reachable MCP/1C evidence, and decide whether the missing business answer can be recovered by a new route, a new capability, or an evidence-based derived answer.
- When the direct fact is unavailable in the current contour but recoverable from 1C activity evidence, prefer domain enablement work: fetch the supporting evidence via MCP/1C, derive the business-useful answer carefully, and state the derivation basis honestly. Example: if legal registration age is unavailable, the system may answer with age/activity duration inferred from the first and latest confirmed 1C activity, explicitly marked as an inference rather than a legal registration fact.
- When a fact cannot be proven exactly, the user-facing answer must say what is confirmed, what is inferred, and what remains unknown. Do not present an inferred business estimate as a legal or formally confirmed fact.
- Save agent-built question packs into autoruns under `Пользовательские сессии` with title prefix `AGENT | ...` only after the live replay has been executed and reviewed.
- Agent semantic runs saved into autoruns must remain runnable by the user from the UI like any other saved user session.
- If a pack was saved too early by mistake, treat it as an invalid intermediate artifact: remove its files from `llm_normalizer/data/autorun_generators/saved_sessions/`, `llm_normalizer/data/eval_cases/`, and its record from `llm_normalizer/data/autorun_generators/history.json`, then regenerate it only after the successful replay.
- The goal of an AGENT run is not only to confirm routes but to actively improve the assistant until the problematic questions are handled acceptably. Run, inspect, fix, rerun, and repeat until the critical business questions in the scenario are no longer broken, misleading, or underpowered.
- Evaluate the replay primarily through the user-facing business answer. Internal labels, raw route ids, capability ids, debug enums, `snapshot_items`, `bank_operations_by_*`, `answer_object`, and other service metadata are for diagnosis only; they must not leak into the user-facing answer and must not dominate the analyst verdict.
- Treat "technical garbage in the final answer" as a real quality defect even when the underlying route is correct. The hardened assistant should surface business meaning first and keep internal mechanics out of the user's head unless the user explicitly asks for technical detail.
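The "no service metadata in the user-facing answer" rule above lends itself to a quick automated pre-check during replay review. The sketch below is an assumption-laden illustration: `leaked_internal_labels` is a hypothetical helper, and its pattern list covers only the three names the rule mentions explicitly. Real route ids, capability ids, and debug enums would have to be added from the project's own vocabulary.

```python
import re

# Only the markers named in the rule; extend with project-specific ids.
INTERNAL_MARKERS = (
    re.compile(r"\bsnapshot_items\b"),
    re.compile(r"\bbank_operations_by_\w*"),
    re.compile(r"\banswer_object\b"),
)


def leaked_internal_labels(answer: str) -> list[str]:
    """Return internal labels found in a user-facing answer, grouped by pattern."""
    found = []
    for pattern in INTERNAL_MARKERS:
        found.extend(match.group(0) for match in pattern.finditer(answer))
    return found
```

A non-empty result flags the step for review. An empty result does not prove the answer is clean, because the marker list is deliberately incomplete and the analyst verdict still rests on the business answer itself.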