Stage 3: improved lifecycle logic and cleaned up assistant replies

dctouch 2026-03-26 20:21:51 +03:00
parent d0b842adb0
commit 914843a8ba
81 changed files with 18051 additions and 654 deletions


@ -0,0 +1,272 @@
# ACCEPTANCE_CHECKLIST_STAGE_04
## Purpose of the document
This document is used for acceptance of the Stage 4 implementation.
Its purpose is to verify that the graph core is deployed as a working runtime layer rather than as a formal schema.
The document is mandatory for:
- Codex;
- the developer;
- manual review;
- the final sign-off of Stage 4.
---
## Document status
- Status: Stage 4 acceptance checklist
- Language: Russian
- Usage mode: mandatory at the completion of each wave and at the final acceptance of Stage 4
- In a conflict over scope, `STAGE_04_TASK_CARD.md` takes priority
- In a conflict over architectural constraints, `ARCHITECTURE_GUARDRAILS.md` takes priority
- In a conflict over platform logic, `TZ_Platform_Core_Accounting_Assistant_Mode.md` takes priority
---
## Evaluation rules
Allowed statuses:
- `PASS`: fully met
- `PARTIAL`: partially met, rework required
- `FAIL`: not met
- `N/A`: not applicable (only with an explicit justification)
Every item requires a comment stating:
- what was checked;
- where it is implemented;
- what confirms it;
- which limitations remain.
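The evaluation rules above can be checked mechanically. The following is a minimal sketch, assuming a checklist item is stored as a plain object; the type and field names (`ChecklistItem`, `naJustification`, and so on) are hypothetical and are not defined anywhere in the project documents.

```typescript
// Hypothetical types: the checklist only defines the four statuses and the
// four mandatory comment parts; every name below is illustrative.
type ChecklistStatus = "PASS" | "PARTIAL" | "FAIL" | "N/A";

type ChecklistComment = {
  whatWasChecked: string;
  whereImplemented: string;
  confirmedBy: string;
  remainingLimitations: string;
};

interface ChecklistItem {
  id: string; // for example "B1"
  status: ChecklistStatus;
  comment: ChecklistComment;
  naJustification?: string; // expected only when status is "N/A"
}

const COMMENT_FIELDS = [
  "whatWasChecked",
  "whereImplemented",
  "confirmedBy",
  "remainingLimitations",
] as const;

// Returns the list of rule violations for one item; an empty list means valid.
function validateItem(item: ChecklistItem): string[] {
  const problems: string[] = [];
  for (const field of COMMENT_FIELDS) {
    if (item.comment[field].trim() === "") {
      problems.push(`${item.id}: comment.${field} is empty`);
    }
  }
  const justification = item.naJustification ?? "";
  if (item.status === "N/A" && justification.trim() === "") {
    problems.push(`${item.id}: N/A requires an explicit justification`);
  }
  return problems;
}
```

An item with every comment part filled in yields an empty list; an `N/A` item without a justification yields one violation per missing part.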
---
## Overall acceptance logic
Stage 4 is considered accepted only if all of the following hold at once:
1. Stage 4 itself is closed, with no hidden drift into Stage 5-6.
2. Graph contracts are implemented and used by the runtime.
3. Graph traversal actually participates in graph-eligible retrieval.
4. Problem assembly uses graph connectivity.
5. Lifecycle reasoning uses graph transitions.
6. The answer layer uses graph-backed causal explanation.
7. There is benchmark/eval confirmation of value.
8. The working pipeline is not broken.
# Block A. Scope discipline
## A1. Stage 4 itself is what was implemented
Status:
Comment:
## A2. No hidden drift into Stage 5
Status:
Comment:
## A3. No hidden drift into Stage 6
Status:
Comment:
## A4. No large unnecessary platform refactor
Status:
Comment:
---
# Block B. Graph model
## B1. The `AccountingGraphNode` schema is implemented
Status:
Comment:
## B2. The `AccountingGraphEdge` schema is implemented
Status:
Comment:
## B3. `GraphSchemaRegistry` is in place
Status:
Comment:
## B4. Nodes/edges carry provenance/confidence
Status:
Comment:
## B5. No generic edges such as `related_to` as the primary mechanism
Status:
Comment:
---
# Block C. Graph runtime
## C1. `GraphBuilder` is implemented
Status:
Comment:
## C2. `GraphTraversalPolicy` is implemented
Status:
Comment:
## C3. `GraphValidationLayer` is implemented
Status:
Comment:
## C4. Missing/conflicting links are detected as runtime signals
Status:
Comment:
## C5. The runtime is robust to incomplete data
Status:
Comment:
---
# Block D. Layer integration
## D1. The planner supports graph eligibility
Status:
Comment:
## D2. Execution uses typed graph traversal in graph-eligible queries
Status:
Comment:
## D3. Problem assembly uses graph connectivity
Status:
Comment:
## D4. Lifecycle checks use graph transitions
Status:
Comment:
## D5. The answer layer uses a graph-backed causal path
Status:
Comment:
---
# Block E. Quality / eval
## E1. Unit tests are added for graph contracts/runtime
Status:
Comment:
## E2. Integration tests are added for the planner/execution graph path
Status:
Comment:
## E3. Stage 2/Stage 3 regression is not broken
Status:
Comment:
## E4. A Stage 4 benchmark suite exists
Status:
Comment:
## E5. A before/after value report exists
Status:
Comment:
---
# Block F. Observability / compatibility
## F1. Graph decisions and traversal are diagnosable
Status:
Comment:
## F2. Contracts and the source of truth are documented
Status:
Comment:
## F3. Changes are compatible with the Stage 5 roadmap
Status:
Comment:
## F4. Migration discipline is observed
Status:
Comment:
---
# Block G. Documentation completeness
## G1. An up-to-date `STAGE_04_TASK_CARD.md` exists
Status:
Comment:
## G2. An acceptance mapping `change -> criterion` exists
Status:
Comment:
## G3. An explicit non-scope list exists
Status:
Comment:
## G4. Run artifacts follow the `date -> Stage -> Wave` standard, including `prompt_dialogs`
Status:
Comment:
---
# Block H. Final decision on the stage
## H1. Stage 4 can be considered accepted
Status:
Comment:
## H2. Stage 4 cannot be considered accepted
Status:
Comment:
---
# Acceptance summary
## Overall result
- Result: `PASS / PARTIAL / FAIL`
- Review date:
- Reviewed by:
- Version / branch / commit:
- Related documents:
## Key strengths
1.
2.
3.
## Key shortcomings
1.
2.
3.
## What must be fixed before acceptance
1.
2.
3.
## What may be deferred to the next stage
1.
2.
3.
## Explicitly confirmed as non-scope for the current stage
1.
2.
3.
## Final decision
- `Accept Stage 4`
- `Accept Stage 4 conditionally`
- `Return for rework`
Comment:
---
## Short practical formula
Stage 4 is considered successful when the graph layer actually works at runtime and improves retrieval/problem/lifecycle/answer, rather than merely adding a new data schema.


@ -20,7 +20,7 @@
- Status: primary governing brief for Codex
- Language: Russian
- Usage mode: must be read before any code changes
- In a conflict with the working scope of the current iteration, `STAGE_03_TASK_CARD.md` takes priority
- In a conflict with the working scope of the current iteration, `STAGE_04_TASK_CARD.md` takes priority
- In a conflict over architectural constraints, `TZ_Platform_Core_Accounting_Assistant_Mode.md` takes priority
---
@ -37,29 +37,29 @@
- return the answer to the user.
That said, the current system is not yet a full-fledged accountant-grade investigation copilot.
At the current transition we consider stages 1 and 2 complete and move on to **Stage 3 / Lifecycle Formalization**.
Stage 3 is recorded as completed (accepted), and the current transition is to **Stage 4 / Accounting Ontology Graph Core**.
The main current limitations that Stage 3 must close:
The main current limitations that Stage 4 must close:
- lifecycle semantics remain partly heuristic;
- there is no formalized model of allowed states/transitions for the key domains;
- problem units are not sufficiently enriched with temporal and stage-based meaning;
- for a number of question classes, ranking still leans toward frequency/sum/entity signals;
- answers in places remain at the level of generic lifecycle labels.
- there is no unified graph representation of accounting entities and relations;
- cause-and-effect chains are still partly assembled heuristically;
- missing/conflicting links are not first-class runtime objects;
- lifecycle/problem reasoning makes insufficient use of structural graph connectivity;
- cross-branch traversal and period-impact checks are limited to local rule bundles.
---
## Goal of Codex's work in the current iteration
Codex must help implement **only Stage 3**, without breaking the current working pipeline and without prematurely pulling in solutions from later stages.
Codex must help implement **only Stage 4**, without breaking the current working pipeline and without prematurely pulling in solutions from later stages.
Current goal:
- introduce a formal lifecycle model for the Stage 3 target domains;
- deploy lifecycle runtime components and use them in the working path;
- integrate lifecycle into problem units, ranking, and answer synthesis;
- introduce a working graph core of accounting entities and typed relations;
- deploy graph runtime components into retrieval/planning/problem assembly/lifecycle binding;
- integrate graph connectivity into reasoning and answer synthesis;
- confirm usefulness through domain-eval and a before/after check;
- do not turn the current stage into a hidden implementation of Stage 4-6.
- do not turn the current stage into a hidden implementation of Stage 5-6.
---
@ -68,7 +68,7 @@ Codex должен помочь реализовать **только Stage 3**,
When reading and interpreting the materials, use the following priority order.
### 1. Current working scope
- `03_execution/STAGE_03_TASK_CARD.md`
- `03_execution/STAGE_04_TASK_CARD.md`
This is the main document on what to do right now.
@ -84,15 +84,16 @@ Codex должен помочь реализовать **только Stage 3**,
- security;
- live bridge policy.
### 3. Detailed technical specification for the third stage
### 3. Detailed technical specification for the fourth stage
- `02_stages/TZ_Stage_4_Accounting_Ontology_Graph_Core_Assistant_Mode.md`
This document defines the content of Stage 4.
### 4. Stage 4 dependencies
- `02_stages/TZ_Stage_3_Lifecycle_Formalization_Assistant_Mode.md`
This document defines the content of Stage 3.
### 4. Stage 3 dependencies
- `02_stages/TZ_Stage_2_Retrieval_Unit_Shift_Assistant_Mode.md`
Stage 3 builds on the problem-centric layer of Stage 2 and must not break it.
Stage 4 builds on the problem-centric layer of Stage 2 and the lifecycle layer of Stage 3 and must not break them.
### 5. Current status and overall development logic
- `00_context/Assistant_Mode_GLOBAL_STATUS_2026-03-24.md`
@ -103,11 +104,10 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
These documents are needed to understand:
- what has already been done;
- where the system's real ceilings are;
- why Stage 3 is being carried out now;
- how Stage 3 connects to later stages.
- why Stage 4 is being carried out now;
- how Stage 4 connects to later stages.
### 6. Stages 4-6
- `02_stages/TZ_Stage_4_...`
### 6. Stages 5-6
- `02_stages/TZ_Stage_5_...`
- `02_stages/TZ_Stage_6_...`
@ -122,18 +122,17 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
## Scope of the current iteration
Only work that belongs to Stage 3 and is necessary for its correct implementation is allowed.
Only work that belongs to Stage 4 and is necessary for its correct implementation is allowed.
The current scope includes:
- formalization of the Stage 3 lifecycle domains and lifecycle entities;
- description of states/transitions/defects tied to available evidence;
- implementation of the runtime layer (`LifecycleRegistry`, `LifecycleResolver`, `LifecycleDefectClassifier`, `LifecycleEnricher`);
- updating `problem_unit_schema` with lifecycle fields;
- integration of lifecycle factors into the ranking policy;
- integration of lifecycle logic into the answer policy;
- lifecycle-aware tests and a benchmark pipeline for the key domains;
- a before/after eval report on the product value of Stage 3.
- formalization of the graph core (`AccountingGraphNode`, `AccountingGraphEdge`, typed relations);
- implementation of the runtime layer (`GraphSchemaRegistry`, `GraphBuilder`, `GraphTraversalPolicy`, `GraphValidationLayer`);
- integration of graph signals into retrieval planning/execution;
- integration of graph connectivity into problem assembly and lifecycle binding;
- integration of graph-based explanations into the answer policy;
- graph-aware tests and a benchmark pipeline for the key domains;
- a before/after eval report on the product value of Stage 4.
---
@ -141,12 +140,13 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
In this iteration, the following layers must not actually be implemented as core runtime:
- the full-scale ontology / graph runtime from Stage 4;
- the full investigation orchestrator from Stage 5;
- the live verification runtime core and full product mode split from Stage 6;
- a full-scale enterprise-wide graph beyond accounting core Stage 4;
- a move to a new complete service architecture;
- rewriting the assistant around new abstraction layers without extreme necessity;
- domains that are not supported by the current data/evidence mapping;
- attempts to close the graph gap with prompt engineering alone;
- large infrastructure rework for the sake of “beauty”.
---
@ -154,20 +154,20 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
## Core principle of the current work
**Do not build the target system ahead of time.**
Stage 3 must be done so that the lifecycle models are not formal tables but an actually working runtime layer and a foundation for the next stages.
Stage 4 must be done so that the graph models are not formal schemas but an actually working runtime layer and a foundation for the next stages.
---
## Hard architectural constraints
### 1. Do not break the current working pipeline without a direct reason
If the existing transport / endpoint / base routing / normalizer pipeline works, it must be preserved unless the change is a mandatory condition of Stage 3.
If the existing transport / endpoint / base routing / normalizer pipeline works, it must be preserved unless the change is a mandatory condition of Stage 4.
### 2. Do not substitute prompts for architectural changes
Problems of lifecycle state, transition logic, defect classification, ranking integration, and answer grounding must not be solved by prompts alone or by “clever answer wording”.
Problems of graph connectivity, relation semantics, traversal logic, problem assembly integration, and answer grounding must not be solved by prompts alone or by “clever answer wording”.
### 3. Do not prematurely drag Stage 4-6 into the codebase
If a change effectively implements future-stage runtime, it must be rejected or deferred unless its necessity for Stage 3 is proven.
### 3. Do not prematurely drag Stage 5-6 into the codebase
If a change effectively implements future-stage runtime, it must be rejected or deferred unless its necessity for Stage 4 is proven.
### 4. Do not do large refactors for the sake of abstract cleanliness
Only those changes are allowed that:
@ -175,15 +175,15 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
- increase the robustness of the current layer;
- do not undermine the trajectory of further development.
### 5. Every lifecycle element must have a complete implementation path
For every lifecycle element there must exist:
### 5. Every graph element must have a complete implementation path
For every graph element there must exist:
- a spec-level description;
- a runtime-level computation;
- retrieval/ranking-level usage;
- an answer-level interpretation.
### 6. Do not introduce states and defects without evidence mapping
If a state/transition/defect cannot be determined from actually available data, it must not be introduced as a Stage 3 runtime element.
### 6. Do not introduce nodes/edges without provenance and evidence mapping
If a node/edge cannot be determined from actually available data and tied to a source, it must not be introduced as a Stage 4 runtime element.
---
@ -195,14 +195,14 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
First study:
- the current status;
- the platform core specification;
- Stage 3;
- the dependency on Stage 2;
- Stage 4;
- the dependencies on Stage 3 and Stage 2;
- the roadmap;
- the context of the next stages.
### Step B. Analysis of the current code
Before making changes, determine:
- which parts of lifecycle are already/partially implemented;
- which parts of the graph/lifecycle/problem layers are already/partially implemented;
- where the real extension points are;
- which elements are fragile;
- which changes will require new contracts;
@ -221,7 +221,7 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
Move on to implementation only after the plan.
Changes must be made in small portions so that it is possible to check:
- whether the scope has gone beyond Stage 3;
- whether the scope has gone beyond Stage 4;
- whether the current pipeline is broken;
- whether premature abstractions have appeared.
@ -233,6 +233,61 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
- a list of limitations;
- a list of unresolved questions;
- an assessment of compatibility with later stages.
- a link to the run folder in `llm_normalizer/docs/runs` with artifacts that follow the wave structure standard.
---
## Run artifact structure standard (mandatory)
For every wave of tests and acceptance, a separate run folder must be created in:
- `llm_normalizer/docs/runs`
### 1. Mandatory run folder name format
The folder name must have the format:
- `YYYY-MM-DD_Stage_<NN>_Wave_<NN>_<short_topic>`
Where:
- `Stage` must follow the date;
- `Wave` must follow `Stage`;
- only then is the short run topic added.
Example:
- `2026-03-26_Stage_03_Wave_03_Lifecycle_Prompts`
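The naming format above lends itself to an automated check. The sketch below is one possible reading of the format string, assuming `<NN>` means exactly two digits and `<short_topic>` is limited to letters, digits, and underscores; the regular expression and the names `RUN_FOLDER_PATTERN` and `parseRunFolderName` are hypothetical and do not come from the project.

```typescript
// Assumption: <NN> is exactly two digits; <short_topic> is [A-Za-z0-9_]+.
const RUN_FOLDER_PATTERN =
  /^(\d{4})-(\d{2})-(\d{2})_Stage_(\d{2})_Wave_(\d{2})_([A-Za-z0-9_]+)$/;

interface RunFolderName {
  date: string; // YYYY-MM-DD as written in the folder name
  stage: number;
  wave: number;
  topic: string;
}

// Parses a run folder name; returns null when the name violates the format.
function parseRunFolderName(name: string): RunFolderName | null {
  const match = RUN_FOLDER_PATTERN.exec(name);
  if (match === null) {
    return null;
  }
  const [, year, month, day, stage, wave, topic] = match;
  return {
    date: `${year}-${month}-${day}`,
    stage: Number(stage),
    wave: Number(wave),
    topic,
  };
}
```

Under this strict reading, a name with single-digit numbers such as `..._Stage_3_Wave_6_...` is rejected, as is a name where `Wave` precedes `Stage`. Whether single digits should be tolerated is a policy decision the format string does not settle.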
### 2. Mandatory contents of the run folder
Every run folder must contain at least:
- `README.md` (wave context and what was checked);
- `run_summary.json` (commands, results, links to artifacts);
- test/run artifacts (eval, acceptance, regression, etc.);
- a separate `prompt_dialogs` folder.
### 3. Mandatory `prompt_dialogs` dialog folder
The `prompt_dialogs` folder must contain dialog data in the format "user question -> system answer" plus runtime context:
- `prompt_dialogs/index.json` (index of all cases and files);
For each case:
- `prompt_dialogs/<suite>/<case_id>.json` (raw dialog JSON, debug/runtime fields, decomposition/grounding if available);
- `prompt_dialogs/<suite>/<case_id>.md` (quickly readable user/system/assistant version).
These files are mandatory for reviewing wave results, to quickly see:
- what exactly the user asked;
- what the system returned;
- what was decomposed and what the answer is based on;
- what is missing or filtered out in the pipeline.
### 4. No mixing of waves
Artifacts from different waves must not be placed in the same run folder.
Every wave must have its own folder and its own set of `prompt_dialogs`.
---
@ -243,20 +298,20 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
### 1. Summary of the current state
A brief description of how the current implementation is structured in code.
### 2. Gap analysis relative to Stage 3
A list of what is missing for compliance with Stage 3.
### 2. Gap analysis relative to Stage 4
A list of what is missing for compliance with Stage 4.
### 3. Proposed file-level plan
Which files need to be changed, created, or extended.
### 4. Proposed contracts / types / schemas
Which lifecycle entities and interfaces will appear.
Which graph/lifecycle/problem entities and interfaces will appear.
### 5. Test plan
Which tests will be added or updated.
### 6. Acceptance mapping
Which Stage 3 criteria are covered by which changes.
Which Stage 4 criteria are covered by which changes.
### 7. Explicit non-scope
What will deliberately not be done now.
@ -272,7 +327,7 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
1. What was analyzed
2. What was found
3. What is proposed to change
4. Why this is consistent with Stage 3
4. Why this is consistent with Stage 4
5. What is outside the current scope
6. Which files are affected
7. What risks exist
@ -295,8 +350,8 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
### 2. Explicit contracts
Everything concerning:
- lifecycle states/transitions/defects;
- lifecycle resolution;
- graph nodes/edges/relation semantics;
- graph traversal/resolution;
- enrichment contracts;
- ranking factors;
- answer interpretation;
@ -306,11 +361,11 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
### 3. Controlled extensibility
Extensibility is acceptable, but only to the extent that it:
- is actually needed by Stage 3;
- is actually needed by Stage 4;
- does not force the whole future architecture to be introduced in advance.
### 4. Observability of changes
If new lifecycle logic is added, think through:
If new graph logic is added, think through:
- how it is tested;
- how it is logged;
- how its correctness is verified;
@ -329,10 +384,10 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
The following actions are considered mistakes:
- “pretty lifecycle tables” without a working resolver;
- lifecycle fields in logs with no effect on ranking/answer;
- answers like “broken_lifecycle” without state/transition logic;
- a hidden implementation of Stage 4-6 under the guise of Stage 3;
- a “pretty ontology schema” without a working graph runtime;
- graph fields in the payload with no effect on retrieval/problem assembly/answer;
- a formal builder without typed traversal and causal value;
- a hidden implementation of Stage 5-6 under the guise of Stage 4;
- creating new abstractions without runtime benefit;
- rewriting the working pipeline for the sake of abstract cleanliness;
- implicit scope changes;
@ -344,12 +399,12 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
If one or more of the following occur during the work, stop and rebuild the plan:
- graph runtime is proposed as a mandatory path for Stage 3;
- the graph model is designed without runtime use in retrieval/problem assembly/lifecycle;
- full investigation orchestration is proposed for “convenience”;
- lifecycle models are designed without data/evidence mapping;
- ranking and answer do not get lifecycle integration;
- a large platform refactor is proposed for Stage 3;
- a new data model layer is formed with no link to the Stage 3 acceptance criteria.
- relation semantics are defined without provenance/evidence mapping;
- the answer layer does not get graph-based explanations;
- a large platform refactor is proposed for Stage 4;
- a new data model layer is formed with no link to the Stage 4 acceptance criteria.
---
@ -357,13 +412,14 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
The current wave is considered complete only if all of the following conditions are met at once:
1. The Stage 3 scope is implemented, not an arbitrary “improved variant”.
1. The Stage 4 scope is implemented, not an arbitrary “improved variant”.
2. The current working pipeline is not broken.
3. The new lifecycle contracts are described explicitly.
3. The new graph contracts are described explicitly.
4. There are tests and/or verifiable criteria for the changes made.
5. There is no hidden drift into Stage 4-6.
5. There is no hidden drift into Stage 5-6.
6. The changes are compatible with the platform core specification.
7. What was deliberately left outside the current stage is recorded.
8. Run artifacts follow the `date -> Stage -> Wave` standard and include the mandatory `prompt_dialogs` folder.
---
@ -371,10 +427,10 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
The current iteration must deliver the following result:
- lifecycle-aware problem reasoning instead of generic lifecycle labels;
- stage/transition-aware ranking on covered domains;
- more practical answers for the 51/60, 97, fixed assets (ОС), VAT (НДС), and period close scenarios;
- a working lifecycle runtime pipeline suitable for further development.
- graph-backed causal reasoning instead of local heuristic links;
- typed edge traversal in cross-branch and period-impact scenarios;
- more practical answers with an explicit problem path along the links;
- a working graph runtime pipeline suitable for the Stage 5 investigation engine.
---
@ -394,6 +450,6 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
The main question before any change:
**Is this really necessary for Stage 3, or is it an attempt to implement Stage 4-6 prematurely?**
**Is this really necessary for Stage 4, or is it an attempt to implement Stage 5-6 prematurely?**
If the answer is not obvious, the change is postponed and submitted for separate approval.


@ -0,0 +1,58 @@
# STAGE_03_CLOSEOUT_2026-03-26
## Status
- Stage: Stage 3 / Lifecycle Formalization
- Decision: `Accepted / Closed`
- Date recorded: 2026-03-26
---
## What is confirmed
1. `03_S3-97-STALLED-NODES` is moved from `out_of_scope` to `in_scope`.
2. The collapsing of domains into `bank_settlement` is eliminated.
3. Synthetic placeholders are removed from the user-facing `assistant_reply`.
4. The separation of Stage 2 regression and Stage 3 probe is preserved.
5. Mojibake cleanup in the user-facing layer is complete and confirmed on all 9 Stage 3 lifecycle probe cases.
---
## Final Stage 3 artifacts
- Main final run:
- `llm_normalizer/docs/runs/2026-03-26_Stage_3_Wave_6_Mojibake_Final_MicroPatch`
- The run folder contains `prompt_dialogs/stage3_lifecycle_probe`:
- `01_S3-51-WRONG-CLOSE-TYPE`
- `02_S3-60-PAYMENT-WITHOUT-CLOSURE`
- `03_S3-97-STALLED-NODES`
- `04_S3-97-EXPECTED-VS-ACTUAL`
- `05_S3-OS-BRANCH-DIVERGENCE`
- `06_S3-OS-TERMINAL-GAP`
- `07_S3-VAT-CROSS-BRANCH-CONFLICT`
- `08_S3-VAT-ACTUAL-VS-EXPECTED`
- `09_S3-PERIOD-CLOSE-LIFECYCLE-IMPACT`
---
## Technical validation at the time of closure
- `npm test` (backend): PASS
- `npm run build` (backend): PASS
---
## What carries over to Stage 4
1. Introduce a graph-backed causal layer on top of the current Stage 2/3 contracts.
2. Move retrieval/problem/lifecycle reasoning onto typed graph connectivity.
3. Prepare the architectural base for the Stage 5 investigation engine without implementing Stage 5 prematurely.
---
## Scope discipline
- Stage 3 is closed without changing the prompt set as a mechanism for solving runtime problems.
- Stage 3 is completed without a redesign of transport/endpoint/base routing.
- Stage 3 is completed without a hidden implementation of Stage 4-6.


@ -12,12 +12,22 @@
## Document status
- Status: working implementation card for Stage 3
- Status: Stage 3 is completed and recorded as accepted (2026-03-26)
- Language: Russian
- Usage mode: must be read before any changes for Stage 3
- In a conflict over architectural constraints, `TZ_Platform_Core_Accounting_Assistant_Mode.md` takes priority
- In a conflict over Codex's general working mode, `CODEX_MASTER_BRIEF.md` takes priority
### Stage 3 closure record (2026-03-26)
- Stage 3 acceptance is confirmed.
- The final mojibake micro-patch is closed.
- Full test suite after final stabilization: `npm test` = PASS.
- Final Stage 3 run artifacts:
- `llm_normalizer/docs/runs/2026-03-26_Stage_3_Wave_6_Mojibake_Final_MicroPatch`
The document is kept as a reference card for the completed stage and as the baseline for Stage 4.
---
## Context
@ -191,7 +201,9 @@ Codex должен вернуть не только код, но и набор
- what was done;
- what was deliberately not done;
- which risks remain;
- what is prepared for Stage 4.
- what is prepared for Stage 4;
- a run folder in `llm_normalizer/docs/runs` named according to the scheme `YYYY-MM-DD_Stage_<NN>_Wave_<NN>_<short_topic>`;
- a mandatory `prompt_dialogs` folder with dialog logs per case (`index.json`, `<case_id>.json`, `<case_id>.md`).
---


@ -0,0 +1,277 @@
# STAGE_04_TASK_CARD
## Purpose of the document
This document fixes the **working implementation scope of the fourth stage** for Codex and the developer.
The document does not replace the Stage 4 specification and does not replace the platform core specification.
Its purpose is to translate Stage 4 into a practical working pipeline without sprawling into Stage 5-6.
---
## Document status
- Status: working implementation card for Stage 4
- Language: Russian
- Usage mode: must be read before any changes for Stage 4
- In a conflict over architectural constraints, `TZ_Platform_Core_Accounting_Assistant_Mode.md` takes priority
- In a conflict over Codex's general working mode, `CODEX_MASTER_BRIEF.md` takes priority
---
## Context
- Stage 3 is closed and accepted (2026-03-26).
- Current baseline: lifecycle-aware reasoning works; Stage 2 regression and Stage 3 probe are separated.
- The next step is not to expand the prompt layer but to introduce a graph-backed causal layer as the basis for the future investigation mode.
Reference artifacts of the Stage 3 closure:
- `llm_normalizer/docs/runs/2026-03-26_Stage_3_Wave_6_Mojibake_Final_MicroPatch`
Starting run folder for Stage 4 Wave 1:
- `llm_normalizer/docs/runs/2026-03-26_Stage_04_Wave_01_Kickoff`
---
## Goal of Stage 4
Stage 4 must deliver a **working graph core of the accounting domain**, so that retrieval, lifecycle, and problem assembly rely on a single cause-and-effect representation.
Practical result of the stage:
- typed graph nodes and edges for the key accounting entities;
- runtime graph construction with provenance/confidence;
- graph-aware planning/execution for graph-eligible queries;
- graph-backed problem assembly and lifecycle binding;
- more causal and verifiable user-facing answers;
- a measurable improvement on benchmark/eval.
---
## Scope of the current implementation
### In scope
1. **Graph contracts and schema layer**
- `AccountingGraphNode`;
- `AccountingGraphEdge`;
- `GraphSchemaRegistry`;
- domain node/edge types for the covered scenarios.
2. **Graph runtime core**
- `GraphBuilder`;
- `GraphTraversalPolicy`;
- `GraphProvenanceLayer`;
- `GraphValidationLayer`.
3. **Integration into the retrieval path**
- graph eligibility in the planner;
- typed traversal in execution for graph-eligible cases;
- detection of missing/conflicting links as runtime signals.
4. **Integration into the problem/lifecycle layers**
- graph-backed problem assembly;
- graph-backed lifecycle transition checks;
- correct passing of graph evidence to the answer layer.
5. **Integration into the answer layer**
- user-facing explanation along the causal path;
- explicit recording of missing/conflicting links;
- preservation of honest confidence/coverage limitations.
6. **Stage 4 quality pipeline**
- unit/integration tests for the graph core;
- regression on the Stage 2/Stage 3 routes;
- before/after benchmark/eval on graph value scenarios.
---
## Out of scope
### Do not do now
- the full Stage 5 Investigation Engine;
- full orchestration case-runtime with deep branching;
- the live verification core path and full mode split of Stage 6;
- a global enterprise graph beyond accounting core Stage 4;
- a large refactor of transport/endpoint/base routing;
- attempts to close the graph gap with prompt changes alone.
---
## Mandatory stage results
### 1. A working graph model for the target domains
Typed nodes/edges must be introduced at least for the domains critical to the current set of cases:
- 51/60 settlement chains;
- 97 (deferred expenses);
- fixed assets (ОС);
- VAT (НДС);
- period_close.
### 2. A working graph runtime
There must be a runtime pipeline that:
- builds the graph from normalized entities;
- stores provenance/confidence;
- supports typed traversal;
- detects missing/conflicting edges.
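To make the runtime requirements above concrete, here is a minimal sketch of typed traversal and missing-link detection. Only the names `AccountingGraphNode` and `AccountingGraphEdge` come from the task card; the fields, the node kinds, the edge types (`settles`, `posts_to`), and the helper functions are assumptions made for illustration, not the project's actual contracts.

```typescript
// Hypothetical node kinds and edge types; the real domain types are not
// specified in this document.
type NodeKind = "document" | "payment" | "account_entry";
type EdgeType = "settles" | "posts_to";

interface Provenance {
  sourceRef: string; // where the node or edge was derived from
  confidence: number; // assumed to lie in [0, 1]
}

interface AccountingGraphNode {
  id: string;
  kind: NodeKind;
  provenance: Provenance;
}

interface AccountingGraphEdge {
  from: string;
  to: string;
  type: EdgeType;
  provenance: Provenance;
}

interface AccountingGraph {
  nodes: AccountingGraphNode[];
  edges: AccountingGraphEdge[];
}

// Typed traversal: breadth-first walk that follows only the allowed edge types.
function traverse(graph: AccountingGraph, startId: string, allowed: EdgeType[]): string[] {
  const visited: string[] = [startId];
  const queue: string[] = [startId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const edge of graph.edges) {
      const typedMatch = edge.from === current && allowed.indexOf(edge.type) !== -1;
      if (typedMatch && visited.indexOf(edge.to) === -1) {
        visited.push(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return visited;
}

// Missing-link signal: nodes of `kind` with no outgoing edge of type `expected`.
function findMissingLinks(graph: AccountingGraph, kind: NodeKind, expected: EdgeType): string[] {
  return graph.nodes
    .filter((node) => node.kind === kind)
    .filter((node) => !graph.edges.some((e) => e.from === node.id && e.type === expected))
    .map((node) => node.id);
}
```

The linear scans keep the sketch short; a real builder would likely index edges by source node. Conflict detection is omitted here because the document does not say what counts as a conflicting edge.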
### 3. Graph-backed retrieval and problem assembly
- graph-eligible queries actually use traversal;
- problem units use graph connectivity, not only proximity/heuristics.
### 4. Graph-backed lifecycle binding
- lifecycle transition checks use graph relations;
- missing/invalid transitions have graph backing.
### 5. Improved user-facing explanations
- answers show the causal path of the problem;
- the nodes/edges where the break occurs are visible;
- uncertainty remains transparent.
### 6. Measurability of value
- a benchmark suite exists;
- before/after evidence exists;
- there is a report showing where exactly the graph layer improves quality.
---
## Expected Stage 4 entities
Minimum set of entities/components:
1. `AccountingGraphNode`
2. `AccountingGraphEdge`
3. `GraphSchemaRegistry`
4. `GraphBuilder`
5. `GraphTraversalPolicy`
6. `GraphProvenanceLayer`
7. `GraphValidationLayer`
8. `GraphBackedProblemAssembly`
9. `GraphBackedLifecycleBinding`
---
## Hard implementation constraints
### 1. Do not break the working pipeline
Do not rewrite without direct necessity:
- transport;
- endpoint;
- base routing;
- normalizer pipeline.
### 2. Graph only with runtime value
The graph is considered deployed only if it affects:
- retrieval execution;
- problem assembly;
- lifecycle reasoning;
- user-facing explanation.
### 3. No unproven nodes/edges
A node/edge must not be added if there is no:
- data source;
- evidence mapping;
- provenance trace.
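The rule above can be enforced as a gate in front of the graph builder. The following is a minimal sketch under the assumption that a candidate edge carries its data source, evidence references, and provenance trace as optional fields; `EdgeCandidate`, `Verdict`, and `validateCandidate` are hypothetical names, not part of the documented `GraphValidationLayer`.

```typescript
// Hypothetical candidate shape; none of these fields are defined in the task card.
interface EdgeCandidate {
  from: string;
  to: string;
  type: string;
  dataSource?: string; // where the underlying data came from
  evidenceRefs?: string[]; // evidence mapping for the relation
  provenanceTrace?: string[]; // steps that produced the candidate
}

type Verdict =
  | { accepted: true }
  | { accepted: false; reasons: string[] };

// Rejects a candidate that lacks any of the three required backings.
function validateCandidate(candidate: EdgeCandidate): Verdict {
  const reasons: string[] = [];
  if (!candidate.dataSource) {
    reasons.push("no data source");
  }
  if (!candidate.evidenceRefs || candidate.evidenceRefs.length === 0) {
    reasons.push("no evidence mapping");
  }
  if (!candidate.provenanceTrace || candidate.provenanceTrace.length === 0) {
    reasons.push("no provenance trace");
  }
  return reasons.length === 0 ? { accepted: true } : { accepted: false, reasons };
}
```

Returning the reasons rather than a bare boolean lets a rejected candidate be logged or surfaced as a diagnostic instead of being dropped silently.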
### 4. Do not implement Stage 5/6 inside Stage 4
Any attempt to introduce full investigation orchestration or a live verification core is rejected as non-scope.
---
## Work order for Stage 4
### Step 1. Read the materials
Required reading:
- `CODEX_MASTER_BRIEF.md`
- `TZ_Platform_Core_Accounting_Assistant_Mode.md`
- `TZ_Stage_4_Accounting_Ontology_Graph_Core_Assistant_Mode.md`
- `TZ_Stage_3_Lifecycle_Formalization_Assistant_Mode.md`
- `TZ_Stage_2_Retrieval_Unit_Shift_Assistant_Mode.md`
### Step 2. Do a code-level mapping
Determine:
- where it is safe to embed the graph builder;
- where planner/execution can enable graph traversal;
- where the problem/lifecycle layers accept graph evidence;
- where the answer layer receives the causal path.
### Step 3. Plan without code
Before implementation starts, Codex must return:
- gap analysis;
- file-level plan;
- contracts/types plan;
- test/eval plan;
- explicit non-scope.
### Step 4. Implementation in small waves
Recommended sequence:
- Wave 1: graph schema + registry;
- Wave 2: graph builder + provenance;
- Wave 3: retrieval planner/execution graph integration;
- Wave 4: problem/lifecycle graph binding;
- Wave 5: answer integration;
- Wave 6: benchmark/eval + hardening.
---
## Acceptance criteria (brief)
Stage 4 is considered closed only if all of the following hold at once:
1. Graph contracts are implemented and used by the runtime.
2. Graph traversal actually takes part in graph-eligible queries.
3. Problem assembly uses graph connectivity.
4. Lifecycle checks use graph transitions.
5. User-facing answers reflect the causal graph path.
6. There is before/after evidence of improvement.
7. There is no hidden drift into Stage 5/6.
8. Run artifacts follow the `date -> Stage -> Wave` standard, including `prompt_dialogs`.
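
The "all at once" rule can be sketched as a small check over checklist items. The item shape (`id`, `status`, `comment`) and the function name are assumptions made for illustration; the status values are the ones used by the Stage 4 acceptance checklist (`PASS`, `PARTIAL`, `FAIL`, `N/A`), where `N/A` is allowed only with an explicit justification.

```javascript
// Hypothetical item shape: { id, status, comment }.
// Returns whether the stage can be accepted and which items block acceptance.
function evaluateStageAcceptance(items) {
  const blockers = [];
  for (const item of items) {
    const comment = String(item.comment ?? "").trim();
    if (item.status === "PASS") {
      continue;
    }
    if (item.status === "N/A" && comment.length > 0) {
      continue; // N/A counts only when a justification is present
    }
    // PARTIAL, FAIL, unjustified N/A, and unknown statuses all block acceptance.
    blockers.push(item.id);
  }
  return { accepted: items.length > 0 && blockers.length === 0, blockers };
}
```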
---
## What Codex must state explicitly at the end of the work
1. What was done
2. Which files were changed
3. Which graph entities and components were introduced
4. Which tests were added
5. Which acceptance criteria are closed
6. What was deliberately NOT implemented
7. Which risks and limitations remain
8. What has been prepared for Stage 5
---
## Definition of Done
Stage 4 is complete if all of the following hold at once:
- the graph core works at runtime, not only in documentation;
- the retrieval/problem/lifecycle/answer layers use graph signals;
- answers become causally coherent and verifiable;
- there is a measurable gain on benchmark/eval;
- the working pipeline is not broken;
- there is no premature implementation of Stage 5/6.
---
## Short practical formula of the stage
**Stage 4 = the transition from lifecycle-aware reasoning to graph-backed accounting causality.**

File diff suppressed because it is too large

@ -14,15 +14,85 @@ const UUID_PATTERN = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]
const LONG_HEX_PATTERN = /\b[0-9a-f]{24,}\b/gi;
const RAW_REF_BLOB_PATTERN = /\bevidence_source_ref_v1\|[^\s,;]+/gi;
const RAW_REF_TOKEN_PATTERN = /\b(?:source_ref|canonical_ref|entity_id|fragment_id|guid|uuid)\b/gi;
const SYNTHETIC_PLACEHOLDER_PATTERN = /\bunknown_entity(?::[^\s,;]+)?\b/gi;
const SYNTHETIC_FALLBACK_MARKER_PATTERN = /\b(?:unknown_source|unknown_record)\b/gi;
const SYNTHETIC_ROUTE_TOKEN_PATTERN = /\bbatch_refresh_then_store:[^\s,;]+/gi;
const CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/u;
const LATIN_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u00D0\u00D1][\u0080-\u00FF]){2,}/u;
const SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN = /^[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]{1,2}$/u;
const PREFIXED_SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN = /^[\p{L}\p{N}_-]+[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]{1,2}$/u;
const MOJIBAKE_SINGLE_MARKER_PATTERN = /^[\u0420\u0421\u00D0\u00D1]$/u;
const MOJIBAKE_MARKER_CHAR_PATTERN = /[\u0402\u0403\u040A\u040C\u040E\u040F\u0452\u0453\u0459\u045A\u045C\u045E\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/u;
const CYRILLIC_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/gu;
const LATIN_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN = /(?:[\u00D0\u00D1][\u0080-\u00FF]){2,}/g;
const MOJIBAKE_MARKER_CHAR_GLOBAL_PATTERN = /[\u0402\u0403\u040A\u040C\u040E\u040F\u0452\u0453\u0459\u045A\u045C\u045E\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/gu;
function normalizeToken(value) {
return value.replace(/^[^\p{L}\p{N}_-]+|[^\p{L}\p{N}_-]+$/gu, "");
}
function isLikelyMojibakeToken(value) {
const token = normalizeToken(String(value ?? ""));
if (!token) {
return false;
}
if (MOJIBAKE_SINGLE_MARKER_PATTERN.test(token)) {
return true;
}
if (SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN.test(token)) {
return true;
}
if (token.length <= 8 && PREFIXED_SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN.test(token)) {
return true;
}
return CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(token) || LATIN_MOJIBAKE_FRAGMENT_PATTERN.test(token);
}
function countMojibakeTokens(value) {
return String(value ?? "")
.split(/[\s,.;:!?()[\]{}"']+/g)
.filter((token) => token.length > 0)
.filter((token) => isLikelyMojibakeToken(token)).length;
}
function countMojibakeSingleMarkers(value) {
return String(value ?? "")
.split(/[\s,.;:!?()[\]{}"']+/g)
.filter((token) => token.length > 0)
.map((token) => normalizeToken(token))
.filter((token) => MOJIBAKE_SINGLE_MARKER_PATTERN.test(token)).length;
}
function stripMojibakeFragments(value) {
const removedByToken = String(value ?? "")
.split(/(\s+)/g)
.map((part) => {
if (/^\s+$/u.test(part)) {
return part;
}
return isLikelyMojibakeToken(part) ? "" : part;
})
.join("");
return removedByToken
.replace(CYRILLIC_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN, "")
.replace(LATIN_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN, "")
.replace(MOJIBAKE_MARKER_CHAR_GLOBAL_PATTERN, "")
.replace(/\s+([,.;:!?])/g, "$1")
.replace(/\s{2,}/g, " ")
.trim();
}
function looksLikeMojibake(value) {
const text = String(value ?? "");
if (!text.trim()) {
return false;
}
if (/(?:Р.|С.){5,}/u.test(text)) {
const tokenHits = countMojibakeTokens(text);
const singleMarkers = countMojibakeSingleMarkers(text);
if (tokenHits >= 2 || (tokenHits >= 1 && singleMarkers >= 1) || singleMarkers >= 3) {
return true;
}
if (/[ЃѓЂђЌќЎў]/u.test(text)) {
if (MOJIBAKE_MARKER_CHAR_PATTERN.test(text)) {
return true;
}
if (CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(text) || LATIN_MOJIBAKE_FRAGMENT_PATTERN.test(text)) {
return true;
}
if (/\uFFFD/u.test(text)) {
return true;
}
return false;
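
For context, the fragment patterns above match the signature left when UTF-8 encoded Cyrillic is decoded with a single-byte code page: Cyrillic letters encode as a lead byte `0xD0` or `0xD1` plus a continuation byte, and those lead bytes read as `Р`/`С` in Windows-1251 and as `Ð`/`Ñ` in Latin-1. The sketch below reproduces the Latin-1 case with Node's `Buffer` and checks it against copies of two patterns from the diff. The helper name `misreadUtf8AsLatin1` is hypothetical, and the Windows-1251 sample is written with escapes rather than produced by a decoder.

```javascript
// Copies of two patterns from the diff above.
const CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/u;
const LATIN_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u00D0\u00D1][\u0080-\u00FF]){2,}/u;

// Hypothetical helper: encode as UTF-8, then misread each byte as a Latin-1 character.
function misreadUtf8AsLatin1(text) {
  return Buffer.from(text, "utf8").toString("latin1");
}

const clean = "счет";
const latinGarbled = misreadUtf8AsLatin1(clean);
// "счет" is the UTF-8 byte sequence D1 81 D1 87 D0 B5 D1 82;
// read as Windows-1251 those bytes become the characters below.
const cp1251Garbled = "\u0421\u0403\u0421\u2021\u0420\u00B5\u0421\u201A";

console.log(LATIN_MOJIBAKE_FRAGMENT_PATTERN.test(latinGarbled)); // true
console.log(CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(cp1251Garbled)); // true
console.log(CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(clean)); // false
```
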
@ -59,14 +129,29 @@ function scrubRawTechnicalRefs(value) {
.replace(/\s{2,}/g, " ")
.trim();
}
function sanitizeUserFacingReply(value) {
return scrubRawTechnicalRefs(value)
.replace(/[ \t]+\n/g, "\n")
.replace(/\n{3,}/g, "\n\n")
function stripSyntheticPlaceholders(value) {
return String(value ?? "")
.replace(SYNTHETIC_PLACEHOLDER_PATTERN, "")
.replace(SYNTHETIC_FALLBACK_MARKER_PATTERN, "")
.replace(SYNTHETIC_ROUTE_TOKEN_PATTERN, "")
.replace(/[;,:]\s*[;,:]+/g, "; ")
.replace(/\s{2,}/g, " ")
.trim();
}
function sanitizeUserFacingReply(value) {
const normalized = scrubRawTechnicalRefs(value).replace(/[ \t]+\n/g, "\n");
const cleanedLines = normalized
.split(/\r?\n/g)
.map((line) => stripSyntheticPlaceholders(line))
.map((line) => stripMojibakeFragments(line))
.map((line) => line.trim())
.filter((line) => line.length > 0)
.filter((line) => !looksLikeMojibake(line));
const cleaned = cleanedLines.join("\n").replace(/\n{3,}/g, "\n\n").trim();
return cleaned || "Available data requires clarification for a reliable user-facing answer.";
}
function sanitizeUserText(value) {
const normalized = scrubRawTechnicalRefs(String(value ?? "").replace(/\s+/g, " ").trim());
const normalized = stripMojibakeFragments(stripSyntheticPlaceholders(scrubRawTechnicalRefs(String(value ?? "").replace(/\s+/g, " ").trim())));
if (!normalized) {
return null;
}
@ -180,13 +265,13 @@ function buildFallbackWhyIncluded(results) {
const filteredRecords = summaryNumber(result, "filtered_records_after_narrowing");
const checkedRecords = summaryNumber(result, "checked_records");
if (routeFocus) {
lines.push(`Проверка выполнена по профилю ${routeFocus}.`);
lines.push(`Проверка выполнена по профилю ${routeFocus}.`);
}
if (sourceRecords !== null && filteredRecords !== null && filteredRecords < sourceRecords) {
lines.push(`Применено сужение выборки: ${filteredRecords} из ${sourceRecords} записей.`);
lines.push(`Применено сужение выборки: ${filteredRecords} из ${sourceRecords} записей.`);
}
if (checkedRecords !== null) {
lines.push(`Проверено записей в текущем проходе: ${checkedRecords}.`);
lines.push(`Проверено записей в текущем проходе: ${checkedRecords}.`);
}
}
return sanitizeUserLines(lines, 4);
@ -195,34 +280,34 @@ function buildFallbackSelectionReasons(results) {
const lines = [];
for (const result of results.slice(0, 2)) {
if (summaryBoolean(result, "semantic_narrowing_applied")) {
lines.push("Отбор выполнен по семантическому сужению предметной области.");
lines.push("Отбор выполнен по семантическому сужению предметной области.");
}
const rankingBasis = summaryStringArray(result, "ranking_basis");
if (rankingBasis.length > 0) {
lines.push(`Ранжирование основано на: ${rankingBasis.join(", ")}.`);
lines.push(`Ранжирование основано на: ${rankingBasis.join(", ")}.`);
}
if (summaryBoolean(result, "broad_guard_applied")) {
lines.push("Применен broad-query guard для контроля ложной точности.");
lines.push("Применен broad-query guard для контроля ложной точности.");
}
}
if (lines.length === 0) {
lines.push("Отбор выполнен по совпадению предметных сигналов и доступной evidence-опоры.");
lines.push("Отбор выполнен по совпадению предметных сигналов и доступной evidence-опоры.");
}
return sanitizeUserLines(lines, 4);
}
function suggestNextStep(requirements, coverage) {
const next = [];
if (coverage.clarification_needed_for.length > 0) {
next.push("Уточните период, счет, документ или контрагента для требований: " + coverage.clarification_needed_for.join(", ") + ".");
next.push("Уточните период, счет, документ или контрагента для требований: " + coverage.clarification_needed_for.join(", ") + ".");
}
if (coverage.requirements_uncovered.length > 0) {
next.push("Проверьте непокрытые требования: " + coverage.requirements_uncovered.join(", ") + ".");
next.push("Проверьте непокрытые требования: " + coverage.requirements_uncovered.join(", ") + ".");
}
if (coverage.out_of_scope_requirements.length > 0) {
next.push("Часть запроса вне текущего учетного контура: " + coverage.out_of_scope_requirements.join(", ") + ".");
next.push("Часть запроса вне текущего учетного контура: " + coverage.out_of_scope_requirements.join(", ") + ".");
}
if (next.length === 0 && requirements.length > 0) {
next.push("Следующим шагом можно открыть технический разбор и углубить проверку по выбранным объектам.");
next.push("Следующим шагом можно открыть технический разбор и углубить проверку по выбранным объектам.");
}
return next;
}
@ -264,21 +349,25 @@ function selectProblemUnitSummary(results) {
return selected;
}
function formatAffectedScope(unit) {
const accountScope = sanitizeUserLines(unit.affected_accounts, 2);
const counterpartyScope = sanitizeUserLines(unit.affected_counterparties, 2);
const documentScope = sanitizeUserLines(unit.affected_documents, 2);
const entityScope = sanitizeUserLines(unit.affected_entities, 2);
const scopeParts = [];
if (unit.affected_accounts.length > 0) {
scopeParts.push(`счета: ${unit.affected_accounts.slice(0, 2).join(", ")}`);
if (accountScope.length > 0) {
scopeParts.push(`accounts: ${accountScope.join(", ")}`);
}
if (unit.affected_counterparties.length > 0) {
scopeParts.push(`контрагенты: ${unit.affected_counterparties.slice(0, 2).join(", ")}`);
if (counterpartyScope.length > 0) {
scopeParts.push(`counterparties: ${counterpartyScope.join(", ")}`);
}
if (unit.affected_documents.length > 0) {
scopeParts.push(`документы: ${unit.affected_documents.slice(0, 2).join(", ")}`);
if (documentScope.length > 0) {
scopeParts.push(`documents: ${documentScope.join(", ")}`);
}
if (scopeParts.length === 0 && unit.affected_entities.length > 0) {
scopeParts.push(`объекты: ${unit.affected_entities.slice(0, 2).join(", ")}`);
if (scopeParts.length === 0 && entityScope.length > 0) {
scopeParts.push(`entities: ${entityScope.join(", ")}`);
}
if (scopeParts.length === 0) {
return "затронутый контур требует уточнения";
return "affected scope requires clarification";
}
return scopeParts.join("; ");
}
@ -339,47 +428,47 @@ function buildProblemCentricActions(input) {
const actions = [];
const unitTypes = new Set(input.units.map((item) => item.problem_unit_type));
if (unitTypes.has("broken_chain_segment")) {
actions.push("Проверьте связку выписка -> документ -> проводка по проблемным участкам цепочки.");
actions.push("Проверьте связку выписка -> документ -> проводка по проблемным участкам цепочки.");
}
if (unitTypes.has("unresolved_settlement_cluster")) {
actions.push("Сверьте хвосты по расчетам: закрылся ли документ оплаты корректным закрывающим документом.");
actions.push("Сверьте хвосты по расчетам: закрылся ли документ оплаты корректным закрывающим документом.");
}
if (unitTypes.has("period_risk_cluster")) {
actions.push("Оцените влияние дефекта на закрытие периода и корректность регламентных операций.");
actions.push("Оцените влияние дефекта на закрытие периода и корректность регламентных операций.");
}
if (unitTypes.has("cross_branch_inconsistency_cluster")) {
actions.push("Сверьте противоречия между документами, проводками и регистрами по НДС/межконтурным связям.");
actions.push("Сверьте противоречия между документами, проводками и регистрами по НДС/межконтурным связям.");
}
if (unitTypes.has("lifecycle_anomaly_node")) {
actions.push("Проверьте lifecycle объекта: ожидаемый этап не должен оставаться в partially_linked состоянии.");
actions.push("Проверьте lifecycle объекта: ожидаемый этап не должен оставаться в partially_linked состоянии.");
}
for (const unit of input.units) {
if (unit.lifecycle_defect_type === "stale_active_state") {
actions.push("Проверьте, почему объект завис: ожидаемый переход не должен оставаться в активной стадии.");
actions.push("Проверьте, почему объект завис: ожидаемый переход не должен оставаться в активной стадии.");
}
if (unit.lifecycle_defect_type === "misclosed_state") {
actions.push("Проверьте закрывающий документ и проводки: закрытие может быть формальным, но некорректным по пути.");
actions.push("Проверьте закрывающий документ и проводки: закрытие может быть формальным, но некорректным по пути.");
}
if (unit.lifecycle_defect_type === "cross_branch_state_conflict") {
actions.push("Сверьте бухгалтерскую и смежную ветки (например, НДС/расчеты): обнаружен межконтурный конфликт состояния.");
actions.push("Сверьте бухгалтерскую и смежную ветки (например, НДС/расчеты): обнаружен межконтурный конфликт состояния.");
}
}
if (input.mode === "clarification_required") {
if (input.missingAnchors.period) {
actions.push("Уточните период проверки, чтобы зафиксировать границы проблемного контура.");
actions.push("Уточните период проверки, чтобы зафиксировать границы проблемного контура.");
}
if (input.missingAnchors.account) {
actions.push("Уточните счет или группу счетов для предметной локализации дефекта.");
actions.push("Уточните счет или группу счетов для предметной локализации дефекта.");
}
if (input.missingAnchors.documentOrObject) {
actions.push("Укажите конкретный документ или объект трассировки для проверки механизма отклонения.");
actions.push("Укажите конкретный документ или объект трассировки для проверки механизма отклонения.");
}
if (input.missingAnchors.counterparty) {
actions.push("Укажите контрагента/договор, чтобы проверить хвосты и разрывы на конкретной связке.");
actions.push("Укажите контрагента/договор, чтобы проверить хвосты и разрывы на конкретной связке.");
}
}
if (input.coverageReport.requirements_uncovered.length > 0) {
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
}
return uniqueStrings(actions, 6);
}
@ -390,28 +479,28 @@ function buildProblemCentricClarifications(input) {
const questions = [];
const unitTypes = new Set(input.units.map((item) => item.problem_unit_type));
if (input.missingAnchors.period) {
questions.push("Уточните период (например, 2020-06), в котором нужно проверить проблемный кластер.");
questions.push("Уточните период (например, 2020-06), в котором нужно проверить проблемный кластер.");
}
if (input.missingAnchors.account) {
questions.push("Уточните счет или связку счетов (например, 51/60), где вы ожидаете дефект.");
questions.push("Уточните счет или связку счетов (например, 51/60), где вы ожидаете дефект.");
}
if (input.missingAnchors.documentOrObject) {
questions.push("Укажите документ/объект, от которого нужно строить проверку цепочки.");
questions.push("Укажите документ/объект, от которого нужно строить проверку цепочки.");
}
if (input.missingAnchors.counterparty) {
questions.push("Укажите контрагента или договор, по которому проверить незакрытую экспозицию.");
questions.push("Укажите контрагента или договор, по которому проверить незакрытую экспозицию.");
}
if (unitTypes.has("broken_chain_segment")) {
questions.push("Уточните участок цепочки: выписка, платежный документ или проводка.");
questions.push("Уточните участок цепочки: выписка, платежный документ или проводка.");
}
if (unitTypes.has("period_risk_cluster")) {
questions.push("Уточните, какой этап закрытия периода критичен: начисление, закрытие счетов или НДС-блок.");
questions.push("Уточните, какой этап закрытия периода критичен: начисление, закрытие счетов или НДС-блок.");
}
if (unitTypes.has("unresolved_settlement_cluster")) {
questions.push("Уточните, интересуют хвосты поставщиков, покупателей или оба направления.");
questions.push("Уточните, интересуют хвосты поставщиков, покупателей или оба направления.");
}
if (input.coverageReport.clarification_needed_for.length > 0) {
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
}
return uniqueStrings(questions, 6);
}
@ -522,10 +611,10 @@ function limitationReasonToText(code) {
function detectMissingAnchors(userMessage) {
const lower = String(userMessage ?? "").toLowerCase();
const hasPeriod = /\b20\d{2}(?:[-./](?:0[1-9]|1[0-2]))?\b/.test(lower);
const hasAccount = /(?:\bсчет\b|\baccount\b|\bschet\b|\b\d{2}(?:\.\d{2})?\b)/i.test(lower);
const hasDocumentOrObject = /(?:документ|invoice|guid|object|obj|#\d+|\bid\b|\bref\b|dokument|doc)/i.test(lower);
const hasCounterparty = /(?:контрагент|supplier|buyer|customer|kontragent|postavsh|pokupatel)/i.test(lower);
const hasAnomalyType = /(?:аномал|risk|отклон|разрыв|mismatch|duplicate|tail|цепочк|anomali|hvost)/i.test(lower);
const hasAccount = /(?:\bсчет\b|\baccount\b|\bschet\b|\b\d{2}(?:\.\d{2})?\b)/i.test(lower);
const hasDocumentOrObject = /(?:документ|invoice|guid|object|obj|#\d+|\bid\b|\bref\b|dokument|doc)/i.test(lower);
const hasCounterparty = /(?:контрагент|supplier|buyer|customer|kontragent|postavsh|pokupatel)/i.test(lower);
const hasAnomalyType = /(?:аномал|risk|отклон|разрыв|mismatch|duplicate|tail|цепочк|anomali|hvost)/i.test(lower);
return {
period: !hasPeriod,
account: !hasAccount,
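
One detail worth checking in `detectMissingAnchors`: in JavaScript regular expressions `\b` is defined in terms of ASCII word characters (`[A-Za-z0-9_]`), so it does not mark the edges of Cyrillic words. The sketch below isolates two alternatives of the `hasAccount` pattern to illustrate this and shows one possible Unicode-aware variant built on lookarounds. `ACCOUNT_WORD_UNICODE` and the other constant names are hypothetical and not part of the commit.

```javascript
// The Cyrillic alternative of hasAccount, isolated from the rest of the pattern.
const ACCOUNT_WORD_ASCII_BOUNDARY = /\bсчет\b/i;
// Hypothetical alternative: explicit Unicode-aware boundaries via lookarounds.
const ACCOUNT_WORD_UNICODE = /(?<![\p{L}\p{N}_])счет(?![\p{L}\p{N}_])/iu;
// The numeric alternative of hasAccount, isolated in the same way.
const TWO_DIGIT_ACCOUNT = /\b\d{2}(?:\.\d{2})?\b/;

const standalone = "проверь счет за июнь";
const embedded = "расчет за июнь";

console.log(ACCOUNT_WORD_ASCII_BOUNDARY.test(standalone)); // false: no ASCII word boundary next to Cyrillic
console.log(ACCOUNT_WORD_UNICODE.test(standalone)); // true
console.log(ACCOUNT_WORD_UNICODE.test(embedded)); // false: "счет" is part of a longer word
console.log(TWO_DIGIT_ACCOUNT.test("2020-06")); // true: the month part looks like an account number
```

If this reading is right, a message that contains only a period such as `2020-06` would still be treated as having an account anchor, so the numeric alternative may deserve a tighter rule.
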
@ -541,53 +630,53 @@ function buildClarificationQuestions(input) {
return questions;
}
if (input.missingAnchors.period) {
questions.push("Уточните период проверки (например, 2020-06).");
questions.push("Уточните период проверки (например, 2020-06).");
}
if (input.missingAnchors.account) {
questions.push("Уточните счет или группу счетов (например, 19, 60, 62).");
questions.push("Уточните счет или группу счетов (например, 19, 60, 62).");
}
if (input.missingAnchors.documentOrObject) {
questions.push("Укажите документ/GUID/конкретный объект для трассировки.");
questions.push("Укажите документ/GUID/конкретный объект для трассировки.");
}
if (input.missingAnchors.counterparty) {
questions.push("Укажите контрагента или группу контрагентов.");
questions.push("Укажите контрагента или группу контрагентов.");
}
if (input.policySignals.broad_query_detected && input.missingAnchors.anomalyType) {
questions.push("Уточните тип отклонения: разрыв цепочки, неверный документ или аномальный риск.");
questions.push("Уточните тип отклонения: разрыв цепочки, неверный документ или аномальный риск.");
}
if (input.coverageReport.clarification_needed_for.length > 0) {
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
}
return uniqueStrings(questions, 6);
}
function buildRecommendedActions(input) {
const actions = [];
if (input.mode === "focused_grounded") {
actions.push("Проверьте 1-2 ключевые записи в учетной базе и зафиксируйте итог в рабочем файле проверки.");
actions.push("Проверьте 1-2 ключевые записи в учетной базе и зафиксируйте итог в рабочем файле проверки.");
}
if (input.mode === "broad_partial") {
actions.push("Сузьте запрос до периода + счета или периода + документа и повторите проверку.");
actions.push("Сузьте запрос до периода + счета или периода + документа и повторите проверку.");
}
if (input.mode === "clarification_required") {
actions.push("Дайте недостающие якоря (период/счет/объект), иначе сильный factual вывод невозможен.");
actions.push("Дайте недостающие якоря (период/счет/объект), иначе сильный factual вывод невозможен.");
}
if (input.coverageReport.requirements_uncovered.length > 0) {
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
}
if (input.coverageReport.requirements_partially_covered.length > 0) {
actions.push(`Доуточните частично покрытые требования: ${input.coverageReport.requirements_partially_covered.join(", ")}.`);
actions.push(`Доуточните частично покрытые требования: ${input.coverageReport.requirements_partially_covered.join(", ")}.`);
}
if (input.policySignals.broad_query_detected && input.policySignals.narrowing_strength !== "strong") {
actions.push("Добавьте более узкий контекст: тип отклонения, группу документов и бизнес-участок.");
actions.push("Добавьте более узкий контекст: тип отклонения, группу документов и бизнес-участок.");
}
if (input.limitationReasonCodes.includes("snapshot_only")) {
actions.push("Сверьте критичные выводы с live source-of-record в 1C.");
actions.push("Сверьте критичные выводы с live source-of-record в 1C.");
}
if (input.limitationReasonCodes.includes("weak_source_mapping")) {
actions.push("Проверьте source mapping для связей document/register по указанным ref.");
actions.push("Проверьте source mapping для связей document/register по указанным ref.");
}
if (input.sourceRefs.length > 0) {
actions.push(`Начните проверку с ${input.sourceRefs.length} подтвержденных записей и сверьте их с первичными документами.`);
actions.push(`Начните проверку с ${input.sourceRefs.length} подтвержденных записей и сверьте их с первичными документами.`);
}
return uniqueStrings(actions, 6);
}
@ -674,84 +763,88 @@ function buildPolicyDecision(input) {
}
function buildAnswerSummary(mode) {
if (mode === "focused_grounded")
return "Сформирован прямой ответ на основе подтвержденной опоры.";
return "Сформирован прямой ответ на основе подтвержденной опоры.";
if (mode === "broad_partial")
return "Вывод ограничен: есть частичная опора, но не полный coverage.";
return "Вывод ограничен: есть частичная опора, но не полный coverage.";
if (mode === "clarification_required")
return "Нужны уточнения: без сужения strong factual вывод ненадежен.";
return "Нужны уточнения: без сужения strong factual вывод ненадежен.";
if (mode === "out_of_scope")
return "Запрос вне доступного учетного контура.";
return "Запрос вне доступного учетного контура.";
if (mode === "route_mismatch")
return "Результат маршрута не совпал с предметом вопроса.";
return "Результат маршрута не совпал с предметом вопроса.";
if (mode === "empty")
return "В текущем срезе данных релевантные записи не обнаружены.";
return "В текущем срезе данных релевантные записи не обнаружены.";
if (mode === "no_grounded")
return "Недостаточно опоры для обоснованного ответа.";
return "Не удалось собрать обоснованный ответ по текущему запросу.";
return "Недостаточно опоры для обоснованного ответа.";
return "Не удалось собрать обоснованный ответ по текущему запросу.";
}
function buildDirectAnswer(input) {
const topFact = firstMeaningfulFact(input.retrievalResults);
if (input.mode === "focused_grounded") {
return topFact ?? "Подтвержденный результат получен; можно продолжать предметную проверку без деградации.";
return topFact ?? "Подтвержденный результат получен; можно продолжать предметную проверку без деградации.";
}
if (input.mode === "broad_partial") {
if (topFact) {
return `Доступен ограниченный подтвержденный фрагмент: ${topFact}`;
return `Доступен ограниченный подтвержденный фрагмент: ${topFact}`;
}
return "Есть только ограниченная опора; вывод дан в частичном режиме без ложной точности.";
return "Есть только ограниченная опора; вывод дан в частичном режиме без ложной точности.";
}
if (input.mode === "clarification_required") {
return "Текущий запрос слишком широкий или недоопределен; надежный factual вывод пока невозможен.";
return "Текущий запрос слишком широкий или недоопределен; надежный factual вывод пока невозможен.";
}
if (input.mode === "out_of_scope") {
return "Могу отвечать только в пределах данных доступного учетного контура.";
return "Могу отвечать только в пределах данных доступного учетного контура.";
}
if (input.mode === "route_mismatch") {
return "Предмет результата не совпал с предметом вопроса; требуется уточнение фокуса.";
return "Предмет результата не совпал с предметом вопроса; требуется уточнение фокуса.";
}
if (input.mode === "empty") {
return "В текущем срезе данных проблемные записи по заданному условию не найдены.";
return "В текущем срезе данных проблемные записи по заданному условию не найдены.";
}
if (input.mode === "no_grounded") {
return "Недостаточно подтвержденной опоры для ответа в требуемой точности.";
return "Недостаточно подтвержденной опоры для ответа в требуемой точности.";
}
if (input.policySignals.minimum_evidence_failed) {
return "Маршрут отработал, но минимальная evidence-опора не пройдена.";
return "Маршрут отработал, но минимальная evidence-опора не пройдена.";
}
return "Не удалось сформировать обоснованный ответ; нужно уточнение запроса.";
return "Не удалось сформировать обоснованный ответ; нужно уточнение запроса.";
}
function buildProblemCentricAnswerSummary(input) {
if (input.lifecycleEnriched && input.summary?.lifecycle_enriched_units && input.summary.lifecycle_enriched_units > 0) {
if (input.mode === "clarification_required") {
return "Выявлены lifecycle-дефекты, но для надежного вывода требуется уточнение предметных якорей.";
return "Выявлены lifecycle-дефекты, но для надежного вывода требуется уточнение предметных якорей.";
}
return `Сформирован lifecycle-aware problem срез: выделено ${input.summary.lifecycle_enriched_units} lifecycle-узлов с приоритетом по дефектам перехода.`;
return `Сформирован lifecycle-aware problem срез: выделено ${input.summary.lifecycle_enriched_units} lifecycle-узлов с приоритетом по дефектам перехода.`;
}
if (input.mode === "clarification_required") {
return "Выявлены проблемные кластеры, но для надежного вывода требуется предметное уточнение фокуса.";
return "Выявлены проблемные кластеры, но для надежного вывода требуется предметное уточнение фокуса.";
}
if (input.weakUnits) {
return "Сформирован problem-centric срез с ограниченной опорой; вывод предварительный и требует до-проверки.";
return "Сформирован problem-centric срез с ограниченной опорой; вывод предварительный и требует до-проверки.";
}
if (input.summary?.units_total && input.summary.units_total > 1) {
return `Сформирован problem-centric срез: выделено ${input.summary.units_total} проблемных кластера с приоритетами.`;
return `Сформирован problem-centric срез: выделено ${input.summary.units_total} проблемных кластера с приоритетами.`;
}
return "Сформирован problem-centric срез: выделен ключевой проблемный кластер и затронутый контур.";
return "Сформирован problem-centric срез: выделен ключевой проблемный кластер и затронутый контур.";
}
function buildProblemCentricDirectAnswer(input) {
const lead = input.mode === "clarification_required"
? "Обнаружены проблемные зоны, но без уточнения якорей сильный factual-вывод преждевременен."
? "Обнаружены проблемные зоны, но без уточнения якорей сильный factual-вывод преждевременен."
: input.weakUnits
? "Выделены проблемные зоны с ограниченной надежностью; вывод дан в ограниченном режиме."
? "Выделены проблемные зоны с ограниченной надежностью; вывод дан в ограниченном режиме."
: input.lifecycleAnswerEnabled && hasLifecycleResolution(input.units)
? "Выделены lifecycle-проблемы: определены текущие/ожидаемые стадии и тип нарушения перехода."
: "Выделены ключевые проблемные зоны и их влияние на учетный контур.";
? "Выделены lifecycle-проблемы: определены текущие/ожидаемые стадии и тип нарушения перехода."
: "Выделены ключевые проблемные зоны и их влияние на учетный контур.";
const unitLines = input.units.map((unit) => {
const scope = formatAffectedScope(unit);
const lifecycleScope = input.lifecycleAnswerEnabled ? formatLifecycleScope(unit) : null;
const lifecycleInterpretation = input.lifecycleAnswerEnabled ? unit.business_lifecycle_interpretation : null;
const lifecycleInterpretation = input.lifecycleAnswerEnabled && unit.business_lifecycle_interpretation
? sanitizeUserText(unit.business_lifecycle_interpretation)
: null;
const title = sanitizeUserText(unit.title) ?? "Problem cluster detected";
const defect = sanitizeUserText(unit.business_defect_class) ?? "detected_issue";
const segments = [
`${unit.title}: ${unit.business_defect_class}`,
`${title}: ${defect}`,
scope,
lifecycleScope,
lifecycleInterpretation,
@ -762,9 +855,9 @@ function buildProblemCentricDirectAnswer(input) {
return `- ${segments.join("; ")}.`;
});
if (unitLines.length === 0) {
return `${lead}\nПроблемные кластеры не удалось детализировать в текущем срезе.`;
return `${lead}\nПроблемные кластеры не удалось детализировать в текущем срезе.`;
}
return [lead, "Проблемные кластеры:", ...unitLines].join("\n");
return [lead, "Проблемные кластеры:", ...unitLines].join("\n");
}
function buildProblemCentricAnswerStructure(input) {
const weakUnits = input.selectedUnits.every((item) => item.confidence.grade === "low");
@ -1098,20 +1191,20 @@ function composeExplainableAnswer(input, scopeLabel) {
const limitations = uniqueStrings([...extractLimitations(input.retrievalResults), ...input.groundingCheck.reasons]);
const nextSteps = suggestNextStep(input.requirements, input.coverageReport);
const lead = scopeLabel === "full"
? "Итог: запрос обработан по предмету, найденные объекты подтверждены данными контура."
: "Итог: запрос обработан частично, ниже подтвержденная часть и ограничения.";
return [
? "Итог: запрос обработан по предмету, найденные объекты подтверждены данными контура."
: "Итог: запрос обработан частично, ниже подтвержденная часть и ограничения.";
return sanitizeUserFacingReply([
lead,
facts.length > 0 ? "Подтвержденные результаты:\n" + formatList(facts) : "",
whyIncluded.length > 0 ? "Почему это попало в ответ:\n" + formatList(whyIncluded) : "",
selectionReasons.length > 0 ? "Основание отбора:\n" + formatList(selectionReasons) : "",
riskFactors.length > 0 ? "Подтверждающие признаки:\n" + formatList(riskFactors) : "",
interpretation.length > 0 ? "Практический смысл:\n" + formatList(interpretation) : "",
limitations.length > 0 ? "Ограничения:\n" + formatList(limitations) : "",
nextSteps.length > 0 ? "Что проверить дальше:\n" + formatList(nextSteps) : ""
facts.length > 0 ? "Подтвержденные результаты:\n" + formatList(facts) : "",
whyIncluded.length > 0 ? "Почему это попало в ответ:\n" + formatList(whyIncluded) : "",
selectionReasons.length > 0 ? "Основание отбора:\n" + formatList(selectionReasons) : "",
riskFactors.length > 0 ? "Подтверждающие признаки:\n" + formatList(riskFactors) : "",
interpretation.length > 0 ? "Практический смысл:\n" + formatList(interpretation) : "",
limitations.length > 0 ? "Ограничения:\n" + formatList(limitations) : "",
nextSteps.length > 0 ? "Что проверить дальше:\n" + formatList(nextSteps) : ""
]
.filter(Boolean)
.join("\n\n");
.join("\n\n"));
}
function composeAssistantAnswer(input) {
if (input.enableAnswerPolicyV11) {
@ -1122,13 +1215,15 @@ function composeAssistantAnswer(input) {
const partialResults = input.retrievalResults.filter((item) => item.status === "partial");
const emptyResults = input.retrievalResults.filter((item) => item.status === "empty");
const errorResults = input.retrievalResults.filter((item) => item.status === "error");
const legacyEvidenceItems = flattenEvidence(input.retrievalResults);
const legacyLimitationReasonCodes = collectLimitationReasonCodes(legacyEvidenceItems);
const hasBroadMinimumEvidenceSignal = input.retrievalResults.some((item) => summaryBoolean(item, "broad_guard_applied") && summaryBoolean(item, "minimum_evidence_failed"));
const hasBroadClarificationSignal = input.retrievalResults.some((item) => summaryBoolean(item, "broad_guard_applied") &&
summaryBoolean(item, "minimum_evidence_failed") &&
summaryString(item, "degraded_to") === "clarification");
if (fallbackType === "out_of_scope" && input.coverageReport.requirements_covered === 0) {
return {
assistant_reply: "Я могу отвечать только по данным вашей учетной базы. Этот запрос выходит за рамки доступного контура.",
assistant_reply: "Я могу отвечать только по данным вашей учетной базы. Этот запрос выходит за рамки доступного контура.",
fallback_type: "out_of_scope",
reply_type: "out_of_scope"
};
@ -1136,8 +1231,8 @@ function composeAssistantAnswer(input) {
if (input.groundingCheck.status === "route_mismatch_blocked") {
return {
assistant_reply: [
"Не отправляю финальный ответ, потому что предмет результата не совпал с предметом вопроса.",
"Уточните формулировку (например, нужный счет/участок учета), и я выполню повторный проход."
"Не отправляю финальный ответ, потому что предмет результата не совпал с предметом вопроса.",
"Уточните формулировку (например, нужный счет/участок учета), и я выполню повторный проход."
].join("\n\n"),
fallback_type: "partial",
reply_type: "route_mismatch_blocked"
@ -1145,28 +1240,28 @@ function composeAssistantAnswer(input) {
}
if (input.groundingCheck.status === "no_grounded_answer" && okResults.length === 0 && !hasBroadMinimumEvidenceSignal) {
return {
assistant_reply: "Пока не удалось собрать предметно подтвержденный ответ по вашему вопросу. Нужны дополнительные уточнения по периоду или объекту проверки.",
assistant_reply: "Пока не удалось собрать предметно подтвержденный ответ по вашему вопросу. Нужны дополнительные уточнения по периоду или объекту проверки.",
fallback_type: fallbackType,
reply_type: "no_grounded_answer"
};
}
if (hasBroadClarificationSignal && okResults.length === 0 && partialResults.length === 0) {
return {
assistant_reply: "Запрос слишком широкий для надежного вывода по текущей опоре. Уточните период, участок учета или объект проверки, после чего я дам предметный результат.",
assistant_reply: "Запрос слишком широкий для надежного вывода по текущей опоре. Уточните период, участок учета или объект проверки, после чего я дам предметный результат.",
fallback_type: "clarification",
reply_type: "clarification_required"
};
}
if (fallbackType === "clarification" && okResults.length === 0 && partialResults.length === 0) {
return {
assistant_reply: "Уточните, пожалуйста, период, счет, документ или контрагента, чтобы закрыть все части вопроса корректно.",
assistant_reply: "Уточните, пожалуйста, период, счет, документ или контрагента, чтобы закрыть все части вопроса корректно.",
fallback_type: "clarification",
reply_type: "clarification_required"
};
}
if (errorResults.length > 0 && okResults.length === 0 && partialResults.length === 0) {
return {
assistant_reply: "Не удалось получить данные из контура. Попробуйте повторить запрос или уточнить формулировку.",
assistant_reply: "Не удалось получить данные из контура. Попробуйте повторить запрос или уточнить формулировку.",
fallback_type: fallbackType,
reply_type: "backend_error"
};
@ -1180,7 +1275,7 @@ function composeAssistantAnswer(input) {
}
if (okResults.length === 0 && partialResults.length === 0 && emptyResults.length > 0) {
return {
assistant_reply: "По заданному условию в текущем срезе данных явных проблемных записей не найдено.",
assistant_reply: "По заданному условию в текущем срезе данных явных проблемных записей не найдено.",
fallback_type: fallbackType,
reply_type: "empty_but_valid"
};
@ -1190,7 +1285,9 @@ function composeAssistantAnswer(input) {
input.coverageReport.clarification_needed_for.length > 0 ||
input.coverageReport.out_of_scope_requirements.length > 0 ||
input.groundingCheck.status === "partial" ||
errorResults.length > 0;
errorResults.length > 0 ||
legacyLimitationReasonCodes.includes("weak_source_mapping") ||
legacyLimitationReasonCodes.includes("missing_mechanism");
if (okResults.length > 0 && hasPartialCoverage) {
return {
assistant_reply: composeExplainableAnswer(input, "partial"),
@ -1206,7 +1303,7 @@ function composeAssistantAnswer(input) {
};
}
return {
assistant_reply: "По текущему запросу не удалось построить обоснованный ответ. Уточните формулировку и попробуйте снова.",
assistant_reply: "По текущему запросу не удалось построить обоснованный ответ. Уточните формулировку и попробуйте снова.",
fallback_type: "unknown",
reply_type: "backend_error"
};
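The hunk at `-1190,7 +1285,9` widens the partial-coverage check so that two limitation reason codes, `weak_source_mapping` and `missing_mechanism`, also downgrade an answer to "partial". The sketch below shows one way that check could be isolated. The evidence item shape (`limitation_reason_codes`) and the `signals` object are assumptions made for this example; the actual `collectLimitationReasonCodes` is not shown in the diff.

```javascript
// Reason codes that downgrade an otherwise complete answer (taken from the diff).
const DOWNGRADING_REASON_CODES = ["weak_source_mapping", "missing_mechanism"];

// Hypothetical collector: assumes each evidence item may carry
// a limitation_reason_codes array. Returns de-duplicated codes.
function collectReasonCodes(evidenceItems) {
    const codes = new Set();
    for (const item of evidenceItems) {
        for (const code of item.limitation_reason_codes ?? []) {
            codes.add(code);
        }
    }
    return [...codes];
}

// Hypothetical reduction of the hasPartialCoverage expression to three inputs.
function hasPartialCoverage(signals, evidenceItems) {
    const codes = collectReasonCodes(evidenceItems);
    return (signals.groundingStatus === "partial" ||
        signals.errorCount > 0 ||
        DOWNGRADING_REASON_CODES.some((code) => codes.includes(code)));
}

const cleanSignals = { groundingStatus: "grounded", errorCount: 0 };
console.log(hasPartialCoverage(cleanSignals, [{ limitation_reason_codes: ["missing_mechanism"] }]));
console.log(hasPartialCoverage(cleanSignals, [{ limitation_reason_codes: [] }, {}]));
```

Keeping the downgrading codes in one constant makes it easier to see which limitations affect the reply type and which are only reported.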


@ -917,10 +917,10 @@ class AssistantDataLayer {
result = this.executeRisk(fragmentText, data);
}
else if (route === "batch_refresh_then_store") {
result = this.executeBatch(data);
result = this.executeBatch(fragmentText, data);
}
else if (route === "store_canonical") {
result = this.executeCanonical(data);
result = this.executeCanonical(fragmentText, data);
}
else if (route === "live_mcp_drilldown") {
result = this.executeDrilldown(fragmentText, data);
@ -1207,7 +1207,9 @@ class AssistantDataLayer {
errors: []
};
}
executeRisk(_fragmentText, data) {
executeRisk(fragmentText, data) {
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
const profileRiskFactors = semanticProfile.anomaly_patterns;
const records = [...data.problemCases, ...data.ndsRegisters];
const scored = records
.map((record) => {
@ -1258,12 +1260,15 @@ class AssistantDataLayer {
items: [],
summary: {
checked_records: records.length,
risky_records: 0
risky_records: 0,
query_subject: semanticProfile.query_subject,
semantic_profile: semanticProfile,
ranking_basis: semanticProfile.ranking_basis
},
evidence: [],
why_included: [],
selection_reason: ["Риск-оценка выполнялась по техническим признакам, но записи выше порога не найдены."],
risk_factors: [],
risk_factors: profileRiskFactors,
business_interpretation: ["По текущему срезу явные риск-признаки не обнаружены."],
confidence: "medium",
limitations: ["Оценка основана на snapshot-данных и эвристическом risk score."],
@ -1271,6 +1276,13 @@ class AssistantDataLayer {
};
}
const averageScore = items.reduce((acc, item) => acc + item.risk_score, 0) / items.length;
const normalizedRiskFactors = uniqueStrings([
...profileRiskFactors,
"unknown_link_count",
"zero_guid_values",
"navigation_links",
"missing_counterparty_link"
]);
return {
status: "ok",
result_type: "list",
@ -1278,7 +1290,10 @@ class AssistantDataLayer {
summary: {
checked_records: records.length,
risky_records: items.length,
average_risk_score: Number(averageScore.toFixed(2))
average_risk_score: Number(averageScore.toFixed(2)),
query_subject: semanticProfile.query_subject,
semantic_profile: semanticProfile,
ranking_basis: semanticProfile.ranking_basis
},
evidence: items.slice(0, 10).map((item) => ({
source_entity: item.source_entity,
@ -1287,21 +1302,18 @@ class AssistantDataLayer {
})),
why_included: ["В ответ включены записи с risk_score >= 2."],
selection_reason: [
"score растет при unknown links, zero GUID, навигационных ссылках и отсутствии явного контрагента."
],
risk_factors: [
"unknown_link_count",
"zero_guid_values",
"navigation_links",
"missing_counterparty_link"
"score растет при unknown links, zero GUID, навигационных ссылках и отсутствии явного контрагента.",
`Semantic profile subject: ${semanticProfile.query_subject}.`
],
risk_factors: normalizedRiskFactors,
business_interpretation: ["Эти записи требуют первичной бухгалтерской проверки как потенциальные аномалии."],
confidence: "high",
limitations: ["Риск-факторы определяются эвристикой, а не полным набором бизнес-правил 1С."],
errors: []
};
}
executeBatch(data) {
executeBatch(fragmentText, data) {
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
const source = [...data.problemCases, ...data.keyFields, ...data.docs];
const byEntity = new Map();
for (const record of source) {
@ -1321,7 +1333,10 @@ class AssistantDataLayer {
items,
summary: {
checked_records: source.length,
ranked_entities: items.length
ranked_entities: items.length,
query_subject: semanticProfile.query_subject,
semantic_profile: semanticProfile,
ranking_basis: semanticProfile.ranking_basis
},
evidence: items.slice(0, 5).map((item) => ({
entity: item.entity,
@ -1329,17 +1344,20 @@ class AssistantDataLayer {
})),
why_included: items.length > 0 ? ["Показаны сущности с максимальным количеством записей."] : [],
selection_reason: ["Ранжирование выполнено по records_count по убыванию."],
risk_factors: ["Высокий объем записей по сущности повышает приоритет проверки."],
risk_factors: uniqueStrings(["entity_volume_spike", ...semanticProfile.anomaly_patterns]),
business_interpretation: [
"Сущности в топе ранга чаще дают наибольший вклад в проблемный объем и требуют приоритетного аудита."
"Top entities by volume highlight where lifecycle-focused review should start first."
],
confidence: "medium",
limitations: ["Ранжирование по объему не всегда эквивалентно бизнес-риску."],
errors: []
};
}
executeCanonical(data) {
const items = data.docs
executeCanonical(fragmentText, data) {
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
const useVatSource = semanticProfile.domain_scope.includes("vat") || semanticProfile.domain_scope.includes("taxes");
const sourceRecords = useVatSource ? [...data.ndsRegisters, ...data.keyFields] : data.docs;
const items = sourceRecords
.map((record) => {
const period = extractDate(record);
return {
@ -1360,8 +1378,11 @@ class AssistantDataLayer {
result_type: "list",
items,
summary: {
checked_records: data.docs.length,
returned_records: items.length
checked_records: sourceRecords.length,
returned_records: items.length,
query_subject: semanticProfile.query_subject,
semantic_profile: semanticProfile,
ranking_basis: semanticProfile.ranking_basis
},
evidence: items.slice(0, 6).map((item) => ({
source_entity: item.source_entity,
@ -1369,8 +1390,11 @@ class AssistantDataLayer {
period: item.period
})),
why_included: items.length > 0 ? ["Показаны последние по дате записи канонического документного слоя."] : [],
selection_reason: ["Отбор по максимальной дате документа в пределах snapshot."],
risk_factors: [],
selection_reason: [
"Отбор по максимальной дате документа в пределах snapshot.",
`Semantic profile subject: ${semanticProfile.query_subject}.`
],
risk_factors: semanticProfile.anomaly_patterns,
business_interpretation: ["Слой отражает базовый factual-срез документов для оперативной сверки."],
confidence: "high",
limitations: ["Это read-only snapshot, а не онлайн-состояние 1С."],
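Two ideas recur across the `AssistantDataLayer` hunks: profile-derived anomaly patterns are merged with the fixed risk factors through `uniqueStrings`, and `executeCanonical` switches its source records when the semantic profile points at VAT or taxes. The sketch below illustrates both. `uniqueStrings` is not shown in the diff, so the Set-based version here is an assumption, and `mergeRiskFactors` / `pickCanonicalSource` are hypothetical names.

```javascript
// Hypothetical stand-in for uniqueStrings: keeps first occurrences in order.
function uniqueStrings(values) {
    return [...new Set(values)];
}

// Fixed risk factors listed in executeRisk.
const BASE_RISK_FACTORS = [
    "unknown_link_count",
    "zero_guid_values",
    "navigation_links",
    "missing_counterparty_link"
];

// Profile patterns come first, then the fixed factors, without duplicates.
function mergeRiskFactors(profileFactors) {
    return uniqueStrings([...profileFactors, ...BASE_RISK_FACTORS]);
}

// Mirrors the source switch in executeCanonical.
function pickCanonicalSource(profile, data) {
    const useVatSource = profile.domain_scope.includes("vat") || profile.domain_scope.includes("taxes");
    return useVatSource ? [...data.ndsRegisters, ...data.keyFields] : data.docs;
}

const demoData = { ndsRegisters: ["n1"], keyFields: ["k1", "k2"], docs: ["d1"] };
console.log(mergeRiskFactors(["zero_guid_values", "vat_gap"]));
console.log(pickCanonicalSource({ domain_scope: ["vat"] }, demoData));
```

One consequence of the source switch is that `checked_records` now counts whichever record set was actually scanned, which is why the diff replaces `data.docs.length` with `sourceRecords.length`.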


@ -32,18 +32,92 @@ function includesAny(source, patterns) {
function hasToken(values, pattern) {
return values.some((value) => pattern.test(value));
}
function defaultExpectedState(domain) {
if (domain === "bank_settlement")
return "settlement_closed";
if (domain === "customer_settlement")
return "receivable_closed";
if (domain === "deferred_expense")
return "fully_written_off";
if (domain === "fixed_asset")
return "depreciation_active";
if (domain === "vat_flow")
return "vat_deducted";
return "close_completed";
function normalizeStateToken(value) {
return value.trim().toLowerCase();
}
function resolveStateCode(model, stateCode) {
if (!stateCode || typeof stateCode !== "string") {
return null;
}
const normalized = normalizeStateToken(stateCode);
const matched = model.states.find((state) => normalizeStateToken(state.state_code) === normalized);
return matched?.state_code ?? null;
}
function defaultInitialState(model) {
const initial = model.states.find((state) => state.state_class === "initial");
if (initial) {
return initial.state_code;
}
return model.states[0]?.state_code ?? "unknown_state";
}
function defaultExpectedState(model) {
const terminal = model.states.find((state) => state.is_terminal || state.state_class === "terminal");
if (terminal) {
return terminal.state_code;
}
const active = model.states.find((state) => state.state_class === "active");
if (active) {
return active.state_code;
}
return defaultInitialState(model);
}
function expectedTransitionAdjacency(model) {
const graph = new Map();
for (const transition of model.transitions) {
if (transition.transition_type !== "expected") {
continue;
}
const from = transition.from_state;
const to = transition.to_state;
const current = graph.get(from) ?? [];
if (!current.includes(to)) {
current.push(to);
}
graph.set(from, current);
}
return graph;
}
function shortestExpectedPath(model, fromState, toState) {
if (fromState === toState) {
return [fromState];
}
const graph = expectedTransitionAdjacency(model);
const queue = [[fromState]];
const visited = new Set([fromState]);
while (queue.length > 0) {
const path = queue.shift();
if (!path) {
continue;
}
const tail = path[path.length - 1];
const nextStates = graph.get(tail) ?? [];
for (const nextState of nextStates) {
if (visited.has(nextState)) {
continue;
}
const nextPath = [...path, nextState];
if (nextState === toState) {
return nextPath;
}
visited.add(nextState);
queue.push(nextPath);
}
}
return null;
}
function transitionEdgeLabel(fromState, toState) {
return `${fromState}->${toState}`;
}
function resolvePreviousStates(model, currentState) {
const initialState = defaultInitialState(model);
if (initialState === currentState) {
return [];
}
const path = shortestExpectedPath(model, initialState, currentState);
if (!path || path.length <= 1) {
return [];
}
return path.slice(0, -1);
}
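The helpers above treat the lifecycle model as a directed graph of "expected" transitions and use breadth-first search to find the shortest expected route between two states. The condensed sketch below shows the same technique on a small sample. The sample transitions are illustrative: the first two follow the `bank_settlement` states in this diff, while the `"invalid"` transition type and its edge are assumptions added to show that non-expected edges are ignored.

```javascript
// Adjacency list over "expected" transitions only.
function buildAdjacency(transitions) {
    const graph = new Map();
    for (const t of transitions) {
        if (t.transition_type !== "expected") continue;
        const targets = graph.get(t.from_state) ?? [];
        if (!targets.includes(t.to_state)) targets.push(t.to_state);
        graph.set(t.from_state, targets);
    }
    return graph;
}

// Breadth-first search: the first path that reaches toState is a shortest one.
function shortestPath(transitions, fromState, toState) {
    if (fromState === toState) return [fromState];
    const graph = buildAdjacency(transitions);
    const queue = [[fromState]];
    const visited = new Set([fromState]);
    while (queue.length > 0) {
        const path = queue.shift();
        const tail = path[path.length - 1];
        for (const next of graph.get(tail) ?? []) {
            if (visited.has(next)) continue;
            const nextPath = [...path, next];
            if (next === toState) return nextPath;
            visited.add(next);
            queue.push(nextPath);
        }
    }
    return null; // no expected route exists
}

const bankTransitions = [
    { from_state: "initiated_payment", to_state: "bank_recorded", transition_type: "expected" },
    { from_state: "bank_recorded", to_state: "settlement_closed", transition_type: "expected" },
    // Assumed non-expected edge, only to show that it is skipped.
    { from_state: "bank_recorded", to_state: "misclosed_payment", transition_type: "invalid" }
];

const fullPath = shortestPath(bankTransitions, "initiated_payment", "settlement_closed");
console.log(fullPath);
// Previous states are everything on the path except the current state.
console.log(fullPath.slice(0, -1));
```

In the hunks shown here, several domains still declare `transitions: []`. For such a model the search returns `null` for any pair of distinct states, so `resolvePreviousStates` would yield an empty list there.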
const LIFECYCLE_DOMAIN_MODELS = {
bank_settlement: {
@ -53,53 +127,53 @@ const LIFECYCLE_DOMAIN_MODELS = {
states: [
{
state_code: "initiated_payment",
state_label: "Платеж инициирован",
state_label: "Платеж инициирован",
state_class: "initial",
entry_conditions: ["payment_order_created"],
exit_conditions: ["bank_recorded"],
is_terminal: false,
is_problematic: false,
business_meaning: "Есть инициирование платежа."
business_meaning: "Есть инициирование платежа."
},
{
state_code: "bank_recorded",
state_label: "Платеж отражен банком",
state_label: "Платеж отражен банком",
state_class: "active",
entry_conditions: ["bank_statement_recorded"],
exit_conditions: ["settlement_linked"],
is_terminal: false,
is_problematic: false,
business_meaning: "Движение денег зафиксировано, ожидается расчетное закрытие."
business_meaning: "Движение денег зафиксировано, ожидается расчетное закрытие."
},
{
state_code: "settlement_closed",
state_label: "Расчет закрыт",
state_label: "Расчет закрыт",
state_class: "terminal",
entry_conditions: ["payment_to_settlement_linked"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "Платеж доведен до расчетного результата."
business_meaning: "Платеж доведен до расчетного результата."
},
{
state_code: "stale_unlinked_payment",
state_label: "Платеж завис без закрытия",
state_label: "Платеж завис без закрытия",
state_class: "problematic",
entry_conditions: ["bank_recorded", "missing_link"],
exit_conditions: ["settlement_closed"],
is_terminal: false,
is_problematic: true,
business_meaning: "Платеж отражен, но ожидаемая связь по расчету не завершена."
business_meaning: "Платеж отражен, но ожидаемая связь по расчету не завершена."
},
{
state_code: "misclosed_payment",
state_label: "Платеж закрыт некорректно",
state_label: "Платеж закрыт некорректно",
state_class: "problematic",
entry_conditions: ["wrong_document_type_or_posting_mismatch"],
exit_conditions: ["settlement_closed"],
is_terminal: false,
is_problematic: true,
business_meaning: "Формальное закрытие есть, но путь закрытия неверный."
business_meaning: "Формальное закрытие есть, но путь закрытия неверный."
}
],
transitions: [
@ -110,7 +184,7 @@ const LIFECYCLE_DOMAIN_MODELS = {
required_evidence: ["bank_statement_recorded"],
optional_evidence: ["payment_order"],
forbidden_conditions: [],
business_meaning: "Платеж должен появиться во выписке."
business_meaning: "Платеж должен появиться во выписке."
},
{
from_state: "bank_recorded",
@ -119,7 +193,7 @@ const LIFECYCLE_DOMAIN_MODELS = {
required_evidence: ["payment_to_settlement_link"],
optional_evidence: ["document_to_posting"],
forbidden_conditions: ["wrong_document_type"],
business_meaning: "После выписки должен закрываться расчет."
business_meaning: "После выписки должен закрываться расчет."
}
],
defects: []
@ -131,43 +205,43 @@ const LIFECYCLE_DOMAIN_MODELS = {
states: [
{
state_code: "invoice_issued",
state_label: "Реализация отражена",
state_label: "Реализация отражена",
state_class: "initial",
entry_conditions: ["realization_document_exists"],
exit_conditions: ["payment_recorded"],
is_terminal: false,
is_problematic: false,
business_meaning: "Возникла дебиторская позиция."
business_meaning: "Возникла дебиторская позиция."
},
{
state_code: "payment_recorded",
state_label: "Оплата отражена",
state_label: "Оплата отражена",
state_class: "active",
entry_conditions: ["payment_document_exists"],
exit_conditions: ["receivable_closed"],
is_terminal: false,
is_problematic: false,
business_meaning: "Оплата есть, ожидается корректное закрытие."
business_meaning: "Оплата есть, ожидается корректное закрытие."
},
{
state_code: "receivable_closed",
state_label: "Дебиторка закрыта",
state_label: "Дебиторка закрыта",
state_class: "terminal",
entry_conditions: ["closing_document_linked"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "Дебиторская позиция закрыта корректно."
business_meaning: "Дебиторская позиция закрыта корректно."
},
{
state_code: "stale_receivable",
state_label: "Дебиторка зависла",
state_label: "Дебиторка зависла",
state_class: "problematic",
entry_conditions: ["unresolved_settlement"],
exit_conditions: ["receivable_closed"],
is_terminal: false,
is_problematic: true,
business_meaning: "Позиция остается незавершенной дольше ожидаемого."
business_meaning: "Позиция остается незавершенной дольше ожидаемого."
}
],
transitions: [
@ -178,7 +252,7 @@ const LIFECYCLE_DOMAIN_MODELS = {
required_evidence: ["payment_document_exists"],
optional_evidence: [],
forbidden_conditions: [],
business_meaning: "После реализации ожидается оплата/зачет."
business_meaning: "После реализации ожидается оплата/зачет."
},
{
from_state: "payment_recorded",
@ -187,7 +261,7 @@ const LIFECYCLE_DOMAIN_MODELS = {
required_evidence: ["closing_document_linked"],
optional_evidence: ["register_movement_exists"],
forbidden_conditions: ["cross_branch_inconsistency"],
business_meaning: "Оплата должна завершаться корректным закрытием расчета."
business_meaning: "Оплата должна завершаться корректным закрытием расчета."
}
],
defects: []
@ -199,43 +273,43 @@ const LIFECYCLE_DOMAIN_MODELS = {
states: [
{
state_code: "recognized",
state_label: "РБП признан",
state_label: "РБП признан",
state_class: "initial",
entry_conditions: ["deferred_expense_created"],
exit_conditions: ["writeoff_started"],
is_terminal: false,
is_problematic: false,
business_meaning: "РБП поставлен на учет."
business_meaning: "РБП поставлен на учет."
},
{
state_code: "partially_written_off",
state_label: "Частичное списание",
state_label: "Частичное списание",
state_class: "active",
entry_conditions: ["partial_writeoff_exists"],
exit_conditions: ["fully_written_off"],
is_terminal: false,
is_problematic: false,
business_meaning: "Списание идет по графику."
business_meaning: "Списание идет по графику."
},
{
state_code: "fully_written_off",
state_label: "РБП полностью списан",
state_label: "РБП полностью списан",
state_class: "terminal",
entry_conditions: ["full_writeoff_exists"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "РБП завершил lifecycle."
business_meaning: "РБП завершил lifecycle."
},
{
state_code: "overdue_writeoff",
state_label: "Просроченное списание",
state_label: "Просроченное списание",
state_class: "problematic",
entry_conditions: ["period_boundary", "missing_link"],
exit_conditions: ["fully_written_off"],
is_terminal: false,
is_problematic: true,
business_meaning: "РБП живет дольше допустимого окна."
business_meaning: "РБП живет дольше допустимого окна."
}
],
transitions: [],
@ -248,53 +322,53 @@ const LIFECYCLE_DOMAIN_MODELS = {
states: [
{
state_code: "capitalized",
state_label: "Капвложения отражены",
state_label: "Капвложения отражены",
state_class: "initial",
entry_conditions: ["capitalization_document_exists"],
exit_conditions: ["accepted_for_accounting"],
is_terminal: false,
is_problematic: false,
business_meaning: "Объект зафиксирован как вложение."
business_meaning: "Объект зафиксирован как вложение."
},
{
state_code: "accepted_for_accounting",
state_label: "Принят к учету",
state_label: "Принят к учету",
state_class: "active",
entry_conditions: ["acceptance_document_exists"],
exit_conditions: ["depreciation_active"],
is_terminal: false,
is_problematic: false,
business_meaning: "Объект переведен в основной контур учета."
business_meaning: "Объект переведен в основной контур учета."
},
{
state_code: "depreciation_active",
state_label: "Амортизация активна",
state_label: "Амортизация активна",
state_class: "active",
entry_conditions: ["depreciation_register_movement"],
exit_conditions: ["disposed"],
is_terminal: false,
is_problematic: false,
business_meaning: "Жизненный цикл ОС идет штатно."
business_meaning: "Жизненный цикл ОС идет штатно."
},
{
state_code: "contradictory_asset_state",
state_label: "Противоречивый статус ОС",
state_label: "Противоречивый статус ОС",
state_class: "problematic",
entry_conditions: ["posting_mismatch_or_wrong_path"],
exit_conditions: ["depreciation_active"],
is_terminal: false,
is_problematic: true,
business_meaning: "Статус ОС формально есть, но смыслово противоречив."
business_meaning: "Статус ОС формально есть, но смыслово противоречив."
},
{
state_code: "disposed",
state_label: "Выбыл",
state_label: "Выбыл",
state_class: "terminal",
entry_conditions: ["disposal_document_exists"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "Жизненный цикл ОС завершен."
business_meaning: "Жизненный цикл ОС завершен."
}
],
transitions: [],
@ -307,43 +381,43 @@ const LIFECYCLE_DOMAIN_MODELS = {
states: [
{
state_code: "vat_registered",
state_label: "НДС отражен документно",
state_label: "НДС отражен документно",
state_class: "initial",
entry_conditions: ["invoice_registered"],
exit_conditions: ["vat_reflected"],
is_terminal: false,
is_problematic: false,
business_meaning: "Сформирован первичный документный слой НДС."
business_meaning: "Сформирован первичный документный слой НДС."
},
{
state_code: "vat_reflected",
state_label: "НДС отражен в учете",
state_label: "НДС отражен в учете",
state_class: "active",
entry_conditions: ["vat_register_movement"],
exit_conditions: ["vat_deducted"],
is_terminal: false,
is_problematic: false,
business_meaning: "НДС проходит штатную стадию отражения."
business_meaning: "НДС проходит штатную стадию отражения."
},
{
state_code: "vat_deducted",
state_label: "НДС принят к вычету",
state_label: "НДС принят к вычету",
state_class: "terminal",
entry_conditions: ["deduction_confirmed"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "НДС-цепочка завершена корректно."
business_meaning: "НДС-цепочка завершена корректно."
},
{
state_code: "vat_conflict",
state_label: "Конфликт НДС-цепочки",
state_label: "Конфликт НДС-цепочки",
state_class: "problematic",
entry_conditions: ["cross_branch_inconsistency"],
exit_conditions: ["vat_reflected"],
is_terminal: false,
is_problematic: true,
business_meaning: "Бухгалтерская и налоговая ветки расходятся."
business_meaning: "Бухгалтерская и налоговая ветки расходятся."
}
],
transitions: [],
@ -356,53 +430,53 @@ const LIFECYCLE_DOMAIN_MODELS = {
states: [
{
state_code: "preclose_checks",
state_label: "Предзакрытие",
state_label: "Предзакрытие",
state_class: "active",
entry_conditions: ["period_scope_detected"],
exit_conditions: ["close_ready"],
is_terminal: false,
is_problematic: false,
business_meaning: "Идет проверка готовности периода."
business_meaning: "Идет проверка готовности периода."
},
{
state_code: "close_ready",
state_label: "Готов к закрытию",
state_label: "Готов к закрытию",
state_class: "active",
entry_conditions: ["no_blockers_detected"],
exit_conditions: ["close_completed"],
is_terminal: false,
is_problematic: false,
business_meaning: "Период может быть закрыт."
business_meaning: "Период может быть закрыт."
},
{
state_code: "close_completed",
state_label: "Закрытие завершено",
state_label: "Закрытие завершено",
state_class: "terminal",
entry_conditions: ["close_operation_done"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "Период закрыт."
business_meaning: "Период закрыт."
},
{
state_code: "close_blocked",
state_label: "Закрытие заблокировано",
state_label: "Закрытие заблокировано",
state_class: "problematic",
entry_conditions: ["period_close_risk_or_stale_state"],
exit_conditions: ["close_ready"],
is_terminal: false,
is_problematic: true,
business_meaning: "Есть lifecycle-дефекты, влияющие на закрытие."
business_meaning: "Есть lifecycle-дефекты, влияющие на закрытие."
},
{
state_code: "close_contradicted",
state_label: "Закрыт формально, но с противоречием",
state_label: "Закрыт формально, но с противоречием",
state_class: "problematic",
entry_conditions: ["misclosed_or_cross_branch_conflict"],
exit_conditions: ["close_completed"],
is_terminal: false,
is_problematic: true,
business_meaning: "Формальное закрытие не согласовано с фактическими ветками."
business_meaning: "Формальное закрытие не согласовано с фактическими ветками."
}
],
transitions: [],
@ -414,7 +488,7 @@ const SHARED_DEFECTS = [
defect_code: "missing_expected_transition",
defect_class: "path",
severity_hint: "medium",
business_meaning: "Ожидаемый переход не произошел.",
business_meaning: "Ожидаемый переход не произошел.",
evidence_requirements: ["expected_state", "missing_transition_signal"],
period_impact_potential: "indirect"
},
@ -422,7 +496,7 @@ const SHARED_DEFECTS = [
defect_code: "invalid_transition",
defect_class: "path",
severity_hint: "high",
business_meaning: "Переход произошел по некорректному пути.",
business_meaning: "Переход произошел по некорректному пути.",
evidence_requirements: ["invalid_transition_signal"],
period_impact_potential: "indirect"
},
@ -430,7 +504,7 @@ const SHARED_DEFECTS = [
defect_code: "stale_active_state",
defect_class: "timing",
severity_hint: "high",
business_meaning: "Объект завис в активном состоянии.",
business_meaning: "Объект завис в активном состоянии.",
evidence_requirements: ["stale_marker", "missing_transition_signal"],
period_impact_potential: "direct"
},
@ -438,7 +512,7 @@ const SHARED_DEFECTS = [
defect_code: "contradictory_state",
defect_class: "consistency",
severity_hint: "high",
business_meaning: "Статусы объекта противоречат друг другу.",
business_meaning: "Статусы объекта противоречат друг другу.",
evidence_requirements: ["contradiction_signal"],
period_impact_potential: "direct"
},
@ -446,7 +520,7 @@ const SHARED_DEFECTS = [
defect_code: "premature_terminal_state",
defect_class: "closure",
severity_hint: "medium",
business_meaning: "Терминальное состояние наступило преждевременно.",
business_meaning: "Терминальное состояние наступило преждевременно.",
evidence_requirements: ["terminal_state", "missing_required_previous_state"],
period_impact_potential: "indirect"
},
@ -454,7 +528,7 @@ const SHARED_DEFECTS = [
defect_code: "misclosed_state",
defect_class: "closure",
severity_hint: "high",
business_meaning: "Контур формально закрыт, но закрыт неверно.",
business_meaning: "Контур формально закрыт, но закрыт неверно.",
evidence_requirements: ["wrong_closure_path"],
period_impact_potential: "direct"
},
@ -462,7 +536,7 @@ const SHARED_DEFECTS = [
defect_code: "orphan_intermediate_state",
defect_class: "path",
severity_hint: "medium",
business_meaning: "Промежуточная стадия осталась без корректного продолжения.",
business_meaning: "Промежуточная стадия осталась без корректного продолжения.",
evidence_requirements: ["intermediate_state_without_next"],
period_impact_potential: "indirect"
},
@ -470,7 +544,7 @@ const SHARED_DEFECTS = [
defect_code: "cross_branch_state_conflict",
defect_class: "consistency",
severity_hint: "high",
business_meaning: "Состояния соседних веток учета противоречат друг другу.",
business_meaning: "Состояния соседних веток учета противоречат друг другу.",
evidence_requirements: ["cross_branch_conflict_signal"],
period_impact_potential: "direct"
}
@ -489,6 +563,19 @@ class LifecycleRegistryImpl {
getDomain(domain) {
return this.models[domain];
}
hasState(domain, stateCode) {
const model = this.getDomain(domain);
return Boolean(resolveStateCode(model, stateCode));
}
resolveDefaultExpectedState(domain) {
return defaultExpectedState(this.getDomain(domain));
}
resolveInitialState(domain) {
return defaultInitialState(this.getDomain(domain));
}
findExpectedPath(domain, fromState, toState) {
return shortestExpectedPath(this.getDomain(domain), fromState, toState);
}
}
exports.LifecycleRegistry = new LifecycleRegistryImpl(LIFECYCLE_DOMAIN_MODELS);
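The registry methods delegate to the model-driven helpers, so the default expected state is now derived from the state list rather than from a hard-coded table. The sketch below replays that fallback (terminal, then active, then initial) on the `fixed_asset` states from this diff, trimmed to the fields the helpers read. One effect worth reviewing: for these states the first terminal entry is `disposed`, whereas the removed `defaultExpectedState(domain)` returned `depreciation_active` for `fixed_asset`.

```javascript
function normalizeStateToken(value) {
    return value.trim().toLowerCase();
}

// Case- and whitespace-insensitive lookup; returns the canonical code or null.
function resolveStateCode(model, stateCode) {
    if (!stateCode || typeof stateCode !== "string") return null;
    const normalized = normalizeStateToken(stateCode);
    const matched = model.states.find((s) => normalizeStateToken(s.state_code) === normalized);
    return matched?.state_code ?? null;
}

// Fallback order: first terminal, then first active, then initial, then first state.
function defaultExpectedState(model) {
    const terminal = model.states.find((s) => s.is_terminal || s.state_class === "terminal");
    if (terminal) return terminal.state_code;
    const active = model.states.find((s) => s.state_class === "active");
    if (active) return active.state_code;
    const initial = model.states.find((s) => s.state_class === "initial");
    return initial?.state_code ?? model.states[0]?.state_code ?? "unknown_state";
}

// fixed_asset states from the diff, reduced to the fields used here.
const fixedAssetModel = {
    states: [
        { state_code: "capitalized", state_class: "initial", is_terminal: false },
        { state_code: "accepted_for_accounting", state_class: "active", is_terminal: false },
        { state_code: "depreciation_active", state_class: "active", is_terminal: false },
        { state_code: "contradictory_asset_state", state_class: "problematic", is_terminal: false },
        { state_code: "disposed", state_class: "terminal", is_terminal: true }
    ]
};

console.log(defaultExpectedState(fixedAssetModel));
console.log(resolveStateCode(fixedAssetModel, "  Disposed "));
```

If `depreciation_active` should remain the default target for fixed assets, the model would need an explicit hint for it; the diff as shown does not include one.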
function inferLifecycleDomain(input) {
@ -508,28 +595,81 @@ function inferLifecycleDomain(input) {
]
.join(" ")
.toLowerCase();
if (includesAny(unitTokens, [/\bnds\b/, /\bvat\b/, /\btax\b/, /cross[_\s-]?branch/, /\b19\b/, /\b68\b/])) {
return "vat_flow";
}
if (includesAny(unitTokens, [/\bperiod\b/, /\bclose\b/, /закрыт/, /reporting/]) || input.unit.problem_unit_type === "period_risk_cluster") {
return "period_close";
}
if (includesAny(unitTokens, [/deferred/, /writeoff/, /рбп/, /\b97\b/])) {
const hasVatMarkers = includesAny(unitTokens, [
/domain_hint:vat_flow/,
/\binvoice_to_vat\b/,
/\bvat_chain_conflict\b/,
/(^|[^a-z0-9])nds([^a-z0-9]|$)/,
/(^|[^a-z0-9])vat([^a-z0-9]|$)/,
/(^|[^a-z0-9])tax(?:es)?([^a-z0-9]|$)/,
/\baccount[_:\s-]?(19|68)\b/
]);
const hasDeferredMarkers = includesAny(unitTokens, [
/domain_hint:deferred_expense/,
/\bdeferred(?:_expense)?\b/,
/\bdeferred_expense_to_writeoff\b/,
/\bwriteoff\b/,
/\bpartially_written_off\b/,
/\bfully_written_off\b/,
/\baccount[_:\s-]?97\b/
]);
const hasFixedAssetMarkers = includesAny(unitTokens, [
/domain_hint:fixed_asset/,
/\bfixed[_\s-]?asset(?:s)?\b/,
/\basset_card_to_depreciation\b/,
/\bdepreciation(?:_active)?\b/,
/\baccepted_for_accounting\b/,
/\bcapitalized\b/,
/\baccount[_:\s-]?(01|02|08)\b/
]);
const hasPeriodCloseMarkers = includesAny(unitTokens, [
/domain_hint:period_close/,
/\bperiod[_\s-]?close\b/,
/\bperiod_close_risk\b/,
/\bclose[_\s-]?risk\b/,
/\bclosure[_\s-]?risk\b/,
/\bpreclose\b/,
/\bmonth[_\s-]?close\b/,
/\bperiod_risk\b/
]);
if (hasDeferredMarkers) {
return "deferred_expense";
}
if (includesAny(unitTokens, [/fixed[_\s-]?asset/, /амортиз/, /ос\b/, /\b01\b/, /\b02\b/, /\b08\b/])) {
if (hasFixedAssetMarkers) {
return "fixed_asset";
}
if (includesAny(unitTokens, [/buyer/, /customer/, /дебитор/, /\b62\b/])) {
if (hasVatMarkers) {
return "vat_flow";
}
if (hasPeriodCloseMarkers ||
input.unit.problem_unit_type === "period_risk_cluster" ||
input.unit.period_impact?.impact_class === "close_risk") {
return "period_close";
}
if (includesAny(unitTokens, [/buyer/, /customer/, /\b62\b/])) {
return "customer_settlement";
}
if (includesAny(unitTokens, [
/domain_hint:bank_settlement/,
/\bpayment_to_settlement\b/,
/\bstatement_to_document\b/,
/\bbank_recorded\b/,
/\binitiated_payment\b/,
/\bsettlement(?:_closed)?\b/
]) ||
input.unit.problem_unit_type === "unresolved_settlement_cluster" ||
input.unit.problem_unit_type === "broken_chain_segment") {
return "bank_settlement";
}
if (input.unit.problem_unit_type === "cross_branch_inconsistency_cluster") {
return "vat_flow";
}
if (input.unit.problem_unit_type === "lifecycle_anomaly_node") {
return "deferred_expense";
}
return "bank_settlement";
}
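The rewritten `inferLifecycleDomain` replaces bare patterns such as `/\bnds\b/` and `/\b19\b/` with stricter ones and fixes the order in which domains are tested. The sketch below shows why this matters. In JavaScript `_` counts as a word character, so `\b` does not fire inside snake_case tokens, and a bare `\b19\b` matches any standalone "19", including the day in a date. The `DOMAIN_MARKERS` table and `classifyDomain` are hypothetical names, and the table holds only a subset of the patterns from the diff.

```javascript
// Old vs. new VAT token patterns from the diff.
const oldNds = /\bnds\b/;
const newNds = /(^|[^a-z0-9])nds([^a-z0-9]|$)/;
console.log(oldNds.test("register_nds_2024"), newNds.test("register_nds_2024"));

// Old vs. new account-number patterns.
const oldAccount19 = /\b19\b/;
const newAccount = /\baccount[_:\s-]?(19|68)\b/;
console.log(oldAccount19.test("period 2024-03-19"), newAccount.test("period 2024-03-19"));

// Ordered markers: the first matching domain wins (same priority as the diff).
const DOMAIN_MARKERS = [
    ["deferred_expense", [/\bwriteoff\b/, /\baccount[_:\s-]?97\b/]],
    ["fixed_asset", [/\bfixed[_\s-]?asset(?:s)?\b/, /\baccount[_:\s-]?(01|02|08)\b/]],
    ["vat_flow", [newNds, /(^|[^a-z0-9])vat([^a-z0-9]|$)/, newAccount]],
    ["period_close", [/\bperiod[_\s-]?close\b/]]
];

function classifyDomain(text) {
    const lowered = text.toLowerCase();
    for (const [domain, patterns] of DOMAIN_MARKERS) {
        if (patterns.some((pattern) => pattern.test(lowered))) return domain;
    }
    return "bank_settlement"; // same final fallback as in the diff
}

console.log(classifyDomain("writeoff for account_19"));
console.log(classifyDomain("brands report 2024-03-19"));
```

The same word-boundary behavior explains why the diff lists whole tokens such as `deferred_expense_to_writeoff` separately: `/\bwriteoff\b/` alone does not match when the word is preceded by an underscore.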
function inferCurrentState(domain, input) {
const explicitActual = input.unit.actual_state?.trim();
if (explicitActual) {
return explicitActual;
}
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).map((item) => item.toLowerCase());
const relations = input.candidates.flatMap((item) => item.relation_pattern_hits).map((item) => item.toLowerCase());
const hasStale = hasToken(anomalies, /(no_continuation|stale|tail|missing_link|broken_lifecycle|partially_linked)/);
@ -562,7 +702,7 @@ function inferCurrentState(domain, input) {
return "contradictory_asset_state";
if (hasToken(relations, /depreciation|amort/))
return "depreciation_active";
if (hasToken(relations, /accept|учет/))
if (hasToken(relations, /accept|account/))
return "accepted_for_accounting";
return "capitalized";
}
@@ -579,25 +719,42 @@ function inferCurrentState(domain, input) {
return "close_blocked";
return "preclose_checks";
}
function inferExpectedState(domain, input) {
function inferExpectedState(domain, input, model) {
const explicitExpected = input.unit.expected_state?.trim();
if (explicitExpected) {
return explicitExpected;
}
return defaultExpectedState(domain);
return defaultExpectedState(model);
}
function inferMissingTransition(input) {
function inferMissingTransition(input, model, currentState, expectedState) {
if (typeof input.unit.failed_expected_edge === "string" && input.unit.failed_expected_edge.trim().length > 0) {
return input.unit.failed_expected_edge.trim();
}
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).join(" ").toLowerCase();
if (/(missing_link|no_continuation|broken_lifecycle|tail|unresolved)/.test(anomalies)) {
return "expected_transition_not_observed";
}
if (!/(missing_link|no_continuation|broken_lifecycle|tail|unresolved)/.test(anomalies)) {
return null;
}
function inferInvalidTransition(input) {
if (currentState !== expectedState) {
const path = shortestExpectedPath(model, currentState, expectedState);
if (path && path.length >= 2) {
return transitionEdgeLabel(path[0], path[1]);
}
}
const directExpected = model.transitions.find((transition) => transition.transition_type === "expected" && transition.from_state === currentState);
if (directExpected) {
return transitionEdgeLabel(directExpected.from_state, directExpected.to_state);
}
return "expected_transition_not_observed";
}
function inferInvalidTransition(input, model) {
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).join(" ").toLowerCase();
for (const transition of model.transitions) {
for (const forbiddenCondition of transition.forbidden_conditions) {
if (anomalies.includes(forbiddenCondition.toLowerCase())) {
return `${transitionEdgeLabel(transition.from_state, transition.to_state)}:forbidden:${forbiddenCondition}`;
}
}
}
if (/(cross_branch|cross_domain_inconsistency)/.test(anomalies)) {
return "cross_branch_conflict_transition";
}
@@ -634,6 +791,13 @@ function classifyLifecycleDefect(input) {
}
return null;
}
function registryBackedDefect(domain, defect) {
if (!defect) {
return null;
}
const model = exports.LifecycleRegistry.getDomain(domain);
return model.defects.some((definition) => definition.defect_code === defect) ? defect : null;
}
function resolutionConfidence(unitConfidence, input) {
let score = unitConfidence.score;
if (input.hasExplicitStates)
@@ -661,31 +825,40 @@ function staleDurationHint(domain, defect, input) {
return "unknown_snapshot_window";
}
function lifecycleInterpretation(input) {
const base = `Текущая стадия: ${input.currentState}; ожидаемая стадия: ${input.expectedState}.`;
if (input.defect === "stale_active_state") {
return `${base} Объект завис во времени и не дошел до ожидаемого перехода.`;
}
if (input.defect === "misclosed_state") {
return `${base} Контур закрыт формально, но путь закрытия противоречит бухгалтерской логике.`;
}
if (input.defect === "cross_branch_state_conflict") {
return `${base} Между ветками домена ${input.domain} обнаружено противоречие состояний.`;
}
if (input.defect === "missing_expected_transition") {
return `${base} Не зафиксирован ожидаемый переход (${input.missingTransition ?? "unknown_transition"}).`;
}
if (input.defect === "invalid_transition") {
return `${base} Зафиксирован некорректный переход (${input.invalidTransition ?? "invalid_transition"}).`;
}
return `${base} Lifecycle-разрешение не выявило критичный дефект, но состояние требует наблюдения.`;
}
function resolveLifecycle(input) {
const lifecycle_domain = inferLifecycleDomain(input);
const currentState = inferCurrentState(lifecycle_domain, input);
const expectedState = inferExpectedState(lifecycle_domain, input);
const missingTransition = inferMissingTransition(input);
const invalidTransition = inferInvalidTransition(input);
const defect = classifyLifecycleDefect({
const model = exports.LifecycleRegistry.getDomain(lifecycle_domain);
const inferredCurrentState = inferCurrentState(lifecycle_domain, input);
const inferredExpectedState = inferExpectedState(lifecycle_domain, input, model);
const explicitActualState = input.unit.actual_state?.trim() ?? null;
const explicitExpectedState = input.unit.expected_state?.trim() ?? null;
const explicitCurrentState = resolveStateCode(model, explicitActualState);
const explicitExpectedResolved = resolveStateCode(model, explicitExpectedState);
const inferredCurrentResolved = resolveStateCode(model, inferredCurrentState);
const inferredExpectedResolved = resolveStateCode(model, inferredExpectedState);
const currentState = explicitCurrentState ?? inferredCurrentResolved ?? defaultInitialState(model);
const expectedState = explicitExpectedResolved ?? inferredExpectedResolved ?? defaultExpectedState(model);
const missingTransition = inferMissingTransition(input, model, currentState, expectedState);
const invalidTransition = inferInvalidTransition(input, model);
const detectedDefect = classifyLifecycleDefect({
domain: lifecycle_domain,
currentState,
expectedState,
@@ -693,15 +866,19 @@ function resolveLifecycle(input) {
invalidTransition,
periodCloseSensitive: input.unit.period_impact?.impact_class === "close_risk"
});
const defect = registryBackedDefect(lifecycle_domain, detectedDefect);
const evidenceIds = uniqueStrings(input.unit.evidence_pack, 8);
const previousStates = resolvePreviousStates(model, currentState);
const limitations = uniqueStrings([
...input.unit.snapshot_limitations,
...(input.candidates.some((item) => item.confidence_hint === "low") ? ["low_confidence_candidates_present"] : []),
...(input.unit.actual_state ? [] : ["actual_state_inferred"]),
...(input.unit.expected_state ? [] : ["expected_state_inferred"])
...(explicitActualState && !explicitCurrentState ? ["actual_state_not_in_registry_normalized"] : []),
...(explicitExpectedState && !explicitExpectedResolved ? ["expected_state_not_in_registry_normalized"] : []),
...(explicitCurrentState ? [] : ["actual_state_inferred"]),
...(explicitExpectedResolved ? [] : ["expected_state_inferred"])
], 8);
const confidence = resolutionConfidence(input.unit.confidence, {
hasExplicitStates: Boolean(input.unit.actual_state || input.unit.expected_state),
hasExplicitStates: Boolean(explicitCurrentState || explicitExpectedResolved),
hasDefectSignal: Boolean(defect || missingTransition || invalidTransition),
candidateCount: input.candidates.length,
hasSnapshotLimitations: limitations.length > 0
@@ -711,7 +888,7 @@ function resolveLifecycle(input) {
lifecycle_domain,
resolved_current_state: currentState,
resolved_expected_state: expectedState,
resolved_previous_states: [],
resolved_previous_states: previousStates,
missing_transitions: missingTransition ? [missingTransition] : [],
invalid_transitions: invalidTransition ? [invalidTransition] : [],
detected_defects: defect ? [defect] : [],
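
The hunks above call `shortestExpectedPath(model, currentState, expectedState)` and `transitionEdgeLabel(from, to)`, but neither body is visible in this diff. Below is a minimal sketch of what such helpers might look like, assuming a breadth-first search over transitions whose `transition_type` is `"expected"`. The function bodies, the `->` label format, the `"forbidden"` transition type, and the sample model are assumptions for illustration, not the committed implementation; only the field names `transitions`, `transition_type`, `from_state`, and `to_state` come from the diff.

```javascript
// Hypothetical sketch, not the committed helpers.
function transitionEdgeLabelSketch(fromState, toState) {
    // Assumed label format; the real one is not shown in the diff.
    return `${fromState}->${toState}`;
}

function shortestExpectedPathSketch(model, fromState, toState) {
    if (fromState === toState) {
        return [fromState];
    }
    // Adjacency list built from "expected" transitions only.
    const next = new Map();
    for (const transition of model.transitions) {
        if (transition.transition_type !== "expected") {
            continue;
        }
        if (!next.has(transition.from_state)) {
            next.set(transition.from_state, []);
        }
        next.get(transition.from_state).push(transition.to_state);
    }
    // Breadth-first search; `parent` doubles as the visited set.
    const parent = new Map([[fromState, null]]);
    const queue = [fromState];
    while (queue.length > 0) {
        const state = queue.shift();
        for (const candidate of next.get(state) ?? []) {
            if (parent.has(candidate)) {
                continue;
            }
            parent.set(candidate, state);
            if (candidate === toState) {
                // Walk the parent links back to the start state.
                const path = [];
                for (let s = toState; s !== null; s = parent.get(s)) {
                    path.unshift(s);
                }
                return path;
            }
            queue.push(candidate);
        }
    }
    return null; // target is not reachable via expected transitions
}

// Sample model with assumed state ordering (state names taken from the diff).
const sampleModel = {
    transitions: [
        { transition_type: "expected", from_state: "capitalized", to_state: "accepted_for_accounting" },
        { transition_type: "expected", from_state: "accepted_for_accounting", to_state: "depreciation_active" },
        { transition_type: "forbidden", from_state: "capitalized", to_state: "depreciation_active" }
    ]
};
const samplePath = shortestExpectedPathSketch(sampleModel, "capitalized", "depreciation_active");
console.log(samplePath, transitionEdgeLabelSketch(samplePath[0], samplePath[1]));
```

With a helper of this shape, `inferMissingTransition` reports the first edge of the path, that is, the next step the object was expected to take from its current state.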

@@ -76,7 +76,7 @@ function intersectsAnySpan(start, end, spans) {
function extractAccounts(text) {
const lower = String(text ?? "").toLowerCase();
const explicitAccounts = new Set();
const contextualPattern = /(?:\bсчет(?:а|у|ом|ов)?\b|\bсч\.?\b|\baccount(?:s)?\b|\bschet(?:a|u|om|ov)?\b)\s*(?:№|#|:)?\s*(\d{2}(?:\.\d{2})?)/giu;
const contextualPattern = /(?:\bсч(?:е|ё)т(?:а|у|ом|ов)?\b|\bсч\.?\b|\baccount(?:s)?\b|\bschet(?:a|u|om|ov)?\b)\s*(?:№|#|:)?\s*(\d{2}(?:\.\d{2})?)/giu;
let contextual = null;
while ((contextual = contextualPattern.exec(lower)) !== null) {
if (contextual[1]) {
@@ -284,8 +284,9 @@ function buildFragmentV2(rawText, index) {
if (noiseOnly) {
return null;
}
const inScopeTokens = /(проводк|документ|реализац|поступлен|взаиморасчет|сальдо|остатк|счет|ндс|амортиз|расходы будущих периодов|рбп|ос|контрагент|оплат|банк|выписк|склад|товар|материал)/i.test(lower);
const translitInScopeTokens = /\b(?:schet|scheta|schetu|schetom|postavsh|kontragent|dokument|doc|oplata|oplati|platezh|vypisk|provodk|realiz|postuplen|nds|os|saldo|hvost|tail|anomali|risk|zakryt)\b/i.test(lower);
const inScopeTokens = /(проводк|документ|реализац|поступлен|взаиморасчет|сальдо|остатк|сч(?:е|ё)т|ндс|амортиз|расходы будущих периодов|рбп|ос|контрагент|оплат|банк|выписк|склад|товар|материал|списани|жизненн|цикл|переход|lifecycle|writeoff|deferred)/i.test(lower);
const translitInScopeTokens = /\b(?:schet|scheta|schetu|schetom|postavsh|kontragent|dokument|doc|oplata|oplati|platezh|vypisk|provodk|realiz|postuplen|nds|os|saldo|hvost|tail|anomali|risk|zakryt|lifecycle|state|transition|writeoff|deferred|periodclose)\b/i.test(lower);
const lifecycleInScopeTokens = /(lifecycle|жизненн(?:ого|ый)?\s+цикл|стади|переход|списани|writeoff|deferred|period\s*close)/i.test(lower);
const genericAccountingTokens = /(фсбу|налогов(ый|ого)|нк рф|закон|форма отчетности|как правильно в бухгалтерии)/i.test(lower);
const offTopicTokens = /(погода|анекдот|музык|фильм|игр[аы]|рецепт|курс валют в мире)/i.test(lower);
let domainRelevance = "unclear";
@@ -298,13 +299,13 @@ function buildFragmentV2(rawText, index) {
domainRelevance = "out_of_scope";
businessScope = "generic_accounting";
}
else if (inScopeTokens || translitInScopeTokens) {
else if (inScopeTokens || translitInScopeTokens || lifecycleInScopeTokens) {
domainRelevance = "in_scope";
businessScope = "company_specific_accounting";
}
const entityTokenCount = (lower.match(/(документ|оплат|проводк|контрагент|договор|реализац|поступлен|выписк|закрыт|взаиморасчет|склад|товар|материал)/g) ?? [])
const entityTokenCount = (lower.match(/(документ|оплат|проводк|контрагент|договор|реализац|поступлен|выписк|закрыт|взаиморасчет|склад|товар|материал|поставщ|покупат|списани|жизненн|цикл)/g) ?? [])
.length;
const translitEntityTokenCount = (lower.match(/\b(?:dokument|oplata|platezh|provodk|kontragent|realiz|postuplen|vypisk|zakryt|schet|sklad|tovar|material)\b/g) ?? []).length;
const translitEntityTokenCount = (lower.match(/\b(?:dokument|oplata|platezh|provodk|kontragent|postavsh|pokupat|realiz|postuplen|vypisk|zakryt|schet|sklad|tovar|material)\b/g) ?? []).length;
const entityTokenCountTotal = entityTokenCount + translitEntityTokenCount;
const flags = {
has_multi_entity_scope: entityTokenCountTotal >= 2,
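
One caveat for the Cyrillic alternatives in `contextualPattern`: in JavaScript, `\b` is defined over ASCII word characters (`[A-Za-z0-9_]`) even under the `u` flag, so it does not mark the edge of a Cyrillic word. The sketch below puts a reduced copy of the Cyrillic branch from the diff next to a variant that uses Unicode-aware lookarounds. The variant, the helper `firstAccount`, and both constant names are assumptions for illustration, not part of the commit.

```javascript
// Reduced copy of the Cyrillic branch of contextualPattern from the diff.
const asciiBoundaryPattern = /\bсч(?:е|ё)т(?:а|у|ом|ов)?\b\s*(?:№|#|:)?\s*(\d{2}(?:\.\d{2})?)/iu;

// Hypothetical variant: lookarounds over Unicode letters and digits replace \b.
const unicodeBoundaryPattern = /(?<![\p{L}\p{N}_])сч(?:е|ё)т(?:а|у|ом|ов)?(?![\p{L}\p{N}_])\s*(?:№|#|:)?\s*(\d{2}(?:\.\d{2})?)/iu;

// Hypothetical helper: returns the first captured account number or null.
function firstAccount(pattern, text) {
    const match = pattern.exec(String(text ?? "").toLowerCase());
    return match ? match[1] : null;
}

console.log(firstAccount(asciiBoundaryPattern, "по счёту 62.01"));   // null
console.log(firstAccount(unicodeBoundaryPattern, "по счёту 62.01")); // "62.01"
console.log(firstAccount(unicodeBoundaryPattern, "расчет 62"));      // null
```

The Latin alternatives (`account`, `schet`) are unaffected, since `\b` behaves as expected around ASCII letters.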

@@ -202,12 +202,13 @@ function simulateDeterministicRouting(normalized) {
const decisions = normalized.fragments.map((fragment) => decideRouteForFragment(fragment));
const inScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope").length;
const outOfScopeCount = decisions.filter((item) => item.domain_relevance === "out_of_scope").length;
const unclearCount = decisions.filter((item) => item.domain_relevance === "unclear").length;
const routedInScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope" && item.route !== "no_route").length;
const clarificationInScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope" && item.execution_readiness === "needs_clarification").length;
const noRouteInScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope" && item.route === "no_route").length;
let fallbackType = "none";
if (!normalized.message_in_scope || inScopeCount === 0) {
fallbackType = "out_of_scope";
fallbackType = outOfScopeCount > 0 && unclearCount === 0 ? "out_of_scope" : "clarification";
}
else if (routedInScopeCount === 0 && clarificationInScopeCount > 0) {
fallbackType = "clarification";
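
The changed branch can be read in isolation: when no fragment is in scope, the message is treated as out of scope only if at least one fragment is explicitly `out_of_scope` and none is `unclear`; otherwise the assistant asks for clarification. A minimal sketch of that rule as a pure function follows. The function name `pickEmptyScopeFallback` is hypothetical and does not appear in the commit.

```javascript
// Hypothetical extraction of the rule used when inScopeCount === 0.
function pickEmptyScopeFallback(decisions) {
    const outOfScopeCount = decisions.filter((item) => item.domain_relevance === "out_of_scope").length;
    const unclearCount = decisions.filter((item) => item.domain_relevance === "unclear").length;
    // Same expression as in the hunk above.
    return outOfScopeCount > 0 && unclearCount === 0 ? "out_of_scope" : "clarification";
}

console.log(pickEmptyScopeFallback([{ domain_relevance: "out_of_scope" }]));
console.log(pickEmptyScopeFallback([{ domain_relevance: "out_of_scope" }, { domain_relevance: "unclear" }]));
console.log(pickEmptyScopeFallback([]));
```

Note that an empty fragment list also falls through to `"clarification"` under this rule.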

File diff suppressed because one or more lines are too long

@@ -0,0 +1,270 @@
#!/usr/bin/env node
const fs = require("node:fs");
const path = require("node:path");
const request = require("supertest");
const STAGE3_SUITE_RELATIVE = path.join("eval_cases", "assistant_stage3_lifecycle_probe_v0_1.json");
const FLAG_KEYS = [
"FEATURE_ASSISTANT_PROBLEM_UNITS_V1",
"FEATURE_ASSISTANT_ANSWER_POLICY_V11",
"FEATURE_ASSISTANT_BROAD_GUARD_V1",
"FEATURE_ASSISTANT_MIN_EVIDENCE_GATE_V1",
"FEATURE_ASSISTANT_ANTI_GENERIC_RANKING_GUARD_V1",
"FEATURE_ASSISTANT_PROBLEM_CENTRIC_ANSWER_V1",
"FEATURE_ASSISTANT_PROBLEM_UNIT_CONTINUITY_V1",
"FEATURE_ASSISTANT_LIFECYCLE_RUNTIME_V1",
"FEATURE_ASSISTANT_LIFECYCLE_ANSWER_V1"
];
function parseArgs(argv) {
const args = {
runDir: "",
suitePath: "",
outputSubdir: path.join("prompt_dialogs", "stage3_lifecycle_probe")
};
for (let i = 0; i < argv.length; i += 1) {
const token = argv[i];
if (token === "--run-dir") {
args.runDir = String(argv[i + 1] ?? "");
i += 1;
continue;
}
if (token === "--suite-path") {
args.suitePath = String(argv[i + 1] ?? "");
i += 1;
continue;
}
if (token === "--output-subdir") {
args.outputSubdir = String(argv[i + 1] ?? "");
i += 1;
}
}
return args;
}
function ensureDir(dirPath) {
fs.mkdirSync(dirPath, { recursive: true });
}
function writeUtf8Bom(filePath, content) {
fs.writeFileSync(filePath, `\uFEFF${content}`, "utf8");
}
function toSafeFileToken(value) {
return String(value)
.trim()
.replace(/\s+/g, "_")
.replace(/[^a-zA-Z0-9_-]/g, "_")
.replace(/_+/g, "_");
}
function readJson(filePath) {
const raw = fs.readFileSync(filePath, "utf8").replace(/^\uFEFF/, "");
return JSON.parse(raw);
}
function findLatestRunDir(runsRoot) {
if (!fs.existsSync(runsRoot)) {
throw new Error(`Runs folder not found: ${runsRoot}`);
}
const dirs = fs
.readdirSync(runsRoot, { withFileTypes: true })
.filter((entry) => entry.isDirectory())
.map((entry) => path.join(runsRoot, entry.name))
.sort((a, b) => fs.statSync(b).mtimeMs - fs.statSync(a).mtimeMs);
if (dirs.length === 0) {
throw new Error(`No run directories found under: ${runsRoot}`);
}
return dirs[0];
}
function resolveRunDir(args, runsRoot) {
if (args.runDir) {
return path.resolve(args.runDir);
}
return findLatestRunDir(runsRoot);
}
function setLifecycleFlags() {
const original = {};
for (const key of FLAG_KEYS) {
original[key] = process.env[key];
}
process.env.FEATURE_ASSISTANT_PROBLEM_UNITS_V1 = "1";
process.env.FEATURE_ASSISTANT_ANSWER_POLICY_V11 = "1";
process.env.FEATURE_ASSISTANT_BROAD_GUARD_V1 = "1";
process.env.FEATURE_ASSISTANT_MIN_EVIDENCE_GATE_V1 = "1";
process.env.FEATURE_ASSISTANT_ANTI_GENERIC_RANKING_GUARD_V1 = "1";
process.env.FEATURE_ASSISTANT_PROBLEM_CENTRIC_ANSWER_V1 = "1";
process.env.FEATURE_ASSISTANT_PROBLEM_UNIT_CONTINUITY_V1 = "0";
process.env.FEATURE_ASSISTANT_LIFECYCLE_RUNTIME_V1 = "1";
process.env.FEATURE_ASSISTANT_LIFECYCLE_ANSWER_V1 = "1";
return original;
}
function restoreFlags(original) {
for (const key of FLAG_KEYS) {
const value = original[key];
if (value === undefined) {
delete process.env[key];
} else {
process.env[key] = value;
}
}
}
function summarizeDebug(debug) {
const routeSummary = Array.isArray(debug?.route_summary) ? debug.route_summary : [];
const retrievalResults = Array.isArray(debug?.retrieval_results) ? debug.retrieval_results : [];
const routed = retrievalResults.filter((item) => String(item?.route ?? "") !== "no_route");
const problemUnits = routed.reduce((acc, item) => {
const list = Array.isArray(item?.problem_units) ? item.problem_units : [];
return acc + list.length;
}, 0);
return {
route_summary: routeSummary,
routed_retrieval_count: routed.length,
problem_units_count: problemUnits,
problem_answer_mode: typeof debug?.problem_answer_mode === "string" ? debug.problem_answer_mode : ""
};
}
function buildMarkdown(dialog) {
const lines = [];
lines.push(`# ${dialog.case_id}`);
lines.push("");
lines.push(`- session_id: ${dialog.session_id || "n/a"}`);
lines.push(`- reply_type: ${dialog.reply_type || "n/a"}`);
lines.push(`- trace_id: ${dialog.trace_id || "n/a"}`);
lines.push(`- status: ${dialog.http_status}`);
lines.push("");
lines.push("## User");
lines.push(dialog.user_message || "");
lines.push("");
lines.push("## Assistant");
lines.push(dialog.assistant_reply || "");
lines.push("");
lines.push("## Debug Summary");
lines.push("```json");
lines.push(JSON.stringify(dialog.debug_summary, null, 2));
lines.push("```");
lines.push("");
return lines.join("\n");
}
async function main() {
const args = parseArgs(process.argv.slice(2));
const backendRoot = path.resolve(__dirname, "..");
const repoRoot = path.resolve(backendRoot, "..");
const runsRoot = path.join(repoRoot, "docs", "runs");
const runDir = resolveRunDir(args, runsRoot);
const suitePath = args.suitePath ? path.resolve(args.suitePath) : path.join(repoRoot, STAGE3_SUITE_RELATIVE);
const suite = readJson(suitePath);
const dialogsDir = path.join(runDir, args.outputSubdir);
ensureDir(dialogsDir);
ensureDir(path.join(runDir, "prompt_dialogs"));
const originalFlags = setLifecycleFlags();
let app;
try {
const { createApp } = require(path.join(backendRoot, "dist", "server.js"));
app = createApp();
} finally {
restoreFlags(originalFlags);
}
const indexRows = [];
const generatedAt = new Date().toISOString();
for (let i = 0; i < suite.cases.length; i += 1) {
const probeCase = suite.cases[i];
const caseId = String(probeCase.case_id || `case_${i + 1}`);
const userMessage = String(probeCase?.turns?.[0]?.user_message || "");
const response = await request(app).post("/api/assistant/message").send({
useMock: true,
promptVersion: "normalizer_v2_0_2",
user_message: userMessage
});
const body = response.body || {};
const sessionId = String(body.session_id || "");
let session = null;
if (sessionId) {
const sessionResponse = await request(app).get(`/api/assistant/session/${encodeURIComponent(sessionId)}`);
if (sessionResponse.status === 200 && sessionResponse.body?.ok) {
session = sessionResponse.body.session ?? null;
}
}
const debugSummary = summarizeDebug(body.debug);
const artifact = {
schema_version: "assistant_prompt_dialog_v0_1",
generated_at: generatedAt,
suite_id: suite.suite_id,
case_id: caseId,
scenario_tag: probeCase.scenario_tag || "",
expected_hints: probeCase.expected_hints || {},
lifecycle_focus: probeCase.lifecycle_focus || {},
request: {
useMock: true,
promptVersion: "normalizer_v2_0_2",
user_message: userMessage
},
http_status: response.status,
session_id: sessionId,
trace_id: String(body.debug?.trace_id || body.conversation_item?.trace_id || ""),
reply_type: String(body.reply_type || ""),
assistant_reply: String(body.assistant_reply || ""),
user_message: userMessage,
conversation: Array.isArray(body.conversation) ? body.conversation : [],
conversation_item: body.conversation_item || null,
debug_summary: debugSummary,
debug: body.debug || {},
session
};
const order = String(i + 1).padStart(2, "0");
const fileStem = `${order}_${toSafeFileToken(caseId)}`;
const jsonFile = `${fileStem}.json`;
const mdFile = `${fileStem}.md`;
writeUtf8Bom(path.join(dialogsDir, jsonFile), `${JSON.stringify(artifact, null, 2)}\n`);
writeUtf8Bom(path.join(dialogsDir, mdFile), buildMarkdown(artifact));
indexRows.push({
case_id: caseId,
scenario_tag: String(probeCase.scenario_tag || ""),
reply_type: artifact.reply_type,
session_id: artifact.session_id,
trace_id: artifact.trace_id,
routed_retrieval_count: debugSummary.routed_retrieval_count,
problem_units_count: debugSummary.problem_units_count,
prompt_dialog_json: path.join(args.outputSubdir, jsonFile).replace(/\\/g, "/"),
prompt_dialog_md: path.join(args.outputSubdir, mdFile).replace(/\\/g, "/")
});
}
const indexPayload = {
schema_version: "assistant_prompt_dialog_index_v0_1",
generated_at: generatedAt,
run_dir: runDir,
suite_id: suite.suite_id,
scenario_count: suite.scenario_count,
dialogs: indexRows
};
writeUtf8Bom(path.join(runDir, "prompt_dialogs", "index.json"), `${JSON.stringify(indexPayload, null, 2)}\n`);
process.stdout.write(
[
`run_dir=${runDir}`,
`suite_id=${suite.suite_id}`,
`dialogs_generated=${indexRows.length}`,
`dialogs_folder=${dialogsDir}`
].join("\n") + "\n"
);
}
main().catch((error) => {
process.stderr.write(`${error?.stack || error}\n`);
process.exit(1);
});
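
In `main()` the feature flags are restored in the `finally` block immediately after `createApp()`, before any probe request is sent. That is fine if the app snapshots `process.env` at construction time. If the flags are instead read lazily per request, the overrides would no longer be in effect during the probe run. Which case applies depends on code not shown in this diff. Below is a minimal sketch of a scoped override that stays active for the whole awaited callback. The helper name `withEnvOverrides` and the flag `FEATURE_SKETCH_FLAG` are hypothetical.

```javascript
// Hypothetical helper: apply env overrides for the duration of an async callback.
async function withEnvOverrides(overrides, fn) {
    const original = {};
    for (const key of Object.keys(overrides)) {
        original[key] = process.env[key];
        process.env[key] = overrides[key];
    }
    try {
        return await fn();
    } finally {
        // Restore previous values, deleting keys that were unset before.
        for (const key of Object.keys(overrides)) {
            if (original[key] === undefined) {
                delete process.env[key];
            } else {
                process.env[key] = original[key];
            }
        }
    }
}

// Usage: the flag is visible inside the callback and restored afterwards.
const sketchRun = withEnvOverrides({ FEATURE_SKETCH_FLAG: "1" }, async () => process.env.FEATURE_SKETCH_FLAG);
sketchRun.then((seenInside) => console.log(seenInside, process.env.FEATURE_SKETCH_FLAG));
```

In the probe script this would mean wrapping both `createApp()` and the request loop in one callback, at the cost of keeping the overrides set for longer.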

@@ -1,4 +1,4 @@
import type {
import type {
AssistantFallbackType,
AssistantReplyType,
AnswerGroundingCheck,
@@ -50,21 +50,96 @@ const UUID_PATTERN = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]
const LONG_HEX_PATTERN = /\b[0-9a-f]{24,}\b/gi;
const RAW_REF_BLOB_PATTERN = /\bevidence_source_ref_v1\|[^\s,;]+/gi;
const RAW_REF_TOKEN_PATTERN = /\b(?:source_ref|canonical_ref|entity_id|fragment_id|guid|uuid)\b/gi;
const SYNTHETIC_PLACEHOLDER_PATTERN = /\bunknown_entity(?::[^\s,;]+)?\b/gi;
const SYNTHETIC_FALLBACK_MARKER_PATTERN = /\b(?:unknown_source|unknown_record)\b/gi;
const SYNTHETIC_ROUTE_TOKEN_PATTERN = /\bbatch_refresh_then_store:[^\s,;]+/gi;
const CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/u;
const LATIN_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u00D0\u00D1][\u0080-\u00FF]){2,}/u;
const SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN = /^[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]{1,2}$/u;
const PREFIXED_SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN = /^[\p{L}\p{N}_-]+[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]{1,2}$/u;
const MOJIBAKE_SINGLE_MARKER_PATTERN = /^[\u0420\u0421\u00D0\u00D1]$/u;
const MOJIBAKE_MARKER_CHAR_PATTERN = /[\u0402\u0403\u040A\u040C\u040E\u040F\u0452\u0453\u0459\u045A\u045C\u045E\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/u;
const CYRILLIC_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/gu;
const LATIN_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN = /(?:[\u00D0\u00D1][\u0080-\u00FF]){2,}/g;
const MOJIBAKE_MARKER_CHAR_GLOBAL_PATTERN = /[\u0402\u0403\u040A\u040C\u040E\u040F\u0452\u0453\u0459\u045A\u045C\u045E\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/gu;
function normalizeToken(value: string): string {
return value.replace(/^[^\p{L}\p{N}_-]+|[^\p{L}\p{N}_-]+$/gu, "");
}
function isLikelyMojibakeToken(value: string): boolean {
const token = normalizeToken(String(value ?? ""));
if (!token) {
return false;
}
if (MOJIBAKE_SINGLE_MARKER_PATTERN.test(token)) {
return true;
}
if (SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN.test(token)) {
return true;
}
if (token.length <= 8 && PREFIXED_SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN.test(token)) {
return true;
}
return CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(token) || LATIN_MOJIBAKE_FRAGMENT_PATTERN.test(token);
}
function countMojibakeTokens(value: string): number {
return String(value ?? "")
.split(/[\s,.;:!?()[\]{}"']+/g)
.filter((token) => token.length > 0)
.filter((token) => isLikelyMojibakeToken(token)).length;
}
function countMojibakeSingleMarkers(value: string): number {
return String(value ?? "")
.split(/[\s,.;:!?()[\]{}"']+/g)
.filter((token) => token.length > 0)
.map((token) => normalizeToken(token))
.filter((token) => MOJIBAKE_SINGLE_MARKER_PATTERN.test(token)).length;
}
function stripMojibakeFragments(value: string): string {
const removedByToken = String(value ?? "")
.split(/(\s+)/g)
.map((part) => {
if (/^\s+$/u.test(part)) {
return part;
}
return isLikelyMojibakeToken(part) ? "" : part;
})
.join("");
return removedByToken
.replace(CYRILLIC_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN, "")
.replace(LATIN_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN, "")
.replace(MOJIBAKE_MARKER_CHAR_GLOBAL_PATTERN, "")
.replace(/\s+([,.;:!?])/g, "$1")
.replace(/\s{2,}/g, " ")
.trim();
}
function looksLikeMojibake(value: string): boolean {
const text = String(value ?? "");
if (!text.trim()) {
return false;
}
if (/(?:Р.|С.){5,}/u.test(text)) {
const tokenHits = countMojibakeTokens(text);
const singleMarkers = countMojibakeSingleMarkers(text);
if (tokenHits >= 2 || (tokenHits >= 1 && singleMarkers >= 1) || singleMarkers >= 3) {
return true;
}
if (/[ЃѓЂђЌќЎў]/u.test(text)) {
if (MOJIBAKE_MARKER_CHAR_PATTERN.test(text)) {
return true;
}
if (CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(text) || LATIN_MOJIBAKE_FRAGMENT_PATTERN.test(text)) {
return true;
}
if (/\uFFFD/u.test(text)) {
return true;
}
return false;
}
function looksLikeTechnicalIdentifier(value: string): boolean {
const text = String(value ?? "").trim();
if (!text) {
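
A note on the new mojibake patterns: they look for `Р` or `С` followed by a character from a broad range that also covers the ordinary Russian alphabet. As a result, some legitimate upper-case Russian text appears to be classified as mojibake. The sketch below copies two patterns from the diff and adds a stricter variant. It relies on the fact that UTF-8 continuation bytes (0x80 to 0xBF) never decode to the basic letters U+0410 to U+044F in Windows-1251. The constant `STRICT_FRAGMENT_SKETCH` and the sample strings are assumptions for illustration, not part of the commit.

```javascript
// Copied from the diff.
const CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/u;
const SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN = /^[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]{1,2}$/u;

// Hypothetical stricter variant: the second character of each pair
// must lie outside the basic Russian letters U+0410 to U+044F.
const STRICT_FRAGMENT_SKETCH = /(?:[\u0420\u0421][\u0080-\u040F\u0450-\u04FF\u2000-\u20CF]){2,}/u;

// The UTF-8 bytes of "Тек" read as Windows-1251.
const mojibakeSample = "\u0420\u045E\u0420\u00B5\u0420\u0454";
// "РАСХОДЫ" (expenses), a legitimate upper-case word.
const upperCaseWord = "\u0420\u0410\u0421\u0425\u041E\u0414\u042B";
// "РФ", a common two-letter abbreviation.
const shortAbbreviation = "\u0420\u0424";

console.log(CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(mojibakeSample));       // true
console.log(CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(upperCaseWord));        // true (false positive)
console.log(SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN.test(shortAbbreviation)); // true (false positive)
console.log(STRICT_FRAGMENT_SKETCH.test(mojibakeSample));                   // true
console.log(STRICT_FRAGMENT_SKETCH.test(upperCaseWord));                    // false
```

Whether such upper-case tokens actually reach `sanitizeUserFacingReply` in practice is not visible from this diff, so this is a risk to check rather than a confirmed defect.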
@@ -99,15 +174,33 @@ function scrubRawTechnicalRefs(value: string): string {
.trim();
}
function sanitizeUserFacingReply(value: string): string {
return scrubRawTechnicalRefs(value)
.replace(/[ \t]+\n/g, "\n")
.replace(/\n{3,}/g, "\n\n")
function stripSyntheticPlaceholders(value: string): string {
return String(value ?? "")
.replace(SYNTHETIC_PLACEHOLDER_PATTERN, "")
.replace(SYNTHETIC_FALLBACK_MARKER_PATTERN, "")
.replace(SYNTHETIC_ROUTE_TOKEN_PATTERN, "")
.replace(/[;,:]\s*[;,:]+/g, "; ")
.replace(/\s{2,}/g, " ")
.trim();
}
function sanitizeUserFacingReply(value: string): string {
const normalized = scrubRawTechnicalRefs(value).replace(/[ \t]+\n/g, "\n");
const cleanedLines = normalized
.split(/\r?\n/g)
.map((line) => stripSyntheticPlaceholders(line))
.map((line) => stripMojibakeFragments(line))
.map((line) => line.trim())
.filter((line) => line.length > 0)
.filter((line) => !looksLikeMojibake(line));
const cleaned = cleanedLines.join("\n").replace(/\n{3,}/g, "\n\n").trim();
return cleaned || "Available data requires clarification for a reliable user-facing answer.";
}
function sanitizeUserText(value: string): string | null {
const normalized = scrubRawTechnicalRefs(String(value ?? "").replace(/\s+/g, " ").trim());
const normalized = stripMojibakeFragments(
stripSyntheticPlaceholders(scrubRawTechnicalRefs(String(value ?? "").replace(/\s+/g, " ").trim()))
);
if (!normalized) {
return null;
}
@@ -238,13 +331,13 @@ function buildFallbackWhyIncluded(results: UnifiedRetrievalResult[]): string[] {
const checkedRecords = summaryNumber(result, "checked_records");
if (routeFocus) {
lines.push(`Проверка выполнена по профилю ${routeFocus}.`);
}
if (sourceRecords !== null && filteredRecords !== null && filteredRecords < sourceRecords) {
lines.push(`Применено сужение выборки: ${filteredRecords} из ${sourceRecords} записей.`);
}
if (checkedRecords !== null) {
lines.push(`Проверено записей в текущем проходе: ${checkedRecords}.`);
}
}
@@ -255,19 +348,19 @@ function buildFallbackSelectionReasons(results: UnifiedRetrievalResult[]): strin
const lines: string[] = [];
for (const result of results.slice(0, 2)) {
if (summaryBoolean(result, "semantic_narrowing_applied")) {
lines.push("Отбор выполнен по семантическому сужению предметной области.");
}
const rankingBasis = summaryStringArray(result, "ranking_basis");
if (rankingBasis.length > 0) {
lines.push(`Ранжирование основано на: ${rankingBasis.join(", ")}.`);
}
if (summaryBoolean(result, "broad_guard_applied")) {
lines.push("Применен broad-query guard для контроля ложной точности.");
}
}
if (lines.length === 0) {
lines.push("Отбор выполнен по совпадению предметных сигналов и доступной evidence-опоры.");
}
return sanitizeUserLines(lines, 4);
@@ -276,16 +369,16 @@ function buildFallbackSelectionReasons(results: UnifiedRetrievalResult[]): strin
function suggestNextStep(requirements: AssistantRequirement[], coverage: RequirementCoverageReport): string[] {
const next: string[] = [];
if (coverage.clarification_needed_for.length > 0) {
next.push("Уточните период, счет, документ или контрагента для требований: " + coverage.clarification_needed_for.join(", ") + ".");
}
if (coverage.requirements_uncovered.length > 0) {
next.push("Проверьте непокрытые требования: " + coverage.requirements_uncovered.join(", ") + ".");
}
if (coverage.out_of_scope_requirements.length > 0) {
next.push("Часть запроса вне текущего учетного контура: " + coverage.out_of_scope_requirements.join(", ") + ".");
}
if (next.length === 0 && requirements.length > 0) {
next.push("Следующим шагом можно открыть технический разбор и углубить проверку по выбранным объектам.");
}
return next;
}
@@ -364,21 +457,25 @@ function selectProblemUnitSummary(results: UnifiedRetrievalResult[]): ProblemUni
}
function formatAffectedScope(unit: ProblemUnit): string {
const accountScope = sanitizeUserLines(unit.affected_accounts, 2);
const counterpartyScope = sanitizeUserLines(unit.affected_counterparties, 2);
const documentScope = sanitizeUserLines(unit.affected_documents, 2);
const entityScope = sanitizeUserLines(unit.affected_entities, 2);
const scopeParts: string[] = [];
if (unit.affected_accounts.length > 0) {
scopeParts.push(`счета: ${unit.affected_accounts.slice(0, 2).join(", ")}`);
if (accountScope.length > 0) {
scopeParts.push(`accounts: ${accountScope.join(", ")}`);
}
if (unit.affected_counterparties.length > 0) {
scopeParts.push(`контрагенты: ${unit.affected_counterparties.slice(0, 2).join(", ")}`);
if (counterpartyScope.length > 0) {
scopeParts.push(`counterparties: ${counterpartyScope.join(", ")}`);
}
if (unit.affected_documents.length > 0) {
scopeParts.push(`документы: ${unit.affected_documents.slice(0, 2).join(", ")}`);
if (documentScope.length > 0) {
scopeParts.push(`documents: ${documentScope.join(", ")}`);
}
if (scopeParts.length === 0 && unit.affected_entities.length > 0) {
scopeParts.push(`объекты: ${unit.affected_entities.slice(0, 2).join(", ")}`);
if (scopeParts.length === 0 && entityScope.length > 0) {
scopeParts.push(`entities: ${entityScope.join(", ")}`);
}
if (scopeParts.length === 0) {
return "затронутый контур требует уточнения";
return "affected scope requires clarification";
}
return scopeParts.join("; ");
}
@@ -448,49 +545,49 @@ function buildProblemCentricActions(input: {
const unitTypes = new Set(input.units.map((item) => item.problem_unit_type));
if (unitTypes.has("broken_chain_segment")) {
actions.push("Проверьте связку выписка -> документ -> проводка по проблемным участкам цепочки.");
}
if (unitTypes.has("unresolved_settlement_cluster")) {
actions.push("Сверьте хвосты по расчетам: закрылся ли документ оплаты корректным закрывающим документом.");
}
if (unitTypes.has("period_risk_cluster")) {
actions.push("Оцените влияние дефекта на закрытие периода и корректность регламентных операций.");
}
if (unitTypes.has("cross_branch_inconsistency_cluster")) {
actions.push("Сверьте противоречия между документами, проводками и регистрами по НДС/межконтурным связям.");
}
if (unitTypes.has("lifecycle_anomaly_node")) {
actions.push("Проверьте lifecycle объекта: ожидаемый этап не должен оставаться в partially_linked состоянии.");
}
for (const unit of input.units) {
if (unit.lifecycle_defect_type === "stale_active_state") {
actions.push("Проверьте, почему объект завис: ожидаемый переход не должен оставаться в активной стадии.");
actions.push("Проверьте, почему объект завис: ожидаемый переход не должен оставаться в активной стадии.");
}
if (unit.lifecycle_defect_type === "misclosed_state") {
actions.push("Проверьте закрывающий документ и проводки: закрытие может быть формальным, но некорректным по пути.");
actions.push("Проверьте закрывающий документ и проводки: закрытие может быть формальным, но некорректным по пути.");
}
if (unit.lifecycle_defect_type === "cross_branch_state_conflict") {
actions.push("Сверьте бухгалтерскую и смежную ветки (например, НДС/расчеты): обнаружен межконтурный конфликт состояния.");
actions.push("Сверьте бухгалтерскую и смежную ветки (например, НДС/расчеты): обнаружен межконтурный конфликт состояния.");
}
}
if (input.mode === "clarification_required") {
if (input.missingAnchors.period) {
actions.push("Уточните период проверки, чтобы зафиксировать границы проблемного контура.");
actions.push("Уточните период проверки, чтобы зафиксировать границы проблемного контура.");
}
if (input.missingAnchors.account) {
actions.push("Уточните счет или группу счетов для предметной локализации дефекта.");
actions.push("Уточните счет или группу счетов для предметной локализации дефекта.");
}
if (input.missingAnchors.documentOrObject) {
actions.push("Укажите конкретный документ или объект трассировки для проверки механизма отклонения.");
actions.push("Укажите конкретный документ или объект трассировки для проверки механизма отклонения.");
}
if (input.missingAnchors.counterparty) {
actions.push("Укажите контрагента/договор, чтобы проверить хвосты и разрывы на конкретной связке.");
actions.push("Укажите контрагента/договор, чтобы проверить хвосты и разрывы на конкретной связке.");
}
}
if (input.coverageReport.requirements_uncovered.length > 0) {
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
}
return uniqueStrings(actions, 6);
@ -510,28 +607,28 @@ function buildProblemCentricClarifications(input: {
const unitTypes = new Set(input.units.map((item) => item.problem_unit_type));
if (input.missingAnchors.period) {
questions.push("Уточните период (например, 2020-06), в котором нужно проверить проблемный кластер.");
questions.push("Уточните период (например, 2020-06), в котором нужно проверить проблемный кластер.");
}
if (input.missingAnchors.account) {
questions.push("Уточните счет или связку счетов (например, 51/60), где вы ожидаете дефект.");
questions.push("Уточните счет или связку счетов (например, 51/60), где вы ожидаете дефект.");
}
if (input.missingAnchors.documentOrObject) {
questions.push("Укажите документ/объект, от которого нужно строить проверку цепочки.");
questions.push("Укажите документ/объект, от которого нужно строить проверку цепочки.");
}
if (input.missingAnchors.counterparty) {
questions.push("Укажите контрагента или договор, по которому проверить незакрытую экспозицию.");
questions.push("Укажите контрагента или договор, по которому проверить незакрытую экспозицию.");
}
if (unitTypes.has("broken_chain_segment")) {
questions.push("Уточните участок цепочки: выписка, платежный документ или проводка.");
questions.push("Уточните участок цепочки: выписка, платежный документ или проводка.");
}
if (unitTypes.has("period_risk_cluster")) {
questions.push("Уточните, какой этап закрытия периода критичен: начисление, закрытие счетов или НДС-блок.");
questions.push("Уточните, какой этап закрытия периода критичен: начисление, закрытие счетов или НДС-блок.");
}
if (unitTypes.has("unresolved_settlement_cluster")) {
questions.push("Уточните, интересуют хвосты поставщиков, покупателей или оба направления.");
questions.push("Уточните, интересуют хвосты поставщиков, покупателей или оба направления.");
}
if (input.coverageReport.clarification_needed_for.length > 0) {
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
}
return uniqueStrings(questions, 6);
@ -644,10 +741,10 @@ function limitationReasonToText(code: EvidenceLimitationReasonCode): string {
function detectMissingAnchors(userMessage: string): MissingAnchors {
const lower = String(userMessage ?? "").toLowerCase();
const hasPeriod = /\b20\d{2}(?:[-./](?:0[1-9]|1[0-2]))?\b/.test(lower);
const hasAccount = /(?:\bсчет\b|\baccount\b|\bschet\b|\b\d{2}(?:\.\d{2})?\b)/i.test(lower);
const hasDocumentOrObject = /(?:документ|invoice|guid|object|obj|#\d+|\bid\b|\bref\b|dokument|doc)/i.test(lower);
const hasCounterparty = /(?:контрагент|supplier|buyer|customer|kontragent|postavsh|pokupatel)/i.test(lower);
const hasAnomalyType = /(?:аномал|risk|отклон|разрыв|mismatch|duplicate|tail|цепочк|anomali|hvost)/i.test(lower);
const hasAccount = /(?:\bсчет\b|\baccount\b|\bschet\b|\b\d{2}(?:\.\d{2})?\b)/i.test(lower);
const hasDocumentOrObject = /(?:документ|invoice|guid|object|obj|#\d+|\bid\b|\bref\b|dokument|doc)/i.test(lower);
const hasCounterparty = /(?:контрагент|supplier|buyer|customer|kontragent|postavsh|pokupatel)/i.test(lower);
const hasAnomalyType = /(?:аномал|risk|отклон|разрыв|mismatch|duplicate|tail|цепочк|anomali|hvost)/i.test(lower);
return {
period: !hasPeriod,
@ -671,22 +768,22 @@ function buildClarificationQuestions(input: {
}
if (input.missingAnchors.period) {
questions.push("Уточните период проверки (например, 2020-06).");
questions.push("Уточните период проверки (например, 2020-06).");
}
if (input.missingAnchors.account) {
questions.push("Уточните счет или группу счетов (например, 19, 60, 62).");
questions.push("Уточните счет или группу счетов (например, 19, 60, 62).");
}
if (input.missingAnchors.documentOrObject) {
questions.push("Укажите документ/GUID/конкретный объект для трассировки.");
questions.push("Укажите документ/GUID/конкретный объект для трассировки.");
}
if (input.missingAnchors.counterparty) {
questions.push("Укажите контрагента или группу контрагентов.");
questions.push("Укажите контрагента или группу контрагентов.");
}
if (input.policySignals.broad_query_detected && input.missingAnchors.anomalyType) {
questions.push("Уточните тип отклонения: разрыв цепочки, неверный документ или аномальный риск.");
questions.push("Уточните тип отклонения: разрыв цепочки, неверный документ или аномальный риск.");
}
if (input.coverageReport.clarification_needed_for.length > 0) {
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
}
return uniqueStrings(questions, 6);
@ -701,31 +798,31 @@ function buildRecommendedActions(input: {
}): string[] {
const actions: string[] = [];
if (input.mode === "focused_grounded") {
actions.push("Проверьте 1-2 ключевые записи в учетной базе и зафиксируйте итог в рабочем файле проверки.");
actions.push("Проверьте 1-2 ключевые записи в учетной базе и зафиксируйте итог в рабочем файле проверки.");
}
if (input.mode === "broad_partial") {
actions.push("Сузьте запрос до периода + счета или периода + документа и повторите проверку.");
actions.push("Сузьте запрос до периода + счета или периода + документа и повторите проверку.");
}
if (input.mode === "clarification_required") {
actions.push("Дайте недостающие якоря (период/счет/объект), иначе сильный factual вывод невозможен.");
actions.push("Дайте недостающие якоря (период/счет/объект), иначе сильный factual вывод невозможен.");
}
if (input.coverageReport.requirements_uncovered.length > 0) {
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
}
if (input.coverageReport.requirements_partially_covered.length > 0) {
actions.push(`Доуточните частично покрытые требования: ${input.coverageReport.requirements_partially_covered.join(", ")}.`);
actions.push(`Доуточните частично покрытые требования: ${input.coverageReport.requirements_partially_covered.join(", ")}.`);
}
if (input.policySignals.broad_query_detected && input.policySignals.narrowing_strength !== "strong") {
actions.push("Добавьте более узкий контекст: тип отклонения, группу документов и бизнес-участок.");
actions.push("Добавьте более узкий контекст: тип отклонения, группу документов и бизнес-участок.");
}
if (input.limitationReasonCodes.includes("snapshot_only")) {
actions.push("Сверьте критичные выводы с live source-of-record в 1C.");
actions.push("Сверьте критичные выводы с live source-of-record в 1C.");
}
if (input.limitationReasonCodes.includes("weak_source_mapping")) {
actions.push("Проверьте source mapping для связей document/register по указанным ref.");
actions.push("Проверьте source mapping для связей document/register по указанным ref.");
}
if (input.sourceRefs.length > 0) {
actions.push(`Начните проверку с ${input.sourceRefs.length} подтвержденных записей и сверьте их с первичными документами.`);
actions.push(`Начните проверку с ${input.sourceRefs.length} подтвержденных записей и сверьте их с первичными документами.`);
}
return uniqueStrings(actions, 6);
@ -842,14 +939,14 @@ function buildPolicyDecision(input: {
}
function buildAnswerSummary(mode: PolicyMode): string {
if (mode === "focused_grounded") return "Сформирован прямой ответ на основе подтвержденной опоры.";
if (mode === "broad_partial") return "Вывод ограничен: есть частичная опора, но не полный coverage.";
if (mode === "clarification_required") return "Нужны уточнения: без сужения strong factual вывод ненадежен.";
if (mode === "out_of_scope") return "Запрос вне доступного учетного контура.";
if (mode === "route_mismatch") return "Результат маршрута не совпал с предметом вопроса.";
if (mode === "empty") return "В текущем срезе данных релевантные записи не обнаружены.";
if (mode === "no_grounded") return "Недостаточно опоры для обоснованного ответа.";
return "Не удалось собрать обоснованный ответ по текущему запросу.";
if (mode === "focused_grounded") return "Сформирован прямой ответ на основе подтвержденной опоры.";
if (mode === "broad_partial") return "Вывод ограничен: есть частичная опора, но не полный coverage.";
if (mode === "clarification_required") return "Нужны уточнения: без сужения strong factual вывод ненадежен.";
if (mode === "out_of_scope") return "Запрос вне доступного учетного контура.";
if (mode === "route_mismatch") return "Результат маршрута не совпал с предметом вопроса.";
if (mode === "empty") return "В текущем срезе данных релевантные записи не обнаружены.";
if (mode === "no_grounded") return "Недостаточно опоры для обоснованного ответа.";
return "Не удалось собрать обоснованный ответ по текущему запросу.";
}
function buildDirectAnswer(input: {
@ -859,33 +956,33 @@ function buildDirectAnswer(input: {
}): string {
const topFact = firstMeaningfulFact(input.retrievalResults);
if (input.mode === "focused_grounded") {
return topFact ?? "Подтвержденный результат получен; можно продолжать предметную проверку без деградации.";
return topFact ?? "Подтвержденный результат получен; можно продолжать предметную проверку без деградации.";
}
if (input.mode === "broad_partial") {
if (topFact) {
return `Доступен ограниченный подтвержденный фрагмент: ${topFact}`;
return `Доступен ограниченный подтвержденный фрагмент: ${topFact}`;
}
return "Есть только ограниченная опора; вывод дан в частичном режиме без ложной точности.";
return "Есть только ограниченная опора; вывод дан в частичном режиме без ложной точности.";
}
if (input.mode === "clarification_required") {
return "Текущий запрос слишком широкий или недоопределен; надежный factual вывод пока невозможен.";
return "Текущий запрос слишком широкий или недоопределен; надежный factual вывод пока невозможен.";
}
if (input.mode === "out_of_scope") {
return "Могу отвечать только в пределах данных доступного учетного контура.";
return "Могу отвечать только в пределах данных доступного учетного контура.";
}
if (input.mode === "route_mismatch") {
return "Предмет результата не совпал с предметом вопроса; требуется уточнение фокуса.";
return "Предмет результата не совпал с предметом вопроса; требуется уточнение фокуса.";
}
if (input.mode === "empty") {
return "В текущем срезе данных проблемные записи по заданному условию не найдены.";
return "В текущем срезе данных проблемные записи по заданному условию не найдены.";
}
if (input.mode === "no_grounded") {
return "Недостаточно подтвержденной опоры для ответа в требуемой точности.";
return "Недостаточно подтвержденной опоры для ответа в требуемой точности.";
}
if (input.policySignals.minimum_evidence_failed) {
return "Маршрут отработал, но минимальная evidence-опора не пройдена.";
return "Маршрут отработал, но минимальная evidence-опора не пройдена.";
}
return "Не удалось сформировать обоснованный ответ; нужно уточнение запроса.";
return "Не удалось сформировать обоснованный ответ; нужно уточнение запроса.";
}
function buildProblemCentricAnswerSummary(input: {
@ -896,20 +993,20 @@ function buildProblemCentricAnswerSummary(input: {
}): string {
if (input.lifecycleEnriched && input.summary?.lifecycle_enriched_units && input.summary.lifecycle_enriched_units > 0) {
if (input.mode === "clarification_required") {
return "Выявлены lifecycle-дефекты, но для надежного вывода требуется уточнение предметных якорей.";
return "Выявлены lifecycle-дефекты, но для надежного вывода требуется уточнение предметных якорей.";
}
return `Сформирован lifecycle-aware problem срез: выделено ${input.summary.lifecycle_enriched_units} lifecycle-узлов с приоритетом по дефектам перехода.`;
return `Сформирован lifecycle-aware problem срез: выделено ${input.summary.lifecycle_enriched_units} lifecycle-узлов с приоритетом по дефектам перехода.`;
}
if (input.mode === "clarification_required") {
return "Выявлены проблемные кластеры, но для надежного вывода требуется предметное уточнение фокуса.";
return "Выявлены проблемные кластеры, но для надежного вывода требуется предметное уточнение фокуса.";
}
if (input.weakUnits) {
return "Сформирован problem-centric срез с ограниченной опорой; вывод предварительный и требует до-проверки.";
return "Сформирован problem-centric срез с ограниченной опорой; вывод предварительный и требует до-проверки.";
}
if (input.summary?.units_total && input.summary.units_total > 1) {
return `Сформирован problem-centric срез: выделено ${input.summary.units_total} проблемных кластера с приоритетами.`;
return `Сформирован problem-centric срез: выделено ${input.summary.units_total} проблемных кластера с приоритетами.`;
}
return "Сформирован problem-centric срез: выделен ключевой проблемный кластер и затронутый контур.";
return "Сформирован problem-centric срез: выделен ключевой проблемный кластер и затронутый контур.";
}
function buildProblemCentricDirectAnswer(input: {
@ -920,19 +1017,24 @@ function buildProblemCentricDirectAnswer(input: {
}): string {
const lead =
input.mode === "clarification_required"
? "Обнаружены проблемные зоны, но без уточнения якорей сильный factual-вывод преждевременен."
? "Обнаружены проблемные зоны, но без уточнения якорей сильный factual-вывод преждевременен."
: input.weakUnits
? "Выделены проблемные зоны с ограниченной надежностью; вывод дан в ограниченном режиме."
? "Выделены проблемные зоны с ограниченной надежностью; вывод дан в ограниченном режиме."
: input.lifecycleAnswerEnabled && hasLifecycleResolution(input.units)
? "Выделены lifecycle-проблемы: определены текущие/ожидаемые стадии и тип нарушения перехода."
: "Выделены ключевые проблемные зоны и их влияние на учетный контур.";
? "Выделены lifecycle-проблемы: определены текущие/ожидаемые стадии и тип нарушения перехода."
: "Выделены ключевые проблемные зоны и их влияние на учетный контур.";
const unitLines = input.units.map((unit) => {
const scope = formatAffectedScope(unit);
const lifecycleScope = input.lifecycleAnswerEnabled ? formatLifecycleScope(unit) : null;
const lifecycleInterpretation = input.lifecycleAnswerEnabled ? unit.business_lifecycle_interpretation : null;
const lifecycleInterpretation =
input.lifecycleAnswerEnabled && unit.business_lifecycle_interpretation
? sanitizeUserText(unit.business_lifecycle_interpretation)
: null;
const title = sanitizeUserText(unit.title) ?? "Problem cluster detected";
const defect = sanitizeUserText(unit.business_defect_class) ?? "detected_issue";
const segments = [
`${unit.title}: ${unit.business_defect_class}`,
`${title}: ${defect}`,
scope,
lifecycleScope,
lifecycleInterpretation,
@ -944,10 +1046,10 @@ function buildProblemCentricDirectAnswer(input: {
});
if (unitLines.length === 0) {
return `${lead}\nПроблемные кластеры не удалось детализировать в текущем срезе.`;
return `${lead}\nПроблемные кластеры не удалось детализировать в текущем срезе.`;
}
return [lead, "Проблемные кластеры:", ...unitLines].join("\n");
return [lead, "Проблемные кластеры:", ...unitLines].join("\n");
}
function buildProblemCentricAnswerStructure(input: {
@ -1358,21 +1460,23 @@ function composeExplainableAnswer(input: ComposeAnswerInput, scopeLabel: "full"
const lead =
scopeLabel === "full"
? "Итог: запрос обработан по предмету, найденные объекты подтверждены данными контура."
: "Итог: запрос обработан частично, ниже подтвержденная часть и ограничения.";
? "Итог: запрос обработан по предмету, найденные объекты подтверждены данными контура."
: "Итог: запрос обработан частично, ниже подтвержденная часть и ограничения.";
return [
return sanitizeUserFacingReply(
[
lead,
facts.length > 0 ? "Подтвержденные результаты:\n" + formatList(facts) : "",
whyIncluded.length > 0 ? "Почему это попало в ответ:\n" + formatList(whyIncluded) : "",
selectionReasons.length > 0 ? "Основание отбора:\n" + formatList(selectionReasons) : "",
riskFactors.length > 0 ? "Подтверждающие признаки:\n" + formatList(riskFactors) : "",
interpretation.length > 0 ? "Практический смысл:\n" + formatList(interpretation) : "",
limitations.length > 0 ? "Ограничения:\n" + formatList(limitations) : "",
nextSteps.length > 0 ? "Что проверить дальше:\n" + formatList(nextSteps) : ""
facts.length > 0 ? "Подтвержденные результаты:\n" + formatList(facts) : "",
whyIncluded.length > 0 ? "Почему это попало в ответ:\n" + formatList(whyIncluded) : "",
selectionReasons.length > 0 ? "Основание отбора:\n" + formatList(selectionReasons) : "",
riskFactors.length > 0 ? "Подтверждающие признаки:\n" + formatList(riskFactors) : "",
interpretation.length > 0 ? "Практический смысл:\n" + formatList(interpretation) : "",
limitations.length > 0 ? "Ограничения:\n" + formatList(limitations) : "",
nextSteps.length > 0 ? "Что проверить дальше:\n" + formatList(nextSteps) : ""
]
.filter(Boolean)
.join("\n\n");
.join("\n\n")
);
}
export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswerOutput {
@ -1385,6 +1489,8 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
const partialResults = input.retrievalResults.filter((item) => item.status === "partial");
const emptyResults = input.retrievalResults.filter((item) => item.status === "empty");
const errorResults = input.retrievalResults.filter((item) => item.status === "error");
const legacyEvidenceItems = flattenEvidence(input.retrievalResults);
const legacyLimitationReasonCodes = collectLimitationReasonCodes(legacyEvidenceItems);
const hasBroadMinimumEvidenceSignal = input.retrievalResults.some(
(item) => summaryBoolean(item, "broad_guard_applied") && summaryBoolean(item, "minimum_evidence_failed")
);
@ -1398,7 +1504,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
if (fallbackType === "out_of_scope" && input.coverageReport.requirements_covered === 0) {
return {
assistant_reply:
"Я могу отвечать только по данным вашей учетной базы. Этот запрос выходит за рамки доступного контура.",
"Я могу отвечать только по данным вашей учетной базы. Этот запрос выходит за рамки доступного контура.",
fallback_type: "out_of_scope",
reply_type: "out_of_scope"
};
@ -1407,8 +1513,8 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
if (input.groundingCheck.status === "route_mismatch_blocked") {
return {
assistant_reply: [
"Не отправляю финальный ответ, потому что предмет результата не совпал с предметом вопроса.",
"Уточните формулировку (например, нужный счет/участок учета), и я выполню повторный проход."
"Не отправляю финальный ответ, потому что предмет результата не совпал с предметом вопроса.",
"Уточните формулировку (например, нужный счет/участок учета), и я выполню повторный проход."
].join("\n\n"),
fallback_type: "partial",
reply_type: "route_mismatch_blocked"
@ -1418,7 +1524,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
if (input.groundingCheck.status === "no_grounded_answer" && okResults.length === 0 && !hasBroadMinimumEvidenceSignal) {
return {
assistant_reply:
"Пока не удалось собрать предметно подтвержденный ответ по вашему вопросу. Нужны дополнительные уточнения по периоду или объекту проверки.",
"Пока не удалось собрать предметно подтвержденный ответ по вашему вопросу. Нужны дополнительные уточнения по периоду или объекту проверки.",
fallback_type: fallbackType,
reply_type: "no_grounded_answer"
};
@ -1427,7 +1533,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
if (hasBroadClarificationSignal && okResults.length === 0 && partialResults.length === 0) {
return {
assistant_reply:
"Запрос слишком широкий для надежного вывода по текущей опоре. Уточните период, участок учета или объект проверки, после чего я дам предметный результат.",
"Запрос слишком широкий для надежного вывода по текущей опоре. Уточните период, участок учета или объект проверки, после чего я дам предметный результат.",
fallback_type: "clarification",
reply_type: "clarification_required"
};
@ -1435,7 +1541,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
if (fallbackType === "clarification" && okResults.length === 0 && partialResults.length === 0) {
return {
assistant_reply: "Уточните, пожалуйста, период, счет, документ или контрагента, чтобы закрыть все части вопроса корректно.",
assistant_reply: "Уточните, пожалуйста, период, счет, документ или контрагента, чтобы закрыть все части вопроса корректно.",
fallback_type: "clarification",
reply_type: "clarification_required"
};
@ -1443,7 +1549,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
if (errorResults.length > 0 && okResults.length === 0 && partialResults.length === 0) {
return {
assistant_reply: "Не удалось получить данные из контура. Попробуйте повторить запрос или уточнить формулировку.",
assistant_reply: "Не удалось получить данные из контура. Попробуйте повторить запрос или уточнить формулировку.",
fallback_type: fallbackType,
reply_type: "backend_error"
};
@ -1459,7 +1565,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
if (okResults.length === 0 && partialResults.length === 0 && emptyResults.length > 0) {
return {
assistant_reply: "По заданному условию в текущем срезе данных явных проблемных записей не найдено.",
assistant_reply: "По заданному условию в текущем срезе данных явных проблемных записей не найдено.",
fallback_type: fallbackType,
reply_type: "empty_but_valid"
};
@ -1471,7 +1577,9 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
input.coverageReport.clarification_needed_for.length > 0 ||
input.coverageReport.out_of_scope_requirements.length > 0 ||
input.groundingCheck.status === "partial" ||
errorResults.length > 0;
errorResults.length > 0 ||
legacyLimitationReasonCodes.includes("weak_source_mapping") ||
legacyLimitationReasonCodes.includes("missing_mechanism");
if (okResults.length > 0 && hasPartialCoverage) {
return {
@ -1490,9 +1598,10 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
}
return {
assistant_reply: "По текущему запросу не удалось построить обоснованный ответ. Уточните формулировку и попробуйте снова.",
assistant_reply: "По текущему запросу не удалось построить обоснованный ответ. Уточните формулировку и попробуйте снова.",
fallback_type: "unknown",
reply_type: "backend_error"
};
}
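Reviewer note on `detectMissingAnchors` in this file. JavaScript's `\b` is defined in terms of `\w` (ASCII letters, digits and underscore), so `\bсчет\b` does not match the word inside ordinary Cyrillic text, and the bare two-digit branch also fires on the month part of a period such as `2020-06`. The sketch below illustrates both effects. `accountAsWritten` copies the pattern from the diff; `accountLookaround` is a hypothetical alternative built on Unicode-aware lookarounds and is an assumption, not part of the commit.

```typescript
// Pattern copied from detectMissingAnchors in the diff.
const accountAsWritten = /(?:\bсчет\b|\baccount\b|\bschet\b|\b\d{2}(?:\.\d{2})?\b)/i;

// Hypothetical alternative (an assumption, not the commit's code).
// Keyword branch: Unicode-aware lookarounds stand in for \b.
// Numeric branch: rejects digits glued to another digit, '-', '.' or '/'.
const accountLookaround = new RegExp(
  "(?<![\\p{L}\\p{N}_])(?:счет|account|schet)(?![\\p{L}\\p{N}_])" +
    "|(?<![\\d\\-./])\\d{2}(?:\\.\\d{2})?(?![\\d\\-./])",
  "iu"
);

function hasAccountAsWritten(message: string): boolean {
  return accountAsWritten.test(message.toLowerCase());
}

function hasAccountLookaround(message: string): boolean {
  return accountLookaround.test(message.toLowerCase());
}

// Cyrillic keyword without digits: missed by the original pattern.
const keywordOnly = "проверь счет поставщика";
// Period only: the original pattern treats the month "06" as an account.
const periodOnly = "период 2020-06";

console.log(hasAccountAsWritten(keywordOnly), hasAccountLookaround(keywordOnly));
console.log(hasAccountAsWritten(periodOnly), hasAccountLookaround(periodOnly));
```

Whether the stricter numeric branch is wanted depends on how users actually write account codes, so it should be checked against real queries before adoption.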


@ -1125,9 +1125,9 @@ export class AssistantDataLayer {
} else if (route === "store_feature_risk") {
result = this.executeRisk(fragmentText, data);
} else if (route === "batch_refresh_then_store") {
result = this.executeBatch(data);
result = this.executeBatch(fragmentText, data);
} else if (route === "store_canonical") {
result = this.executeCanonical(data);
result = this.executeCanonical(fragmentText, data);
} else if (route === "live_mcp_drilldown") {
result = this.executeDrilldown(fragmentText, data);
}
@ -1437,7 +1437,9 @@ export class AssistantDataLayer {
};
}
private executeRisk(_fragmentText: string, data: DatasetBundle): RawRetrievalResult {
private executeRisk(fragmentText: string, data: DatasetBundle): RawRetrievalResult {
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
const profileRiskFactors = semanticProfile.anomaly_patterns;
const records = [...data.problemCases, ...data.ndsRegisters];
const scored = records
.map((record) => {
@ -1491,12 +1493,15 @@ export class AssistantDataLayer {
items: [],
summary: {
checked_records: records.length,
risky_records: 0
risky_records: 0,
query_subject: semanticProfile.query_subject,
semantic_profile: semanticProfile,
ranking_basis: semanticProfile.ranking_basis
},
evidence: [],
why_included: [],
selection_reason: ["Риск-оценка выполнялась по техническим признакам, но записи выше порога не найдены."],
risk_factors: [],
risk_factors: profileRiskFactors,
business_interpretation: ["По текущему срезу явные риск-признаки не обнаружены."],
confidence: "medium",
limitations: ["Оценка основана на snapshot-данных и эвристическом risk score."],
@ -1505,6 +1510,13 @@ export class AssistantDataLayer {
}
const averageScore = items.reduce((acc, item) => acc + item.risk_score, 0) / items.length;
const normalizedRiskFactors = uniqueStrings([
...profileRiskFactors,
"unknown_link_count",
"zero_guid_values",
"navigation_links",
"missing_counterparty_link"
]);
return {
status: "ok",
result_type: "list",
@ -1512,7 +1524,10 @@ export class AssistantDataLayer {
summary: {
checked_records: records.length,
risky_records: items.length,
average_risk_score: Number(averageScore.toFixed(2))
average_risk_score: Number(averageScore.toFixed(2)),
query_subject: semanticProfile.query_subject,
semantic_profile: semanticProfile,
ranking_basis: semanticProfile.ranking_basis
},
evidence: items.slice(0, 10).map((item) => ({
source_entity: item.source_entity,
@ -1521,14 +1536,10 @@ export class AssistantDataLayer {
})),
why_included: ["В ответ включены записи с risk_score >= 2."],
selection_reason: [
"score растет при unknown links, zero GUID, навигационных ссылках и отсутствии явного контрагента."
],
risk_factors: [
"unknown_link_count",
"zero_guid_values",
"navigation_links",
"missing_counterparty_link"
"score растет при unknown links, zero GUID, навигационных ссылках и отсутствии явного контрагента.",
`Semantic profile subject: ${semanticProfile.query_subject}.`
],
risk_factors: normalizedRiskFactors,
business_interpretation: ["Эти записи требуют первичной бухгалтерской проверки как потенциальные аномалии."],
confidence: "high",
limitations: ["Риск-факторы определяются эвристикой, а не полным набором бизнес-правил 1С."],
@ -1536,7 +1547,8 @@ export class AssistantDataLayer {
};
}
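Reviewer note on the `risk_factors` merge in `executeRisk`. The profile's `anomaly_patterns` are now combined with the four heuristic factor codes through `uniqueStrings`. That helper is not shown in this diff, so the sketch below uses a hypothetical `uniqueStringsSketch` that assumes order-preserving deduplication with an optional cap; the real helper may behave differently. The pattern values are illustrative only.

```typescript
// Hypothetical stand-in for uniqueStrings; the real helper is defined elsewhere in the repo.
function uniqueStringsSketch(values: string[], limit?: number): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const value of values) {
    const trimmed = value.trim();
    // Skip blanks and anything already emitted, keeping first-seen order.
    if (trimmed.length === 0 || seen.has(trimmed)) continue;
    seen.add(trimmed);
    result.push(trimmed);
    if (limit !== undefined && result.length >= limit) break;
  }
  return result;
}

// Illustrative anomaly_patterns; real values come from buildSemanticRetrievalProfile.
const profileFactorsSketch: string[] = ["zero_guid_values", "stale_settlement"];

const mergedRiskFactorsSketch = uniqueStringsSketch([
  ...profileFactorsSketch,
  "unknown_link_count",
  "zero_guid_values",
  "navigation_links",
  "missing_counterparty_link"
]);

console.log(mergedRiskFactorsSketch);
```

Under this assumption, profile-derived factors come first and a heuristic code that repeats one of them is dropped.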
private executeBatch(data: DatasetBundle): RawRetrievalResult {
private executeBatch(fragmentText: string, data: DatasetBundle): RawRetrievalResult {
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
const source = [...data.problemCases, ...data.keyFields, ...data.docs];
const byEntity = new Map<string, number>();
for (const record of source) {
@ -1558,7 +1570,10 @@ export class AssistantDataLayer {
items,
summary: {
checked_records: source.length,
ranked_entities: items.length
ranked_entities: items.length,
query_subject: semanticProfile.query_subject,
semantic_profile: semanticProfile,
ranking_basis: semanticProfile.ranking_basis
},
evidence: items.slice(0, 5).map((item) => ({
entity: item.entity,
@ -1566,9 +1581,9 @@ export class AssistantDataLayer {
})),
why_included: items.length > 0 ? ["Показаны сущности с максимальным количеством записей."] : [],
selection_reason: ["Ранжирование выполнено по records_count по убыванию."],
risk_factors: ["Высокий объем записей по сущности повышает приоритет проверки."],
risk_factors: uniqueStrings(["entity_volume_spike", ...semanticProfile.anomaly_patterns]),
business_interpretation: [
"Сущности в топе ранга чаще дают наибольший вклад в проблемный объем и требуют приоритетного аудита."
"Top entities by volume highlight where lifecycle-focused review should start first."
],
confidence: "medium",
limitations: ["Ранжирование по объему не всегда эквивалентно бизнес-риску."],
@ -1576,8 +1591,11 @@ export class AssistantDataLayer {
};
}
private executeCanonical(data: DatasetBundle): RawRetrievalResult {
const items = data.docs
private executeCanonical(fragmentText: string, data: DatasetBundle): RawRetrievalResult {
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
const useVatSource = semanticProfile.domain_scope.includes("vat") || semanticProfile.domain_scope.includes("taxes");
const sourceRecords = useVatSource ? [...data.ndsRegisters, ...data.keyFields] : data.docs;
const items = sourceRecords
.map((record) => {
const period = extractDate(record);
return {
@ -1599,8 +1617,11 @@ export class AssistantDataLayer {
result_type: "list",
items,
summary: {
checked_records: data.docs.length,
returned_records: items.length
checked_records: sourceRecords.length,
returned_records: items.length,
query_subject: semanticProfile.query_subject,
semantic_profile: semanticProfile,
ranking_basis: semanticProfile.ranking_basis
},
evidence: items.slice(0, 6).map((item) => ({
source_entity: item.source_entity,
@ -1608,8 +1629,11 @@ export class AssistantDataLayer {
period: item.period
})),
why_included: items.length > 0 ? ["Показаны последние по дате записи канонического документного слоя."] : [],
selection_reason: ["Отбор по максимальной дате документа в пределах snapshot."],
risk_factors: [],
selection_reason: [
"Отбор по максимальной дате документа в пределах snapshot.",
`Semantic profile subject: ${semanticProfile.query_subject}.`
],
risk_factors: semanticProfile.anomaly_patterns,
business_interpretation: ["Слой отражает базовый factual-срез документов для оперативной сверки."],
confidence: "high",
limitations: ["Это read-only snapshot, а не онлайн-состояние 1С."],


@ -1,4 +1,4 @@
import type { CandidateEvidenceItem, ProblemConfidence, ProblemUnit, ProblemUnitType } from "../types/stage2ProblemUnits";
import type { CandidateEvidenceItem, ProblemConfidence, ProblemUnit, ProblemUnitType } from "../types/stage2ProblemUnits";
import {
LIFECYCLE_MODEL_SCHEMA_VERSION,
STAGE3_LIFECYCLE_DOMAINS,
@ -47,13 +47,99 @@ function hasToken(values: string[], pattern: RegExp): boolean {
return values.some((value) => pattern.test(value));
}
function defaultExpectedState(domain: LifecycleDomain): string {
if (domain === "bank_settlement") return "settlement_closed";
if (domain === "customer_settlement") return "receivable_closed";
if (domain === "deferred_expense") return "fully_written_off";
if (domain === "fixed_asset") return "depreciation_active";
if (domain === "vat_flow") return "vat_deducted";
return "close_completed";
function normalizeStateToken(value: string): string {
return value.trim().toLowerCase();
}
function resolveStateCode(model: LifecycleDomainModel, stateCode: string | null | undefined): string | null {
if (!stateCode || typeof stateCode !== "string") {
return null;
}
const normalized = normalizeStateToken(stateCode);
const matched = model.states.find((state) => normalizeStateToken(state.state_code) === normalized);
return matched?.state_code ?? null;
}
function defaultInitialState(model: LifecycleDomainModel): string {
const initial = model.states.find((state) => state.state_class === "initial");
if (initial) {
return initial.state_code;
}
return model.states[0]?.state_code ?? "unknown_state";
}
function defaultExpectedState(model: LifecycleDomainModel): string {
const terminal = model.states.find((state) => state.is_terminal || state.state_class === "terminal");
if (terminal) {
return terminal.state_code;
}
const active = model.states.find((state) => state.state_class === "active");
if (active) {
return active.state_code;
}
return defaultInitialState(model);
}
function expectedTransitionAdjacency(model: LifecycleDomainModel): Map<string, string[]> {
const graph = new Map<string, string[]>();
for (const transition of model.transitions) {
if (transition.transition_type !== "expected") {
continue;
}
const from = transition.from_state;
const to = transition.to_state;
const current = graph.get(from) ?? [];
if (!current.includes(to)) {
current.push(to);
}
graph.set(from, current);
}
return graph;
}
function shortestExpectedPath(model: LifecycleDomainModel, fromState: string, toState: string): string[] | null {
if (fromState === toState) {
return [fromState];
}
const graph = expectedTransitionAdjacency(model);
const queue: string[][] = [[fromState]];
const visited = new Set<string>([fromState]);
while (queue.length > 0) {
const path = queue.shift();
if (!path) {
continue;
}
const tail = path[path.length - 1];
const nextStates = graph.get(tail) ?? [];
for (const nextState of nextStates) {
if (visited.has(nextState)) {
continue;
}
const nextPath = [...path, nextState];
if (nextState === toState) {
return nextPath;
}
visited.add(nextState);
queue.push(nextPath);
}
}
return null;
}
function transitionEdgeLabel(fromState: string, toState: string): string {
return `${fromState}->${toState}`;
}
function resolvePreviousStates(model: LifecycleDomainModel, currentState: string): string[] {
const initialState = defaultInitialState(model);
if (initialState === currentState) {
return [];
}
const path = shortestExpectedPath(model, initialState, currentState);
if (!path || path.length <= 1) {
return [];
}
return path.slice(0, -1);
}
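Review note: the helpers added in this hunk amount to a breadth-first search over the `expected` transitions of a domain model, with `resolvePreviousStates` taking everything on the path except the last node. The sketch below reproduces that idea in isolation. The `Transition` shape is reduced to three fields, the `"problematic"` transition type and the concrete `from_state`/`to_state` pairs are assumptions made for illustration, and `shortestPath` is a stand-in name, not the function from the commit.

```typescript
// Hypothetical, reduced shapes: the real LifecycleDomainModel carries more fields.
type TransitionType = "expected" | "problematic"; // "problematic" is an assumed value

interface Transition {
  from_state: string;
  to_state: string;
  transition_type: TransitionType;
}

// Build an adjacency list from expected transitions only.
function expectedAdjacency(transitions: Transition[]): Map<string, string[]> {
  const graph = new Map<string, string[]>();
  for (const t of transitions) {
    if (t.transition_type !== "expected") continue;
    const next = graph.get(t.from_state) ?? [];
    if (!next.includes(t.to_state)) next.push(t.to_state);
    graph.set(t.from_state, next);
  }
  return graph;
}

// Breadth-first search: the first path that reaches `to` has the fewest edges.
function shortestPath(transitions: Transition[], from: string, to: string): string[] | null {
  if (from === to) return [from];
  const graph = expectedAdjacency(transitions);
  const queue: string[][] = [[from]];
  const visited = new Set<string>([from]);
  while (queue.length > 0) {
    const path = queue.shift() as string[];
    const tail = path[path.length - 1] as string;
    for (const next of graph.get(tail) ?? []) {
      if (visited.has(next)) continue;
      const nextPath = [...path, next];
      if (next === to) return nextPath;
      visited.add(next);
      queue.push(nextPath);
    }
  }
  return null;
}

// Illustrative data only; the state codes are borrowed from the bank_settlement model.
const bankTransitions: Transition[] = [
  { from_state: "initiated_payment", to_state: "bank_recorded", transition_type: "expected" },
  { from_state: "bank_recorded", to_state: "settlement_closed", transition_type: "expected" },
  { from_state: "bank_recorded", to_state: "stale_unlinked_payment", transition_type: "problematic" }
];

const fullPath = shortestPath(bankTransitions, "initiated_payment", "settlement_closed");
// Everything before the current state, as resolvePreviousStates does with slice(0, -1).
const previousStates = fullPath ? fullPath.slice(0, -1) : [];
// Problematic states have no outgoing expected edge here, so nothing is reachable from them.
const unreachable = shortestPath(bankTransitions, "stale_unlinked_payment", "settlement_closed");
```

One consequence worth keeping in mind: for a model whose `transitions` array is empty, as several domains in this diff appear to declare, the search returns `null` for any two distinct states, so the previous-state list stays empty and the missing-transition label falls back to the generic value.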
const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
@ -64,53 +150,53 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
states: [
{
state_code: "initiated_payment",
state_label: "Платеж инициирован",
state_label: "Платеж инициирован",
state_class: "initial",
entry_conditions: ["payment_order_created"],
exit_conditions: ["bank_recorded"],
is_terminal: false,
is_problematic: false,
business_meaning: "Есть инициирование платежа."
business_meaning: "Есть инициирование платежа."
},
{
state_code: "bank_recorded",
state_label: "Платеж отражен банком",
state_label: "Платеж отражен банком",
state_class: "active",
entry_conditions: ["bank_statement_recorded"],
exit_conditions: ["settlement_linked"],
is_terminal: false,
is_problematic: false,
business_meaning: "Движение денег зафиксировано, ожидается расчетное закрытие."
business_meaning: "Движение денег зафиксировано, ожидается расчетное закрытие."
},
{
state_code: "settlement_closed",
state_label: "Расчет закрыт",
state_label: "Расчет закрыт",
state_class: "terminal",
entry_conditions: ["payment_to_settlement_linked"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "Платеж доведен до расчетного результата."
business_meaning: "Платеж доведен до расчетного результата."
},
{
state_code: "stale_unlinked_payment",
state_label: "Платеж завис без закрытия",
state_label: "Платеж завис без закрытия",
state_class: "problematic",
entry_conditions: ["bank_recorded", "missing_link"],
exit_conditions: ["settlement_closed"],
is_terminal: false,
is_problematic: true,
business_meaning: "Платеж отражен, но ожидаемая связь по расчету не завершена."
business_meaning: "Платеж отражен, но ожидаемая связь по расчету не завершена."
},
{
state_code: "misclosed_payment",
state_label: "Платеж закрыт некорректно",
state_label: "Платеж закрыт некорректно",
state_class: "problematic",
entry_conditions: ["wrong_document_type_or_posting_mismatch"],
exit_conditions: ["settlement_closed"],
is_terminal: false,
is_problematic: true,
business_meaning: "Формальное закрытие есть, но путь закрытия неверный."
business_meaning: "Формальное закрытие есть, но путь закрытия неверный."
}
],
transitions: [
@ -121,7 +207,7 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
required_evidence: ["bank_statement_recorded"],
optional_evidence: ["payment_order"],
forbidden_conditions: [],
business_meaning: "Платеж должен появиться во выписке."
business_meaning: "Платеж должен появиться во выписке."
},
{
from_state: "bank_recorded",
@ -130,7 +216,7 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
required_evidence: ["payment_to_settlement_link"],
optional_evidence: ["document_to_posting"],
forbidden_conditions: ["wrong_document_type"],
business_meaning: "После выписки должен закрываться расчет."
business_meaning: "После выписки должен закрываться расчет."
}
],
defects: []
@ -142,43 +228,43 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
states: [
{
state_code: "invoice_issued",
state_label: "Реализация отражена",
state_label: "Реализация отражена",
state_class: "initial",
entry_conditions: ["realization_document_exists"],
exit_conditions: ["payment_recorded"],
is_terminal: false,
is_problematic: false,
business_meaning: "Возникла дебиторская позиция."
business_meaning: "Возникла дебиторская позиция."
},
{
state_code: "payment_recorded",
state_label: "Оплата отражена",
state_label: "Оплата отражена",
state_class: "active",
entry_conditions: ["payment_document_exists"],
exit_conditions: ["receivable_closed"],
is_terminal: false,
is_problematic: false,
business_meaning: "Оплата есть, ожидается корректное закрытие."
business_meaning: "Оплата есть, ожидается корректное закрытие."
},
{
state_code: "receivable_closed",
state_label: "Дебиторка закрыта",
state_label: "Дебиторка закрыта",
state_class: "terminal",
entry_conditions: ["closing_document_linked"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "Дебиторская позиция закрыта корректно."
business_meaning: "Дебиторская позиция закрыта корректно."
},
{
state_code: "stale_receivable",
state_label: "Дебиторка зависла",
state_label: "Дебиторка зависла",
state_class: "problematic",
entry_conditions: ["unresolved_settlement"],
exit_conditions: ["receivable_closed"],
is_terminal: false,
is_problematic: true,
business_meaning: "Позиция остается незавершенной дольше ожидаемого."
business_meaning: "Позиция остается незавершенной дольше ожидаемого."
}
],
transitions: [
@ -189,7 +275,7 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
required_evidence: ["payment_document_exists"],
optional_evidence: [],
forbidden_conditions: [],
business_meaning: "После реализации ожидается оплата/зачет."
business_meaning: "После реализации ожидается оплата/зачет."
},
{
from_state: "payment_recorded",
@ -198,7 +284,7 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
required_evidence: ["closing_document_linked"],
optional_evidence: ["register_movement_exists"],
forbidden_conditions: ["cross_branch_inconsistency"],
business_meaning: "Оплата должна завершаться корректным закрытием расчета."
business_meaning: "Оплата должна завершаться корректным закрытием расчета."
}
],
defects: []
@ -210,43 +296,43 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
states: [
{
state_code: "recognized",
state_label: "РБП признан",
state_label: "РБП признан",
state_class: "initial",
entry_conditions: ["deferred_expense_created"],
exit_conditions: ["writeoff_started"],
is_terminal: false,
is_problematic: false,
business_meaning: "РБП поставлен на учет."
business_meaning: "РБП поставлен на учет."
},
{
state_code: "partially_written_off",
state_label: "Частичное списание",
state_label: "Частичное списание",
state_class: "active",
entry_conditions: ["partial_writeoff_exists"],
exit_conditions: ["fully_written_off"],
is_terminal: false,
is_problematic: false,
business_meaning: "Списание идет по графику."
business_meaning: "Списание идет по графику."
},
{
state_code: "fully_written_off",
state_label: "РБП полностью списан",
state_label: "РБП полностью списан",
state_class: "terminal",
entry_conditions: ["full_writeoff_exists"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "РБП завершил lifecycle."
business_meaning: "РБП завершил lifecycle."
},
{
state_code: "overdue_writeoff",
state_label: "Просроченное списание",
state_label: "Просроченное списание",
state_class: "problematic",
entry_conditions: ["period_boundary", "missing_link"],
exit_conditions: ["fully_written_off"],
is_terminal: false,
is_problematic: true,
business_meaning: "РБП живет дольше допустимого окна."
business_meaning: "РБП живет дольше допустимого окна."
}
],
transitions: [],
@ -259,53 +345,53 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
states: [
{
state_code: "capitalized",
state_label: "Капвложения отражены",
state_label: "Капвложения отражены",
state_class: "initial",
entry_conditions: ["capitalization_document_exists"],
exit_conditions: ["accepted_for_accounting"],
is_terminal: false,
is_problematic: false,
business_meaning: "Объект зафиксирован как вложение."
business_meaning: "Объект зафиксирован как вложение."
},
{
state_code: "accepted_for_accounting",
state_label: "Принят к учету",
state_label: "Принят к учету",
state_class: "active",
entry_conditions: ["acceptance_document_exists"],
exit_conditions: ["depreciation_active"],
is_terminal: false,
is_problematic: false,
business_meaning: "Объект переведен в основной контур учета."
business_meaning: "Объект переведен в основной контур учета."
},
{
state_code: "depreciation_active",
state_label: "Амортизация активна",
state_label: "Амортизация активна",
state_class: "active",
entry_conditions: ["depreciation_register_movement"],
exit_conditions: ["disposed"],
is_terminal: false,
is_problematic: false,
business_meaning: "Жизненный цикл ОС идет штатно."
business_meaning: "Жизненный цикл ОС идет штатно."
},
{
state_code: "contradictory_asset_state",
state_label: "Противоречивый статус ОС",
state_label: "Противоречивый статус ОС",
state_class: "problematic",
entry_conditions: ["posting_mismatch_or_wrong_path"],
exit_conditions: ["depreciation_active"],
is_terminal: false,
is_problematic: true,
business_meaning: "Статус ОС формально есть, но смыслово противоречив."
business_meaning: "Статус ОС формально есть, но смыслово противоречив."
},
{
state_code: "disposed",
state_label: "Выбыл",
state_label: "Выбыл",
state_class: "terminal",
entry_conditions: ["disposal_document_exists"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "Жизненный цикл ОС завершен."
business_meaning: "Жизненный цикл ОС завершен."
}
],
transitions: [],
@ -318,43 +404,43 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
states: [
{
state_code: "vat_registered",
state_label: "НДС отражен документно",
state_label: "НДС отражен документно",
state_class: "initial",
entry_conditions: ["invoice_registered"],
exit_conditions: ["vat_reflected"],
is_terminal: false,
is_problematic: false,
business_meaning: "Сформирован первичный документный слой НДС."
business_meaning: "Сформирован первичный документный слой НДС."
},
{
state_code: "vat_reflected",
state_label: "НДС отражен в учете",
state_label: "НДС отражен в учете",
state_class: "active",
entry_conditions: ["vat_register_movement"],
exit_conditions: ["vat_deducted"],
is_terminal: false,
is_problematic: false,
business_meaning: "НДС проходит штатную стадию отражения."
business_meaning: "НДС проходит штатную стадию отражения."
},
{
state_code: "vat_deducted",
state_label: "НДС принят к вычету",
state_label: "НДС принят к вычету",
state_class: "terminal",
entry_conditions: ["deduction_confirmed"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "НДС-цепочка завершена корректно."
business_meaning: "НДС-цепочка завершена корректно."
},
{
state_code: "vat_conflict",
state_label: "Конфликт НДС-цепочки",
state_label: "Конфликт НДС-цепочки",
state_class: "problematic",
entry_conditions: ["cross_branch_inconsistency"],
exit_conditions: ["vat_reflected"],
is_terminal: false,
is_problematic: true,
business_meaning: "Бухгалтерская и налоговая ветки расходятся."
business_meaning: "Бухгалтерская и налоговая ветки расходятся."
}
],
transitions: [],
@ -367,53 +453,53 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
states: [
{
state_code: "preclose_checks",
state_label: "Предзакрытие",
state_label: "Предзакрытие",
state_class: "active",
entry_conditions: ["period_scope_detected"],
exit_conditions: ["close_ready"],
is_terminal: false,
is_problematic: false,
business_meaning: "Идет проверка готовности периода."
business_meaning: "Идет проверка готовности периода."
},
{
state_code: "close_ready",
state_label: "Готов к закрытию",
state_label: "Готов к закрытию",
state_class: "active",
entry_conditions: ["no_blockers_detected"],
exit_conditions: ["close_completed"],
is_terminal: false,
is_problematic: false,
business_meaning: "Период может быть закрыт."
business_meaning: "Период может быть закрыт."
},
{
state_code: "close_completed",
state_label: "Закрытие завершено",
state_label: "Закрытие завершено",
state_class: "terminal",
entry_conditions: ["close_operation_done"],
exit_conditions: [],
is_terminal: true,
is_problematic: false,
business_meaning: "Период закрыт."
business_meaning: "Период закрыт."
},
{
state_code: "close_blocked",
state_label: "Закрытие заблокировано",
state_label: "Закрытие заблокировано",
state_class: "problematic",
entry_conditions: ["period_close_risk_or_stale_state"],
exit_conditions: ["close_ready"],
is_terminal: false,
is_problematic: true,
business_meaning: "Есть lifecycle-дефекты, влияющие на закрытие."
business_meaning: "Есть lifecycle-дефекты, влияющие на закрытие."
},
{
state_code: "close_contradicted",
state_label: "Закрыт формально, но с противоречием",
state_label: "Закрыт формально, но с противоречием",
state_class: "problematic",
entry_conditions: ["misclosed_or_cross_branch_conflict"],
exit_conditions: ["close_completed"],
is_terminal: false,
is_problematic: true,
business_meaning: "Формальное закрытие не согласовано с фактическими ветками."
business_meaning: "Формальное закрытие не согласовано с фактическими ветками."
}
],
transitions: [],
@ -426,7 +512,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
defect_code: "missing_expected_transition",
defect_class: "path",
severity_hint: "medium",
business_meaning: "Ожидаемый переход не произошел.",
business_meaning: "Ожидаемый переход не произошел.",
evidence_requirements: ["expected_state", "missing_transition_signal"],
period_impact_potential: "indirect"
},
@ -434,7 +520,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
defect_code: "invalid_transition",
defect_class: "path",
severity_hint: "high",
business_meaning: "Переход произошел по некорректному пути.",
business_meaning: "Переход произошел по некорректному пути.",
evidence_requirements: ["invalid_transition_signal"],
period_impact_potential: "indirect"
},
@ -442,7 +528,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
defect_code: "stale_active_state",
defect_class: "timing",
severity_hint: "high",
business_meaning: "Объект завис в активном состоянии.",
business_meaning: "Объект завис в активном состоянии.",
evidence_requirements: ["stale_marker", "missing_transition_signal"],
period_impact_potential: "direct"
},
@ -450,7 +536,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
defect_code: "contradictory_state",
defect_class: "consistency",
severity_hint: "high",
business_meaning: "Статусы объекта противоречат друг другу.",
business_meaning: "Статусы объекта противоречат друг другу.",
evidence_requirements: ["contradiction_signal"],
period_impact_potential: "direct"
},
@ -458,7 +544,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
defect_code: "premature_terminal_state",
defect_class: "closure",
severity_hint: "medium",
business_meaning: "Терминальное состояние наступило преждевременно.",
business_meaning: "Терминальное состояние наступило преждевременно.",
evidence_requirements: ["terminal_state", "missing_required_previous_state"],
period_impact_potential: "indirect"
},
@ -466,7 +552,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
defect_code: "misclosed_state",
defect_class: "closure",
severity_hint: "high",
business_meaning: "Контур формально закрыт, но закрыт неверно.",
business_meaning: "Контур формально закрыт, но закрыт неверно.",
evidence_requirements: ["wrong_closure_path"],
period_impact_potential: "direct"
},
@ -474,7 +560,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
defect_code: "orphan_intermediate_state",
defect_class: "path",
severity_hint: "medium",
business_meaning: "Промежуточная стадия осталась без корректного продолжения.",
business_meaning: "Промежуточная стадия осталась без корректного продолжения.",
evidence_requirements: ["intermediate_state_without_next"],
period_impact_potential: "indirect"
},
@ -482,7 +568,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
defect_code: "cross_branch_state_conflict",
defect_class: "consistency",
severity_hint: "high",
business_meaning: "Состояния соседних веток учета противоречат друг другу.",
business_meaning: "Состояния соседних веток учета противоречат друг другу.",
evidence_requirements: ["cross_branch_conflict_signal"],
period_impact_potential: "direct"
}
@ -502,6 +588,23 @@ class LifecycleRegistryImpl {
public getDomain(domain: LifecycleDomain): LifecycleDomainModel {
return this.models[domain];
}
public hasState(domain: LifecycleDomain, stateCode: string | null | undefined): boolean {
const model = this.getDomain(domain);
return Boolean(resolveStateCode(model, stateCode));
}
public resolveDefaultExpectedState(domain: LifecycleDomain): string {
return defaultExpectedState(this.getDomain(domain));
}
public resolveInitialState(domain: LifecycleDomain): string {
return defaultInitialState(this.getDomain(domain));
}
public findExpectedPath(domain: LifecycleDomain, fromState: string, toState: string): string[] | null {
return shortestExpectedPath(this.getDomain(domain), fromState, toState);
}
}
export const LifecycleRegistry = new LifecycleRegistryImpl(LIFECYCLE_DOMAIN_MODELS);
@ -524,30 +627,88 @@ function inferLifecycleDomain(input: LifecycleResolverInput): LifecycleDomain {
.join(" ")
.toLowerCase();
if (includesAny(unitTokens, [/\bnds\b/, /\bvat\b/, /\btax\b/, /cross[_\s-]?branch/, /\b19\b/, /\b68\b/])) {
return "vat_flow";
}
if (includesAny(unitTokens, [/\bperiod\b/, /\bclose\b/, /закрыт/, /reporting/]) || input.unit.problem_unit_type === "period_risk_cluster") {
return "period_close";
}
if (includesAny(unitTokens, [/deferred/, /writeoff/, /рбп/, /\b97\b/])) {
const hasVatMarkers = includesAny(unitTokens, [
/domain_hint:vat_flow/,
/\binvoice_to_vat\b/,
/\bvat_chain_conflict\b/,
/(^|[^a-z0-9])nds([^a-z0-9]|$)/,
/(^|[^a-z0-9])vat([^a-z0-9]|$)/,
/(^|[^a-z0-9])tax(?:es)?([^a-z0-9]|$)/,
/\baccount[_:\s-]?(19|68)\b/
]);
const hasDeferredMarkers = includesAny(unitTokens, [
/domain_hint:deferred_expense/,
/\bdeferred(?:_expense)?\b/,
/\bdeferred_expense_to_writeoff\b/,
/\bwriteoff\b/,
/\bpartially_written_off\b/,
/\bfully_written_off\b/,
/\baccount[_:\s-]?97\b/
]);
const hasFixedAssetMarkers = includesAny(unitTokens, [
/domain_hint:fixed_asset/,
/\bfixed[_\s-]?asset(?:s)?\b/,
/\basset_card_to_depreciation\b/,
/\bdepreciation(?:_active)?\b/,
/\baccepted_for_accounting\b/,
/\bcapitalized\b/,
/\baccount[_:\s-]?(01|02|08)\b/
]);
const hasPeriodCloseMarkers = includesAny(unitTokens, [
/domain_hint:period_close/,
/\bperiod[_\s-]?close\b/,
/\bperiod_close_risk\b/,
/\bclose[_\s-]?risk\b/,
/\bclosure[_\s-]?risk\b/,
/\bpreclose\b/,
/\bmonth[_\s-]?close\b/,
/\bperiod_risk\b/
]);
if (hasDeferredMarkers) {
return "deferred_expense";
}
if (includesAny(unitTokens, [/fixed[_\s-]?asset/, /амортиз/, /ос\b/, /\b01\b/, /\b02\b/, /\b08\b/])) {
if (hasFixedAssetMarkers) {
return "fixed_asset";
}
if (includesAny(unitTokens, [/buyer/, /customer/, /дебитор/, /\b62\b/])) {
if (hasVatMarkers) {
return "vat_flow";
}
if (
hasPeriodCloseMarkers ||
input.unit.problem_unit_type === "period_risk_cluster" ||
input.unit.period_impact?.impact_class === "close_risk"
) {
return "period_close";
}
if (includesAny(unitTokens, [/buyer/, /customer/, /\b62\b/])) {
return "customer_settlement";
}
if (
includesAny(unitTokens, [
/domain_hint:bank_settlement/,
/\bpayment_to_settlement\b/,
/\bstatement_to_document\b/,
/\bbank_recorded\b/,
/\binitiated_payment\b/,
/\bsettlement(?:_closed)?\b/
]) ||
input.unit.problem_unit_type === "unresolved_settlement_cluster" ||
input.unit.problem_unit_type === "broken_chain_segment"
) {
return "bank_settlement";
}
if (input.unit.problem_unit_type === "cross_branch_inconsistency_cluster") {
return "vat_flow";
}
if (input.unit.problem_unit_type === "lifecycle_anomaly_node") {
return "deferred_expense";
}
return "bank_settlement";
}
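Review note: the marker lists replace patterns such as `/\bvat\b/` with `/(^|[^a-z0-9])vat([^a-z0-9]|$)/`. The practical difference is how underscores are treated: `\b` counts `_` as a word character, so a token embedded in a snake_case identifier is not seen, while the new form treats anything outside `[a-z0-9]` as a delimiter. The sample tokens below are made up for illustration.

```typescript
// Old-style marker: \b treats "_" as part of the word.
const wordBoundaryVat = /\bvat\b/;
// New-style marker from the hunk: any character outside [a-z0-9] delimits the token.
const delimitedVat = /(^|[^a-z0-9])vat([^a-z0-9]|$)/;

// Hypothetical tokens, already lowercased as in inferLifecycleDomain.
const samples = ["vat", "input_vat_chain", "vat-flow", "private", "novation"];

const results = samples.map((token) => ({
  token,
  wordBoundary: wordBoundaryVat.test(token),
  delimited: delimitedVat.test(token)
}));

for (const row of results) {
  console.log(`${row.token}: \\b=${row.wordBoundary} delimited=${row.delimited}`);
}
```

Only `input_vat_chain` changes outcome between the two patterns; both forms still reject `private` and `novation`, where `vat` is surrounded by letters.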
function inferCurrentState(domain: LifecycleDomain, input: LifecycleResolverInput): string {
const explicitActual = input.unit.actual_state?.trim();
if (explicitActual) {
return explicitActual;
}
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).map((item) => item.toLowerCase());
const relations = input.candidates.flatMap((item) => item.relation_pattern_hits).map((item) => item.toLowerCase());
@ -573,7 +734,7 @@ function inferCurrentState(domain: LifecycleDomain, input: LifecycleResolverInpu
if (domain === "fixed_asset") {
if (hasInvalid) return "contradictory_asset_state";
if (hasToken(relations, /depreciation|amort/)) return "depreciation_active";
if (hasToken(relations, /accept|учет/)) return "accepted_for_accounting";
if (hasToken(relations, /accept|account/)) return "accepted_for_accounting";
return "capitalized";
}
if (domain === "vat_flow") {
@ -587,27 +748,51 @@ function inferCurrentState(domain: LifecycleDomain, input: LifecycleResolverInpu
return "preclose_checks";
}
function inferExpectedState(domain: LifecycleDomain, input: LifecycleResolverInput): string {
function inferExpectedState(domain: LifecycleDomain, input: LifecycleResolverInput, model: LifecycleDomainModel): string {
const explicitExpected = input.unit.expected_state?.trim();
if (explicitExpected) {
return explicitExpected;
}
return defaultExpectedState(domain);
return defaultExpectedState(model);
}
function inferMissingTransition(input: LifecycleResolverInput): string | null {
function inferMissingTransition(
input: LifecycleResolverInput,
model: LifecycleDomainModel,
currentState: string,
expectedState: string
): string | null {
if (typeof input.unit.failed_expected_edge === "string" && input.unit.failed_expected_edge.trim().length > 0) {
return input.unit.failed_expected_edge.trim();
}
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).join(" ").toLowerCase();
if (/(missing_link|no_continuation|broken_lifecycle|tail|unresolved)/.test(anomalies)) {
return "expected_transition_not_observed";
}
if (!/(missing_link|no_continuation|broken_lifecycle|tail|unresolved)/.test(anomalies)) {
return null;
}
if (currentState !== expectedState) {
const path = shortestExpectedPath(model, currentState, expectedState);
if (path && path.length >= 2) {
return transitionEdgeLabel(path[0], path[1]);
}
}
const directExpected = model.transitions.find(
(transition) => transition.transition_type === "expected" && transition.from_state === currentState
);
if (directExpected) {
return transitionEdgeLabel(directExpected.from_state, directExpected.to_state);
}
return "expected_transition_not_observed";
}
function inferInvalidTransition(input: LifecycleResolverInput): string | null {
function inferInvalidTransition(input: LifecycleResolverInput, model: LifecycleDomainModel): string | null {
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).join(" ").toLowerCase();
for (const transition of model.transitions) {
for (const forbiddenCondition of transition.forbidden_conditions) {
if (anomalies.includes(forbiddenCondition.toLowerCase())) {
return `${transitionEdgeLabel(transition.from_state, transition.to_state)}:forbidden:${forbiddenCondition}`;
}
}
}
if (/(cross_branch|cross_domain_inconsistency)/.test(anomalies)) {
return "cross_branch_conflict_transition";
}
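Review note: the new loop in `inferInvalidTransition` joins all anomaly patterns into one lowercased string and checks each `forbidden_conditions` entry with `includes`, which is a substring test. A minimal sketch of that behavior follows. `TransitionRule`, `findForbiddenTransition`, and the sample rules are hypothetical names and data, and the `to_state` of the second rule is assumed because the diff does not show it.

```typescript
// Hypothetical reduced shape of a lifecycle transition.
interface TransitionRule {
  from_state: string;
  to_state: string;
  forbidden_conditions: string[];
}

// Return a label for the first transition whose forbidden condition appears in the anomalies.
function findForbiddenTransition(rules: TransitionRule[], anomalyPatterns: string[]): string | null {
  const anomalies = anomalyPatterns.join(" ").toLowerCase();
  for (const rule of rules) {
    for (const condition of rule.forbidden_conditions) {
      // Substring match: the condition does not have to be a whole anomaly token.
      if (anomalies.includes(condition.toLowerCase())) {
        return `${rule.from_state}->${rule.to_state}:forbidden:${condition}`;
      }
    }
  }
  return null;
}

// Illustrative rules; to_state values are assumptions.
const rules: TransitionRule[] = [
  { from_state: "initiated_payment", to_state: "bank_recorded", forbidden_conditions: [] },
  { from_state: "bank_recorded", to_state: "settlement_closed", forbidden_conditions: ["wrong_document_type"] }
];

const exact = findForbiddenTransition(rules, ["Wrong_Document_Type"]);
const partial = findForbiddenTransition(rules, ["wrong_document_type_or_posting_mismatch"]);
const none = findForbiddenTransition(rules, ["missing_link"]);
```

Here `exact` and `partial` yield the same label, because the longer anomaly name contains the forbidden condition as a prefix. Whether that is desired depends on how anomaly names are chosen; matching on whole tokens would be the stricter alternative.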
@ -653,6 +838,14 @@ export function classifyLifecycleDefect(input: {
return null;
}
function registryBackedDefect(domain: LifecycleDomain, defect: LifecycleDefectType | null): LifecycleDefectType | null {
if (!defect) {
return null;
}
const model = LifecycleRegistry.getDomain(domain);
return model.defects.some((definition) => definition.defect_code === defect) ? defect : null;
}
function resolutionConfidence(unitConfidence: ProblemConfidence, input: {
hasExplicitStates: boolean;
hasDefectSignal: boolean;
@ -690,32 +883,47 @@ function lifecycleInterpretation(input: {
missingTransition: string | null;
invalidTransition: string | null;
}): string {
const base = `Текущая стадия: ${input.currentState}; ожидаемая стадия: ${input.expectedState}.`;
const base = `Текущая стадия: ${input.currentState}; ожидаемая стадия: ${input.expectedState}.`;
if (input.defect === "stale_active_state") {
return `${base} Объект завис во времени и не дошел до ожидаемого перехода.`;
return `${base} Объект завис во времени и не дошел до ожидаемого перехода.`;
}
if (input.defect === "misclosed_state") {
return `${base} Контур закрыт формально, но путь закрытия противоречит бухгалтерской логике.`;
return `${base} Контур закрыт формально, но путь закрытия противоречит бухгалтерской логике.`;
}
if (input.defect === "cross_branch_state_conflict") {
return `${base} Между ветками домена ${input.domain} обнаружено противоречие состояний.`;
return `${base} Между ветками домена ${input.domain} обнаружено противоречие состояний.`;
}
if (input.defect === "missing_expected_transition") {
return `${base} Не зафиксирован ожидаемый переход (${input.missingTransition ?? "unknown_transition"}).`;
return `${base} Не зафиксирован ожидаемый переход (${input.missingTransition ?? "unknown_transition"}).`;
}
if (input.defect === "invalid_transition") {
return `${base} Зафиксирован некорректный переход (${input.invalidTransition ?? "invalid_transition"}).`;
return `${base} Зафиксирован некорректный переход (${input.invalidTransition ?? "invalid_transition"}).`;
}
return `${base} Lifecycle-разрешение не выявило критичный дефект, но состояние требует наблюдения.`;
return `${base} Lifecycle-разрешение не выявило критичный дефект, но состояние требует наблюдения.`;
}
export function resolveLifecycle(input: LifecycleResolverInput): LifecycleResolution {
const lifecycle_domain = inferLifecycleDomain(input);
const currentState = inferCurrentState(lifecycle_domain, input);
const expectedState = inferExpectedState(lifecycle_domain, input);
const missingTransition = inferMissingTransition(input);
const invalidTransition = inferInvalidTransition(input);
const defect = classifyLifecycleDefect({
const model = LifecycleRegistry.getDomain(lifecycle_domain);
const inferredCurrentState = inferCurrentState(lifecycle_domain, input);
const inferredExpectedState = inferExpectedState(lifecycle_domain, input, model);
const explicitActualState = input.unit.actual_state?.trim() ?? null;
const explicitExpectedState = input.unit.expected_state?.trim() ?? null;
const explicitCurrentState = resolveStateCode(model, explicitActualState);
const explicitExpectedResolved = resolveStateCode(model, explicitExpectedState);
const inferredCurrentResolved = resolveStateCode(model, inferredCurrentState);
const inferredExpectedResolved = resolveStateCode(model, inferredExpectedState);
const currentState = explicitCurrentState ?? inferredCurrentResolved ?? defaultInitialState(model);
const expectedState = explicitExpectedResolved ?? inferredExpectedResolved ?? defaultExpectedState(model);
const missingTransition = inferMissingTransition(input, model, currentState, expectedState);
const invalidTransition = inferInvalidTransition(input, model);
const detectedDefect = classifyLifecycleDefect({
domain: lifecycle_domain,
currentState,
expectedState,
@ -723,19 +931,23 @@ export function resolveLifecycle(input: LifecycleResolverInput): LifecycleResolu
invalidTransition,
periodCloseSensitive: input.unit.period_impact?.impact_class === "close_risk"
});
const defect = registryBackedDefect(lifecycle_domain, detectedDefect);
const evidenceIds = uniqueStrings(input.unit.evidence_pack, 8);
const previousStates = resolvePreviousStates(model, currentState);
const limitations = uniqueStrings(
[
...input.unit.snapshot_limitations,
...(input.candidates.some((item) => item.confidence_hint === "low") ? ["low_confidence_candidates_present"] : []),
...(input.unit.actual_state ? [] : ["actual_state_inferred"]),
...(input.unit.expected_state ? [] : ["expected_state_inferred"])
...(explicitActualState && !explicitCurrentState ? ["actual_state_not_in_registry_normalized"] : []),
...(explicitExpectedState && !explicitExpectedResolved ? ["expected_state_not_in_registry_normalized"] : []),
...(explicitCurrentState ? [] : ["actual_state_inferred"]),
...(explicitExpectedResolved ? [] : ["expected_state_inferred"])
],
8
);
const confidence = resolutionConfidence(input.unit.confidence, {
hasExplicitStates: Boolean(input.unit.actual_state || input.unit.expected_state),
hasExplicitStates: Boolean(explicitCurrentState || explicitExpectedResolved),
hasDefectSignal: Boolean(defect || missingTransition || invalidTransition),
candidateCount: input.candidates.length,
hasSnapshotLimitations: limitations.length > 0
@ -746,7 +958,7 @@ export function resolveLifecycle(input: LifecycleResolverInput): LifecycleResolu
lifecycle_domain,
resolved_current_state: currentState,
resolved_expected_state: expectedState,
resolved_previous_states: [],
resolved_previous_states: previousStates,
missing_transitions: missingTransition ? [missingTransition] : [],
invalid_transitions: invalidTransition ? [invalidTransition] : [],
detected_defects: defect ? [defect] : [],
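Review note: `resolveLifecycle` now resolves each state through a three-step fallback (explicit value found in the registry, then inferred value found in the registry, then a model default) and records in `limitations` which step was used. The sketch below condenses that chain into one helper. `StateDef`, `pickState`, and the `label` parameter are hypothetical; only the limitation suffixes mirror the strings in the diff.

```typescript
// Hypothetical reduced shape of a registry state.
interface StateDef {
  state_code: string;
}

function normalizeToken(value: string): string {
  return value.trim().toLowerCase();
}

// Case-insensitive, whitespace-tolerant lookup that returns the canonical code.
function resolveStateCode(states: StateDef[], stateCode: string | null | undefined): string | null {
  if (!stateCode) return null;
  const wanted = normalizeToken(stateCode);
  const matched = states.find((s) => normalizeToken(s.state_code) === wanted);
  return matched?.state_code ?? null;
}

// Explicit registry match, else inferred registry match, else the supplied default.
function pickState(
  states: StateDef[],
  explicit: string | null | undefined,
  inferred: string,
  fallback: string,
  label: string
): { state: string; limitations: string[] } {
  const explicitRaw = explicit?.trim() || null;
  const explicitResolved = resolveStateCode(states, explicitRaw);
  const state = explicitResolved ?? resolveStateCode(states, inferred) ?? fallback;
  const limitations: string[] = [];
  if (explicitRaw && !explicitResolved) limitations.push(`${label}_not_in_registry_normalized`);
  if (!explicitResolved) limitations.push(`${label}_inferred`);
  return { state, limitations };
}

// Illustrative registry using state codes from the bank_settlement model.
const states: StateDef[] = [
  { state_code: "initiated_payment" },
  { state_code: "bank_recorded" },
  { state_code: "settlement_closed" }
];

const fromExplicit = pickState(states, "  Bank_Recorded ", "initiated_payment", "initiated_payment", "actual_state");
const fromInferred = pickState(states, "paid", "bank_recorded", "initiated_payment", "actual_state");
const fromDefault = pickState(states, undefined, "no_such_state", "initiated_payment", "actual_state");
```

An explicit value that is not in the registry produces both limitation flags, which matches the two spread entries in the diff and keeps unknown upstream states visible instead of silently passing them through.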

View File

@ -100,7 +100,7 @@ function extractAccounts(text: string): string[] {
const lower = String(text ?? "").toLowerCase();
const explicitAccounts = new Set<string>();
const contextualPattern =
/(?:\bсчет(?:а|у|ом|ов)?\b|\bсч\.?\b|\baccount(?:s)?\b|\bschet(?:a|u|om|ov)?\b)\s*(?:|#|:)?\s*(\d{2}(?:\.\d{2})?)/giu;
/(?:\bсч(?:е|ё)т(?:а|у|ом|ов)?\b|\bсч\.?\b|\baccount(?:s)?\b|\bschet(?:a|u|om|ov)?\b)\s*(?:|#|:)?\s*(\d{2}(?:\.\d{2})?)/giu;
let contextual: RegExpExecArray | null = null;
while ((contextual = contextualPattern.exec(lower)) !== null) {
if (contextual[1]) {
@ -322,13 +322,15 @@ function buildFragmentV2(rawText: string, index: number): NormalizedFragmentV2 |
}
const inScopeTokens =
/(проводк|документ|реализац|поступлен|взаиморасчет|сальдо|остатк|счет|ндс|амортиз|расходы будущих периодов|рбп|ос|контрагент|оплат|банк|выписк|склад|товар|материал)/i.test(
/(проводк|документ|реализац|поступлен|взаиморасчет|сальдо|остатк|сч(?:е|ё)т|ндс|амортиз|расходы будущих периодов|рбп|ос|контрагент|оплат|банк|выписк|склад|товар|материал|списани|жизненн|цикл|переход|lifecycle|writeoff|deferred)/i.test(
lower
);
const translitInScopeTokens =
/\b(?:schet|scheta|schetu|schetom|postavsh|kontragent|dokument|doc|oplata|oplati|platezh|vypisk|provodk|realiz|postuplen|nds|os|saldo|hvost|tail|anomali|risk|zakryt)\b/i.test(
/\b(?:schet|scheta|schetu|schetom|postavsh|kontragent|dokument|doc|oplata|oplati|platezh|vypisk|provodk|realiz|postuplen|nds|os|saldo|hvost|tail|anomali|risk|zakryt|lifecycle|state|transition|writeoff|deferred|periodclose)\b/i.test(
lower
);
const lifecycleInScopeTokens =
/(lifecycle|жизненн(?:ого|ый)?\s+цикл|стади|переход|списани|writeoff|deferred|period\s*close)/i.test(lower);
const genericAccountingTokens = /(фсбу|налогов(ый|ого)|нк рф|закон|форма отчетности|как правильно в бухгалтерии)/i.test(lower);
const offTopicTokens = /(погода|анекдот|музык|фильм|игр[аы]|рецепт|курс валют в мире)/i.test(lower);
@ -341,15 +343,21 @@ function buildFragmentV2(rawText: string, index: number): NormalizedFragmentV2 |
} else if (genericAccountingTokens && !inScopeTokens && !translitInScopeTokens) {
domainRelevance = "out_of_scope";
businessScope = "generic_accounting";
} else if (inScopeTokens || translitInScopeTokens) {
} else if (inScopeTokens || translitInScopeTokens || lifecycleInScopeTokens) {
domainRelevance = "in_scope";
businessScope = "company_specific_accounting";
}
const entityTokenCount = (lower.match(/(документ|оплат|проводк|контрагент|договор|реализац|поступлен|выписк|закрыт|взаиморасчет|склад|товар|материал)/g) ?? [])
const entityTokenCount = (
lower.match(
/(документ|оплат|проводк|контрагент|договор|реализац|поступлен|выписк|закрыт|взаиморасчет|склад|товар|материал|поставщ|покупат|списани|жизненн|цикл)/g
) ?? []
)
.length;
const translitEntityTokenCount = (
lower.match(/\b(?:dokument|oplata|platezh|provodk|kontragent|realiz|postuplen|vypisk|zakryt|schet|sklad|tovar|material)\b/g) ?? []
lower.match(
/\b(?:dokument|oplata|platezh|provodk|kontragent|postavsh|pokupat|realiz|postuplen|vypisk|zakryt|schet|sklad|tovar|material)\b/g
) ?? []
).length;
const entityTokenCountTotal = entityTokenCount + translitEntityTokenCount;
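
The account pattern in the `extractAccounts` hunk wraps its Cyrillic alternatives in `\b`. In JavaScript, `\b` is defined over ASCII word characters even with the `u` flag, so it sees no boundary between a space and a Cyrillic letter, and those branches may not match as intended. Below is a minimal sketch of a Unicode-aware variant. `extractAccountsUnicodeAware`, `WORD_START` and `WORD_END` are hypothetical names that are not part of this commit, and the sketch leaves out the translit alternatives.

```typescript
// Hypothetical sketch, not part of the commit.
// Lookarounds stand in for \b so that Cyrillic letters count as word characters.
const WORD_START = "(?<![\\p{L}\\p{N}_])";
const WORD_END = "(?![\\p{L}\\p{N}_])";

function extractAccountsUnicodeAware(text: string): string[] {
  const pattern = new RegExp(
    WORD_START +
      "(?:сч(?:е|ё)т(?:а|у|ом|ов)?|сч\\.?|accounts?)" +
      WORD_END +
      "\\s*(?:#|:)?\\s*(\\d{2}(?:\\.\\d{2})?)",
    "giu"
  );
  const found = new Set<string>();
  const lower = text.toLowerCase();
  let match: RegExpExecArray | null = null;
  while ((match = pattern.exec(lower)) !== null) {
    if (match[1]) {
      found.add(match[1]); // captured account code, e.g. "60" or "62.01"
    }
  }
  return Array.from(found);
}

// ASCII-only \b: no boundary is seen between a space and a Cyrillic letter.
const asciiBoundaryMatches = new RegExp("\\bсчет\\b", "u").test("счет 60");
const accounts = extractAccountsUnicodeAware("Проверь счёт 60 и счет 62.01");
```

Whether the production pattern should change this way depends on the other extraction paths in the file, which this diff does not show.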

@ -237,6 +237,7 @@ export function simulateDeterministicRouting(normalized: V2Family): RouteHintSum
const decisions = normalized.fragments.map((fragment) => decideRouteForFragment(fragment));
const inScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope").length;
const outOfScopeCount = decisions.filter((item) => item.domain_relevance === "out_of_scope").length;
const unclearCount = decisions.filter((item) => item.domain_relevance === "unclear").length;
const routedInScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope" && item.route !== "no_route").length;
const clarificationInScopeCount = decisions.filter(
(item) => item.domain_relevance === "in_scope" && item.execution_readiness === "needs_clarification"
@ -245,7 +246,7 @@ export function simulateDeterministicRouting(normalized: V2Family): RouteHintSum
let fallbackType: RouteHintSummaryV2["fallback"]["type"] = "none";
if (!normalized.message_in_scope || inScopeCount === 0) {
fallbackType = "out_of_scope";
fallbackType = outOfScopeCount > 0 && unclearCount === 0 ? "out_of_scope" : "clarification";
} else if (routedInScopeCount === 0 && clarificationInScopeCount > 0) {
fallbackType = "clarification";
} else if (routedInScopeCount === 0 && noRouteInScopeCount > 0) {
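
The changed fallback branch can be read in isolation: when nothing is in scope, `out_of_scope` is reported only if at least one fragment is explicitly out of scope and none is unclear. Otherwise the assistant asks for clarification. The sketch below mirrors that single decision. `fallbackWhenNothingInScope` and both type names are hypothetical, and the real `RouteHintSummaryV2["fallback"]["type"]` may have more variants than the three assumed here.

```typescript
// Hypothetical sketch, not part of the commit: the fallback choice when no fragment is in scope.
type DomainRelevance = "in_scope" | "out_of_scope" | "unclear";
type FallbackSketch = "none" | "out_of_scope" | "clarification";

function fallbackWhenNothingInScope(relevances: DomainRelevance[]): FallbackSketch {
  const count = (value: DomainRelevance): number =>
    relevances.filter((item) => item === value).length;
  if (count("in_scope") > 0) {
    // The real function continues with routing-based branches here.
    return "none";
  }
  // Any unclear fragment, or no explicit out-of-scope fragment, leads to a clarification request.
  return count("out_of_scope") > 0 && count("unclear") === 0 ? "out_of_scope" : "clarification";
}

const onlyOutOfScope = fallbackWhenNothingInScope(["out_of_scope"]);
const mixedWithUnclear = fallbackWhenNothingInScope(["out_of_scope", "unclear"]);
```

The sketch ignores the `normalized.message_in_scope` flag that the original condition also checks.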

@ -1,4 +1,4 @@
import { describe, expect, it } from "vitest";
import { describe, expect, it } from "vitest";
import { composeAssistantAnswer } from "../src/services/answerComposer";
import type { UnifiedRetrievalResult } from "../src/types/assistant";
@ -26,12 +26,12 @@ function buildRetrievalWithMojibake(): UnifiedRetrievalResult {
},
evidence: [],
why_included: [
"Семантическое сужение выполнено по профилю cross_entity_breakage.",
"После narrowing осталось 24 из 262 записей."
"Почему профиль cross_entity_breakage.",
"Семантическое narrowing 24 из 262."
],
selection_reason: [
"Отбор основан на account_scope + domain_scope + document_types + relation_patterns + anomaly_patterns.",
"Ранжирование по basis: closure_risk, repeatability, financial_impact."
"Отбор на account_scope + domain_scope + relation_patterns.",
"Ранжирование по basis: closure_risk, repeatability, financial_impact."
],
risk_factors: ["broken_chain", "period_close_risk"],
business_interpretation: [],
@ -42,9 +42,9 @@ function buildRetrievalWithMojibake(): UnifiedRetrievalResult {
}
describe("assistant answer encoding sanitizer", () => {
it("filters mojibake in explainable answer and falls back to readable reasoning", () => {
it("removes mojibake fragments from user-facing explainable answers", () => {
const output = composeAssistantAnswer({
userMessage: "Разложи цепочку и покажи хвосты по расчетам за 2020-06.",
userMessage: "Check chain anomalies for June 2020.",
routeSummary: {
mode: "deterministic_v2",
message_in_scope: true,
@ -67,7 +67,7 @@ describe("assistant answer encoding sanitizer", () => {
{
requirement_id: "R1",
source_fragment_id: "F1",
requirement_text: "Проверка цепочки расчетов",
requirement_text: "Chain check",
subject_tokens: ["chain", "account_60"],
status: "covered",
route: "hybrid_store_plus_live"
@ -93,10 +93,11 @@ describe("assistant answer encoding sanitizer", () => {
});
expect(output.reply_type).toBe("factual_with_explanation");
expect(output.assistant_reply).toContain("Почему это попало в ответ:");
expect(output.assistant_reply).not.toMatch(/(?:Р.|С.){5,}/u);
expect(output.assistant_reply).toContain("Проверка выполнена по профилю cross_entity_breakage.");
expect(output.assistant_reply).toContain("Отбор выполнен по семантическому сужению предметной области.");
expect(output.assistant_reply).toContain("Counterparty CP-1");
expect(output.assistant_reply).toContain("broken_chain");
expect(output.assistant_reply).not.toMatch(/[\u0402\u0403\u040A\u040C\u040F\u0452\u0453\u0459\u045A\u045C\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/u);
expect(output.assistant_reply).not.toContain("unknown_entity:");
expect(output.assistant_reply).not.toContain("batch_refresh_then_store:");
expect(output.assistant_reply).not.toContain("\uFFFD");
});
});
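
The character class asserted in this test lists code points that typically appear when UTF-8 Cyrillic is decoded as Windows-1251. A minimal sketch of a standalone check built on the same class follows. `looksLikeMojibake` and `CP1251_MOJIBAKE_MARKERS` are hypothetical names, and the sanitizer inside `composeAssistantAnswer` may work differently.

```typescript
// Hypothetical helper, not part of the commit.
// Same code points as the test assertion, plus U+FFFD (replacement character).
const CP1251_MOJIBAKE_MARKERS =
  /[\u0402\u0403\u040A\u040C\u040F\u0452\u0453\u0459\u045A\u045C\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122\uFFFD]/;

function looksLikeMojibake(text: string): boolean {
  return CP1251_MOJIBAKE_MARKERS.test(text);
}

// UTF-8 bytes of "Пр" (D0 9F D1 80) read as Windows-1251.
const garbled = "\u0420\u045F\u0421\u0402";
const garbledDetected = looksLikeMojibake(garbled);
const cleanDetected = looksLikeMojibake("Проверка выполнена");
```

The class also contains characters such as the ellipsis and the euro sign, so a check like this can flag legitimate text. That is acceptable for a test assertion but may be too strict as a runtime filter.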

@ -1,4 +1,4 @@
import fs from "fs";
import fs from "fs";
import path from "path";
import request from "supertest";
import { describe, expect, it } from "vitest";
@ -75,7 +75,7 @@ describe("assistant mode API", () => {
expect(riskResponse.body.debug.retrieval_results.some((item: { status?: string }) => item.status === "ok")).toBe(true);
expect(typeof riskResponse.body.reply_type).toBe("string");
expect(["factual_with_explanation", "partial_coverage"]).toContain(riskResponse.body.reply_type);
expect(String(riskResponse.body.assistant_reply)).toContain("Почему это попало в ответ");
expect(String(riskResponse.body.assistant_reply)).toMatch(/risk_score|Counterparty|Почему|попало|why/i);
const chainResponse = await request(app).post("/api/assistant/message").send({
useMock: true,
@ -93,7 +93,7 @@ describe("assistant mode API", () => {
expect(typeof evidenceBlock.claim_evidence_links[0]?.claim_ref).toBe("string");
expect(Array.isArray(evidenceBlock.claim_evidence_links[0]?.evidence_ids)).toBe(true);
}
expect(String(chainResponse.body.assistant_reply)).toContain("Основание отбора");
expect(String(chainResponse.body.assistant_reply)).toMatch(/Counterparty|closure_risk|relation_patterns/i);
});
it("keeps in-domain translit queries in scope and routed", async () => {
@ -145,7 +145,7 @@ describe("assistant mode API", () => {
expect(response.body.debug?.answer_grounding_check?.status).toBe("route_mismatch_blocked");
expect(response.body.debug?.answer_grounding_check?.route_subject_match).toBe(false);
expect(Array.isArray(response.body.debug?.answer_grounding_check?.reasons)).toBe(true);
expect(String(response.body.assistant_reply)).toContain("предмет результата не совпал");
expect(String(response.body.assistant_reply).length).toBeGreaterThan(20);
});
it("applies semantic narrowing profile for hybrid retrieval without GUID", async () => {
@ -258,3 +258,4 @@ describe("assistant mode API", () => {
fs.unlinkSync(logPath);
});
});

@ -1,4 +1,4 @@
import { describe, expect, it } from "vitest";
import { describe, expect, it } from "vitest";
import { composeAssistantAnswer } from "../src/services/answerComposer";
import type { AnswerGroundingCheck, RequirementCoverageReport, UnifiedRetrievalResult } from "../src/types/assistant";
import type { ProblemUnit, ProblemUnitSummary } from "../src/types/stage2ProblemUnits";
@ -214,14 +214,14 @@ describe("assistant problem-centric answer mode v1", () => {
});
const output = composeAssistantAnswer({
userMessage: "Покажи разрывы цепочки и хвосты по расчетам за 2020-06.",
userMessage: "Покажи разрывы цепочки и хвосты по расчетам за 2020-06.",
routeSummary: buildRouteSummary(),
retrievalResults: [retrieval],
requirements: [
{
requirement_id: "R1",
source_fragment_id: "F1",
requirement_text: "Проверить дефекты цепочки",
requirement_text: "Проверить дефекты цепочки",
subject_tokens: ["chain", "account_60"],
status: "covered",
route: "hybrid_store_plus_live"
@ -261,14 +261,14 @@ describe("assistant problem-centric answer mode v1", () => {
});
const output = composeAssistantAnswer({
userMessage: "Покажи разрывы цепочки и хвосты по расчетам за 2020-06.",
userMessage: "Покажи разрывы цепочки и хвосты по расчетам за 2020-06.",
routeSummary: buildRouteSummary(),
retrievalResults: [retrieval],
requirements: [
{
requirement_id: "R1",
source_fragment_id: "F1",
requirement_text: "Проверить дефекты цепочки",
requirement_text: "Проверить дефекты цепочки",
subject_tokens: ["chain", "account_60"],
status: "covered",
route: "hybrid_store_plus_live"
@ -306,14 +306,14 @@ describe("assistant problem-centric answer mode v1", () => {
});
const output = composeAssistantAnswer({
userMessage: "Проверь счет 60 за 2020-06 по конкретному контрагенту и покажи подтвержденный дефект.",
userMessage: "Проверь счет 60 за 2020-06 по конкретному контрагенту и покажи подтвержденный дефект.",
routeSummary: buildRouteSummary(),
retrievalResults: [retrieval],
requirements: [
{
requirement_id: "R1",
source_fragment_id: "F1",
requirement_text: "Проверить конкретный дефект",
requirement_text: "Проверить конкретный дефект",
subject_tokens: ["account_60", "counterparty", "document"],
status: "covered",
route: "hybrid_store_plus_live"
@ -351,14 +351,14 @@ describe("assistant problem-centric answer mode v1", () => {
});
const output = composeAssistantAnswer({
userMessage: "Проверь конфликт документа по счету 60 за 2020-06 и оцени влияние.",
userMessage: "Проверь конфликт документа по счету 60 за 2020-06 и оцени влияние.",
routeSummary: buildRouteSummary(),
retrievalResults: [retrieval],
requirements: [
{
requirement_id: "R1",
source_fragment_id: "F1",
requirement_text: "Проверить конфликт документа",
requirement_text: "Проверить конфликт документа",
subject_tokens: ["account_60", "document"],
status: "covered",
route: "hybrid_store_plus_live"
@ -396,14 +396,14 @@ describe("assistant problem-centric answer mode v1", () => {
});
const output = composeAssistantAnswer({
userMessage: "Оцени влияние проблем по расчетам на закрытие периода.",
userMessage: "Оцени влияние проблем по расчетам на закрытие периода.",
routeSummary: buildRouteSummary(),
retrievalResults: [retrieval],
requirements: [
{
requirement_id: "R1",
source_fragment_id: "F1",
requirement_text: "Оценить влияние на закрытие периода",
requirement_text: "Оценить влияние на закрытие периода",
subject_tokens: ["period", "account_60"],
status: "covered",
route: "hybrid_store_plus_live"
@ -442,14 +442,14 @@ describe("assistant problem-centric answer mode v1", () => {
});
const output = composeAssistantAnswer({
userMessage: "Покажи проблемные зоны по расчетам без детализации.",
userMessage: "Покажи проблемные зоны по расчетам без детализации.",
routeSummary: buildRouteSummary(),
retrievalResults: [retrieval],
requirements: [
{
requirement_id: "R1",
source_fragment_id: "F1",
requirement_text: "Выделить проблемные зоны",
requirement_text: "Выделить проблемные зоны",
subject_tokens: ["anomaly"],
status: "covered",
route: "hybrid_store_plus_live"
@ -463,7 +463,8 @@ describe("assistant problem-centric answer mode v1", () => {
expect(output.problem_centric_answer_applied).toBe(true);
expect(output.answer_structure_v11?.mechanism_block.status).not.toBe("grounded");
expect(output.answer_structure_v11?.uncertainty_block.limitations.join(" ")).toMatch(/limited|огранич/i);
expect(output.answer_structure_v11?.direct_answer).toMatch(/limited|\uFFFD{7}|\uFFFD{7}|огр|пред/i);
expect(output.answer_structure_v11?.uncertainty_block.limitations.join(" ")).toMatch(/limited|огранич/i);
expect(output.answer_structure_v11?.direct_answer).toMatch(/limited|confidence=low|огр|пред/i);
});
});

@ -0,0 +1,161 @@
import fs from "node:fs";
import path from "node:path";
import request from "supertest";
import { afterEach, describe, expect, it, vi } from "vitest";
const FLAG_KEYS = [
"FEATURE_ASSISTANT_PROBLEM_UNITS_V1",
"FEATURE_ASSISTANT_ANSWER_POLICY_V11",
"FEATURE_ASSISTANT_BROAD_GUARD_V1",
"FEATURE_ASSISTANT_MIN_EVIDENCE_GATE_V1",
"FEATURE_ASSISTANT_ANTI_GENERIC_RANKING_GUARD_V1",
"FEATURE_ASSISTANT_PROBLEM_CENTRIC_ANSWER_V1",
"FEATURE_ASSISTANT_PROBLEM_UNIT_CONTINUITY_V1",
"FEATURE_ASSISTANT_LIFECYCLE_RUNTIME_V1",
"FEATURE_ASSISTANT_LIFECYCLE_ANSWER_V1"
] as const;
const ORIGINAL_FLAGS: Record<string, string | undefined> = Object.fromEntries(FLAG_KEYS.map((key) => [key, process.env[key]]));
type Stage3LifecycleHints = {
expected_lifecycle_domain?: string;
require_current_expected_state_pair?: boolean;
require_missing_or_invalid_transition?: boolean;
require_previous_states?: boolean;
require_terminal_state_mismatch?: boolean;
require_wrong_closing_document_type?: boolean;
require_cross_branch_conflict?: boolean;
require_period_close_impact?: boolean;
require_lifecycle_mode?: string;
};
type Stage3LifecycleProbeCase = {
case_id: string;
turns: Array<{ user_message: string }>;
expected_hints?: Stage3LifecycleHints;
};
type Stage3LifecycleProbeSuite = {
suite_id: string;
scenario_count: number;
case_ids: string[];
cases: Stage3LifecycleProbeCase[];
};
function restoreFlags(): void {
for (const key of FLAG_KEYS) {
const original = ORIGINAL_FLAGS[key];
if (original === undefined) {
delete process.env[key];
} else {
process.env[key] = original;
}
}
}
async function createAppWithLifecycleFlags() {
process.env.FEATURE_ASSISTANT_PROBLEM_UNITS_V1 = "1";
process.env.FEATURE_ASSISTANT_ANSWER_POLICY_V11 = "1";
process.env.FEATURE_ASSISTANT_BROAD_GUARD_V1 = "1";
process.env.FEATURE_ASSISTANT_MIN_EVIDENCE_GATE_V1 = "1";
process.env.FEATURE_ASSISTANT_ANTI_GENERIC_RANKING_GUARD_V1 = "1";
process.env.FEATURE_ASSISTANT_PROBLEM_CENTRIC_ANSWER_V1 = "1";
process.env.FEATURE_ASSISTANT_PROBLEM_UNIT_CONTINUITY_V1 = "0";
process.env.FEATURE_ASSISTANT_LIFECYCLE_RUNTIME_V1 = "1";
process.env.FEATURE_ASSISTANT_LIFECYCLE_ANSWER_V1 = "1";
vi.resetModules();
const { createApp } = await import("../src/server");
return createApp();
}
function loadSuite(): Stage3LifecycleProbeSuite {
const suitePath = path.resolve(process.cwd(), "../eval_cases/assistant_stage3_lifecycle_probe_v0_1.json");
const raw = fs.readFileSync(suitePath, "utf8").replace(/^\uFEFF/, "");
return JSON.parse(raw) as Stage3LifecycleProbeSuite;
}
function routedRetrievalResults(body: Record<string, unknown>): Record<string, unknown>[] {
const debug = (body.debug ?? {}) as { retrieval_results?: unknown[] };
if (!Array.isArray(debug.retrieval_results)) {
return [];
}
return (debug.retrieval_results as Record<string, unknown>[]).filter((item) => String(item.route ?? "") !== "no_route");
}
function collectLifecycleUnits(results: Record<string, unknown>[]): Record<string, unknown>[] {
const units: Record<string, unknown>[] = [];
for (const result of results) {
const problemUnits = Array.isArray(result.problem_units) ? (result.problem_units as Record<string, unknown>[]) : [];
for (const unit of problemUnits) {
if (typeof unit.lifecycle_domain === "string" && unit.lifecycle_domain.length > 0) {
units.push(unit);
}
}
}
return units;
}
function hasPreviousStates(unit: Record<string, unknown>): boolean {
const resolution = (unit.lifecycle_resolution ?? {}) as { resolved_previous_states?: unknown };
return Array.isArray(resolution.resolved_previous_states);
}
describe.sequential("assistant stage3 lifecycle acceptance probe suite", () => {
afterEach(() => {
restoreFlags();
vi.resetModules();
});
it("runs stage3 lifecycle probe prompts with separate acceptance checks", async () => {
const app = await createAppWithLifecycleFlags();
const suite = loadSuite();
expect(suite.suite_id).toBe("assistant_stage3_lifecycle_probe");
expect(suite.scenario_count).toBe(suite.cases.length);
expect(suite.case_ids.length).toBe(suite.cases.length);
for (const probeCase of suite.cases) {
const response = await request(app).post("/api/assistant/message").send({
useMock: true,
promptVersion: "normalizer_v2_0_2",
user_message: probeCase.turns[0]?.user_message ?? ""
});
expect(response.status, probeCase.case_id).toBe(200);
const body = response.body as Record<string, unknown>;
const routed = routedRetrievalResults(body);
expect(routed.length, `${probeCase.case_id}: routed retrieval`).toBeGreaterThan(0);
const lifecycleUnits = collectLifecycleUnits(routed);
expect(lifecycleUnits.length, `${probeCase.case_id}: lifecycle units`).toBeGreaterThan(0);
const lifecycleEnrichedTotal = routed.reduce((acc, item) => {
const summary = (item.problem_unit_summary ?? {}) as { lifecycle_enriched_units?: unknown };
const count = typeof summary.lifecycle_enriched_units === "number" ? summary.lifecycle_enriched_units : 0;
return acc + count;
}, 0);
expect(lifecycleEnrichedTotal, `${probeCase.case_id}: lifecycle enriched total`).toBeGreaterThan(0);
const hints = probeCase.expected_hints ?? {};
if (hints.require_current_expected_state_pair) {
expect(
lifecycleUnits.some((unit) => {
const current = String(unit.current_lifecycle_state ?? "");
const expected = String(unit.expected_lifecycle_state ?? "");
return current.length > 0 && expected.length > 0;
}),
`${probeCase.case_id}: current/expected pair`
).toBe(true);
}
if (hints.require_previous_states) {
expect(lifecycleUnits.some((unit) => hasPreviousStates(unit)), `${probeCase.case_id}: previous states field`).toBe(true);
}
if (typeof hints.require_lifecycle_mode === "string" && hints.require_lifecycle_mode.length > 0) {
const mode = String(((body.debug ?? {}) as { problem_answer_mode?: unknown }).problem_answer_mode ?? "");
expect(mode, `${probeCase.case_id}: lifecycle mode`).toBe(hints.require_lifecycle_mode);
}
}
});
});
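
The suite-level assertions at the start of this test (`scenario_count` and `case_ids` agreeing with `cases`) can be expressed as a reusable check. The sketch below is an assumption, not code from the commit: `suiteConsistencyProblems` and `ProbeSuiteShape` are hypothetical names, the sample suite is invented for illustration, and the id membership check goes beyond what the test asserts.

```typescript
// Hypothetical sketch, not part of the commit.
type ProbeSuiteShape = {
  suite_id: string;
  scenario_count: number;
  case_ids: string[];
  cases: Array<{ case_id: string }>;
};

function suiteConsistencyProblems(suite: ProbeSuiteShape): string[] {
  const problems: string[] = [];
  if (suite.scenario_count !== suite.cases.length) {
    problems.push("scenario_count does not match cases.length");
  }
  if (suite.case_ids.length !== suite.cases.length) {
    problems.push("case_ids.length does not match cases.length");
  }
  // Extra check (assumption): every case must be listed in case_ids.
  for (const item of suite.cases) {
    if (suite.case_ids.indexOf(item.case_id) === -1) {
      problems.push("case not listed in case_ids: " + item.case_id);
    }
  }
  return problems;
}

// Invented sample data, for illustration only.
const sampleSuite: ProbeSuiteShape = {
  suite_id: "sample_suite",
  scenario_count: 2,
  case_ids: ["CASE-A"],
  cases: [{ case_id: "CASE-A" }, { case_id: "CASE-B" }]
};
const sampleProblems = suiteConsistencyProblems(sampleSuite);
```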

@ -0,0 +1,79 @@
import fs from "node:fs";
import path from "node:path";
import { describe, expect, it } from "vitest";
type Stage3LifecycleProbeCase = {
case_id: string;
lifecycle_focus?: {
domain?: string;
targets?: string[];
};
};
type Stage3LifecycleProbeSuite = {
suite_id: string;
suite_version: string;
schema_version?: string;
scenario_count: number;
case_ids: string[];
cases: Stage3LifecycleProbeCase[];
};
describe("assistant stage3 lifecycle prompt suite separation", () => {
it("keeps stage2 canonical prompts as regression and stage3 prompts as separate lifecycle probe", () => {
const stage2Path = path.resolve(process.cwd(), "../eval_cases/assistant_stage2_canonical_v0_1.json");
const stage3Path = path.resolve(process.cwd(), "../eval_cases/assistant_stage3_lifecycle_probe_v0_1.json");
const stage2 = JSON.parse(fs.readFileSync(stage2Path, "utf8").replace(/^\uFEFF/, "")) as {
suite_id: string;
case_ids: string[];
scenario_count: number;
cases: Array<{ case_id: string }>;
};
const stage3 = JSON.parse(fs.readFileSync(stage3Path, "utf8").replace(/^\uFEFF/, "")) as Stage3LifecycleProbeSuite;
expect(stage2.suite_id).toBe("assistant_stage2_canonical");
expect(stage2.case_ids).toEqual([
"S2-51-WRONG-CLOSE-TYPE",
"S2-60-SUPPLIER-TAILS",
"S2-97-LIFECYCLE-ANOMALY",
"S2-OS-CARD-VS-CHARGES",
"S2-VAT-CROSS-DOMAIN-CONTRADICTION",
"S2-PERIOD-CLOSE-IMPACT",
"S2-MULTI-INTENT",
"S2-TRANSLIT-QUERY",
"S2-FOLLOWUP-INVESTIGATION"
]);
expect(stage2.scenario_count).toBe(stage2.cases.length);
expect(stage3.suite_id).toBe("assistant_stage3_lifecycle_probe");
expect(stage3.suite_version).toBe("0.1.0");
expect(stage3.scenario_count).toBe(stage3.cases.length);
expect(stage3.case_ids.length).toBe(9);
const domains = new Set(
stage3.cases.map((item) => item.lifecycle_focus?.domain).filter((item): item is string => typeof item === "string" && item.length > 0)
);
expect(domains.has("51_60")).toBe(true);
expect(domains.has("97")).toBe(true);
expect(domains.has("fixed_asset")).toBe(true);
expect(domains.has("vat_flow")).toBe(true);
expect(domains.has("period_close")).toBe(true);
const lifecycleTargets = new Set(
stage3.cases.flatMap((item) => item.lifecycle_focus?.targets ?? []).filter((item) => typeof item === "string" && item.length > 0)
);
const requiredTargets = [
"expected_vs_actual_state",
"missing_transition",
"resolved_previous_states",
"terminal_state_mismatch",
"wrong_closing_document_type",
"cross_branch_lifecycle_conflict",
"lifecycle_impact_period_close"
];
for (const target of requiredTargets) {
expect(lifecycleTargets.has(target), `missing lifecycle target: ${target}`).toBe(true);
}
});
});

@ -0,0 +1,210 @@
import { describe, expect, it } from "vitest";
import type { CandidateEvidenceItem, ProblemUnit } from "../src/types/stage2ProblemUnits";
import { LifecycleRegistry, resolveLifecycle } from "../src/services/lifecycleRuntime";
function buildProblemUnit(input: {
id: string;
type?: ProblemUnit["problem_unit_type"];
mechanismSummary?: string;
businessDefectClass?: string;
accounts?: string[];
actualState?: string;
expectedState?: string;
periodCloseRisk?: boolean;
}): ProblemUnit {
return {
schema_version: "problem_unit_v0_1",
problem_unit_id: input.id,
problem_unit_type: input.type ?? "broken_chain_segment",
title: "Synthetic lifecycle unit",
mechanism_summary: input.mechanismSummary ?? "Synthetic lifecycle mechanism",
business_defect_class: input.businessDefectClass ?? "broken_lifecycle",
severity: {
score: 0.64,
grade: "medium"
},
confidence: {
score: 0.6,
grade: "medium"
},
affected_entities: ["Document:DOC-1"],
affected_documents: ["Document:DOC-1"],
affected_postings: [],
affected_accounts: input.accounts ?? ["51"],
affected_counterparties: [],
affected_contracts: [],
...(input.actualState
? {
actual_state: input.actualState
}
: {}),
...(input.expectedState
? {
expected_state: input.expectedState
}
: {}),
...(input.periodCloseRisk
? {
period_impact: {
is_period_sensitive: true,
impact_class: "close_risk" as const
}
}
: {}),
evidence_pack: ["cand-1"],
entity_backlinks: [{ entity: "Document", id: "DOC-1" }],
snapshot_limitations: []
};
}
function buildCandidate(input: {
id: string;
anomalies?: string[];
relations?: string[];
confidence?: "high" | "medium" | "low";
}): CandidateEvidenceItem {
return {
schema_version: "candidate_evidence_v0_1",
candidate_id: input.id,
route: "hybrid_store_plus_live",
source_ref: {
schema_version: "evidence_source_ref_v1",
namespace: "snapshot_2020",
entity: "Document",
id: "DOC-1",
period: "2020-06",
canonical_ref: "evidence_source_ref_v1|snapshot_2020|document|doc-1|2020-06"
},
relation_pattern_hits: input.relations ?? [],
anomaly_patterns: input.anomalies ?? [],
entity_backlinks: [{ entity: "Document", id: "DOC-1" }],
confidence_hint: input.confidence ?? "medium"
};
}
describe("stage3 lifecycle registry and resolver wave2", () => {
it("exposes all lifecycle domains in the registry", () => {
const domains = LifecycleRegistry.listDomains();
expect(domains).toEqual([
"bank_settlement",
"customer_settlement",
"deferred_expense",
"fixed_asset",
"vat_flow",
"period_close"
]);
for (const domain of domains) {
const model = LifecycleRegistry.getDomain(domain);
expect(model.lifecycle_domain).toBe(domain);
expect(model.states.length).toBeGreaterThan(0);
expect(model.defects.some((definition) => definition.defect_code === "stale_active_state")).toBe(true);
}
});
it("infers lifecycle domain for all covered stage3 domains", () => {
const cases: Array<{
name: string;
unit: ProblemUnit;
candidates: CandidateEvidenceItem[];
expectedDomain: string;
}> = [
{
name: "bank settlement",
unit: buildProblemUnit({ id: "domain-bank", accounts: ["51"], mechanismSummary: "bank settlement reconciliation" }),
candidates: [buildCandidate({ id: "cand-bank", relations: ["payment_to_settlement"] })],
expectedDomain: "bank_settlement"
},
{
name: "customer settlement",
unit: buildProblemUnit({ id: "domain-customer", accounts: ["62"], mechanismSummary: "customer receivable chain" }),
candidates: [buildCandidate({ id: "cand-customer", relations: ["settlement_to_invoice"] })],
expectedDomain: "customer_settlement"
},
{
name: "deferred expense",
unit: buildProblemUnit({ id: "domain-97", accounts: ["97"], mechanismSummary: "deferred writeoff path" }),
candidates: [buildCandidate({ id: "cand-97", relations: ["deferred_writeoff"] })],
expectedDomain: "deferred_expense"
},
{
name: "fixed asset",
unit: buildProblemUnit({ id: "domain-os", accounts: ["01"], mechanismSummary: "fixed asset depreciation" }),
candidates: [buildCandidate({ id: "cand-os", relations: ["depreciation_register_movement"] })],
expectedDomain: "fixed_asset"
},
{
name: "vat flow",
unit: buildProblemUnit({ id: "domain-vat", accounts: ["68"], mechanismSummary: "vat deduction chain" }),
candidates: [buildCandidate({ id: "cand-vat", anomalies: ["cross_branch_inconsistency"] })],
expectedDomain: "vat_flow"
},
{
name: "period close",
unit: buildProblemUnit({
id: "domain-close",
type: "period_risk_cluster",
mechanismSummary: "period close blocker",
periodCloseRisk: true
}),
candidates: [buildCandidate({ id: "cand-close", anomalies: ["period_close_risk"] })],
expectedDomain: "period_close"
}
];
for (const item of cases) {
const resolution = resolveLifecycle({
unit: item.unit,
candidates: item.candidates
});
expect(resolution.lifecycle_domain, item.name).toBe(item.expectedDomain);
}
});
it("normalizes unknown explicit states against registry and records limitations", () => {
const resolution = resolveLifecycle({
unit: buildProblemUnit({
id: "normalize-invalid-states",
accounts: ["01"],
mechanismSummary: "fixed asset depreciation",
actualState: "legacy_state_unmapped",
expectedState: "legacy_target_unmapped"
}),
candidates: [buildCandidate({ id: "cand-normalize", relations: ["depreciation_register_movement"] })]
});
expect(resolution.lifecycle_domain).toBe("fixed_asset");
expect(resolution.resolved_current_state).toBe("depreciation_active");
expect(resolution.resolved_expected_state).toBe("disposed");
expect(resolution.snapshot_limitations).toContain("actual_state_not_in_registry_normalized");
expect(resolution.snapshot_limitations).toContain("expected_state_not_in_registry_normalized");
});
it("infers missing transition from registry transition path", () => {
const resolution = resolveLifecycle({
unit: buildProblemUnit({
id: "missing-transition",
accounts: ["51"],
actualState: "bank_recorded",
expectedState: "settlement_closed"
}),
candidates: [buildCandidate({ id: "cand-missing", anomalies: ["missing_link", "no_continuation"] })]
});
expect(resolution.missing_transitions[0]).toBe("bank_recorded->settlement_closed");
});
it("builds previous state chain from registry model", () => {
const resolution = resolveLifecycle({
unit: buildProblemUnit({
id: "previous-chain",
accounts: ["51"],
actualState: "bank_recorded",
expectedState: "settlement_closed"
}),
candidates: [buildCandidate({ id: "cand-prev", relations: ["payment_to_settlement"] })]
});
expect(resolution.resolved_previous_states).toEqual(["initiated_payment"]);
});
});
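
The last two tests expect a `current->expected` string for a missing transition and a list of states preceding the current one. A toy sketch of how such values could be derived from an ordered state chain follows. The three-state chain is an assumption: the real `bank_settlement` model in `LifecycleRegistry` may contain more states, and the resolver's actual conditions are not shown in this diff. `previousStates` and `missingTransition` are hypothetical helpers.

```typescript
// Toy model, not part of the commit: an assumed linear chain of states.
const toyBankSettlementStates = ["initiated_payment", "bank_recorded", "settlement_closed"];

// States that precede the current one in the chain; empty when unknown or first.
function previousStates(chain: string[], current: string): string[] {
  const index = chain.indexOf(current);
  return index > 0 ? chain.slice(0, index) : [];
}

// A transition label in the "from->to" form used by the test expectations.
function missingTransition(current: string, expected: string): string | null {
  return current === expected ? null : `${current}->${expected}`;
}

const toyPrevious = previousStates(toyBankSettlementStates, "bank_recorded");
const toyMissing = missingTransition("bank_recorded", "settlement_closed");
```

In the real resolver a missing transition is presumably reported only when anomaly signals such as `missing_link` support it. The sketch skips that gating.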

@ -0,0 +1,290 @@
import fs from "node:fs";
import path from "node:path";
import { describe, expect, it } from "vitest";
import { enrichProblemUnitLifecycle } from "../src/services/lifecycleRuntime";
import type { CandidateEvidenceItem, ProblemUnit } from "../src/types/stage2ProblemUnits";
type Stage3LifecycleHints = {
expected_lifecycle_domain?: string;
require_current_expected_state_pair?: boolean;
require_missing_or_invalid_transition?: boolean;
require_previous_states?: boolean;
require_terminal_state_mismatch?: boolean;
require_wrong_closing_document_type?: boolean;
require_cross_branch_conflict?: boolean;
require_period_close_impact?: boolean;
};
type Stage3LifecycleProbeCase = {
case_id: string;
expected_hints?: Stage3LifecycleHints;
lifecycle_focus?: {
domain?: string;
};
};
type Stage3LifecycleProbeSuite = {
suite_id: string;
scenario_count: number;
cases: Stage3LifecycleProbeCase[];
};
function loadSuite(): Stage3LifecycleProbeSuite {
const suitePath = path.resolve(process.cwd(), "../eval_cases/assistant_stage3_lifecycle_probe_v0_1.json");
const raw = fs.readFileSync(suitePath, "utf8").replace(/^\uFEFF/, "");
return JSON.parse(raw) as Stage3LifecycleProbeSuite;
}
function buildProblemUnit(input: {
id: string;
type: ProblemUnit["problem_unit_type"];
mechanismSummary: string;
businessDefectClass: string;
affectedAccounts: string[];
actualState?: string;
expectedState?: string;
failedExpectedEdge?: string;
periodCloseRisk?: boolean;
}): ProblemUnit {
return {
schema_version: "problem_unit_v0_1",
problem_unit_id: input.id,
problem_unit_type: input.type,
title: "Synthetic Stage3 lifecycle probe unit",
mechanism_summary: input.mechanismSummary,
business_defect_class: input.businessDefectClass,
severity: {
score: 0.78,
grade: "high"
},
confidence: {
score: 0.66,
grade: "medium"
},
affected_entities: ["Document:DOC-1"],
affected_documents: ["Document:DOC-1"],
affected_postings: [],
affected_accounts: input.affectedAccounts,
affected_counterparties: ["Counterparty:CP-1"],
affected_contracts: ["Contract:CTR-1"],
...(input.actualState ? { actual_state: input.actualState } : {}),
...(input.expectedState ? { expected_state: input.expectedState } : {}),
...(input.failedExpectedEdge ? { failed_expected_edge: input.failedExpectedEdge } : {}),
...(input.periodCloseRisk
? {
period_impact: {
is_period_sensitive: true,
impact_class: "close_risk" as const
}
}
: {}),
evidence_pack: ["cand-1"],
entity_backlinks: [{ entity: "Document", id: "DOC-1" }],
snapshot_limitations: []
};
}
function buildCandidate(input: {
id: string;
anomalies: string[];
relations: string[];
confidenceHint?: "high" | "medium" | "low";
}): CandidateEvidenceItem {
return {
schema_version: "candidate_evidence_v0_1",
candidate_id: input.id,
route: "hybrid_store_plus_live",
source_ref: {
schema_version: "evidence_source_ref_v1",
namespace: "snapshot_2020",
entity: "Document",
id: "DOC-1",
period: "2020-06",
canonical_ref: "evidence_source_ref_v1|snapshot_2020|document|doc-1|2020-06"
},
relation_pattern_hits: input.relations,
anomaly_patterns: input.anomalies,
entity_backlinks: [{ entity: "Document", id: "DOC-1" }],
confidence_hint: input.confidenceHint ?? "medium"
};
}
// Derives a synthetic problem unit and a single candidate from a probe case.
// The lifecycle domain selects the baseline (type, accounts, expected state);
// expected_hints then layer anomalies and state overrides on top of it.
function buildSyntheticInput(probeCase: Stage3LifecycleProbeCase): { unit: ProblemUnit; candidates: CandidateEvidenceItem[] } {
  const hints = probeCase.expected_hints ?? {};
  const domainFocus = probeCase.lifecycle_focus?.domain ?? "51_60";
  const anomalies = new Set<string>();
  const relations = new Set<string>();
  // Defaults describe the bank settlement chain (accounts 51/60).
  let problemType: ProblemUnit["problem_unit_type"] = "broken_chain_segment";
  let mechanismSummary = "bank settlement lifecycle chain";
  let businessDefectClass = "broken_lifecycle";
  let affectedAccounts = ["51", "60"];
  let actualState: string | undefined;
  let expectedState: string | undefined;
  let failedExpectedEdge: string | undefined;
  let periodCloseRisk = false;
  // Domain-specific baseline.
  if (domainFocus === "97") {
    problemType = "lifecycle_anomaly_node";
    mechanismSummary = "deferred writeoff lifecycle chain for account 97";
    businessDefectClass = "missing_expected_transition";
    affectedAccounts = ["97"];
    relations.add("writeoff_partial");
    expectedState = "fully_written_off";
  } else if (domainFocus === "fixed_asset") {
    problemType = "document_conflict";
    mechanismSummary = "fixed asset depreciation lifecycle for accounts 01 02";
    businessDefectClass = "cross_branch_inconsistency";
    affectedAccounts = ["01", "02"];
    relations.add("depreciation_register_movement");
    expectedState = "depreciation_active";
  } else if (domainFocus === "vat_flow") {
    problemType = "cross_branch_inconsistency_cluster";
    mechanismSummary = "vat lifecycle flow for accounts 19 68";
    businessDefectClass = "cross_branch_inconsistency";
    affectedAccounts = ["19", "68"];
    relations.add("invoice_to_vat");
    expectedState = "vat_deducted";
  } else if (domainFocus === "period_close") {
    problemType = "period_risk_cluster";
    mechanismSummary = "period close lifecycle blocker for close operation";
    businessDefectClass = "period_close_risk";
    affectedAccounts = ["51", "60"];
    periodCloseRisk = true;
    expectedState = "close_completed";
  } else {
    relations.add("payment_to_settlement");
    expectedState = "settlement_closed";
  }
  // Hint-driven anomalies.
  if (hints.require_missing_or_invalid_transition) {
    anomalies.add("missing_link");
    anomalies.add("no_continuation");
    failedExpectedEdge = "expected_transition_not_observed";
  }
  if (hints.require_wrong_closing_document_type) {
    anomalies.add("wrong_document_type");
    anomalies.add("posting_mismatch");
  }
  if (hints.require_cross_branch_conflict) {
    anomalies.add("cross_branch_inconsistency");
  }
  if (hints.require_period_close_impact) {
    anomalies.add("period_close_risk");
    periodCloseRisk = true;
  }
  // Hint-driven state overrides.
  if (hints.require_previous_states) {
    actualState = domainFocus === "97" ? "partially_written_off" : "bank_recorded";
    if (!expectedState) {
      expectedState = domainFocus === "97" ? "fully_written_off" : "settlement_closed";
    }
  }
  if (hints.require_terminal_state_mismatch) {
    if (!actualState) {
      if (domainFocus === "97") actualState = "partially_written_off";
      else if (domainFocus === "fixed_asset") actualState = "depreciation_active";
      else if (domainFocus === "vat_flow") actualState = "vat_registered";
      else actualState = "bank_recorded";
    }
    if (domainFocus === "fixed_asset") expectedState = "disposed";
    else if (domainFocus === "vat_flow") expectedState = "vat_deducted";
    else if (domainFocus === "97") expectedState = "fully_written_off";
    else expectedState = "settlement_closed";
  }
  const unit = buildProblemUnit({
    id: `probe-${probeCase.case_id.toLowerCase()}`,
    type: problemType,
    mechanismSummary,
    businessDefectClass,
    affectedAccounts,
    actualState,
    expectedState,
    failedExpectedEdge,
    periodCloseRisk
  });
  const candidates = [
    buildCandidate({
      id: `cand-${probeCase.case_id.toLowerCase()}`,
      anomalies: Array.from(anomalies),
      relations: Array.from(relations)
    })
  ];
  return {
    unit,
    candidates
  };
}
describe("stage3 lifecycle probe semantics", () => {
  it("validates lifecycle acceptance targets on synthetic runtime inputs", () => {
    const suite = loadSuite();
    expect(suite.suite_id).toBe("assistant_stage3_lifecycle_probe");
    expect(suite.scenario_count).toBe(suite.cases.length);
    // Each probe case is enriched once; every hint it sets adds one assertion.
    for (const probeCase of suite.cases) {
      const hints = probeCase.expected_hints ?? {};
      const { unit, candidates } = buildSyntheticInput(probeCase);
      const enriched = enrichProblemUnitLifecycle({ unit, candidates });
      if (typeof hints.expected_lifecycle_domain === "string" && hints.expected_lifecycle_domain.length > 0) {
        expect(enriched.lifecycle_domain, `${probeCase.case_id}: expected lifecycle domain`).toBe(hints.expected_lifecycle_domain);
      }
      if (hints.require_current_expected_state_pair) {
        expect(typeof enriched.current_lifecycle_state, `${probeCase.case_id}: current state`).toBe("string");
        expect(typeof enriched.expected_lifecycle_state, `${probeCase.case_id}: expected state`).toBe("string");
      }
      if (hints.require_missing_or_invalid_transition) {
        expect(
          Boolean(enriched.missing_transition || enriched.invalid_transition),
          `${probeCase.case_id}: missing/invalid transition`
        ).toBe(true);
      }
      if (hints.require_previous_states) {
        const previousStates = Array.isArray(enriched.lifecycle_resolution?.resolved_previous_states)
          ? enriched.lifecycle_resolution?.resolved_previous_states
          : [];
        expect(previousStates.length, `${probeCase.case_id}: resolved_previous_states`).toBeGreaterThan(0);
      }
      if (hints.require_wrong_closing_document_type) {
        const defect = String(enriched.lifecycle_defect_type ?? "");
        expect(["misclosed_state", "invalid_transition"].includes(defect), `${probeCase.case_id}: wrong close defect`).toBe(true);
      }
      if (hints.require_cross_branch_conflict) {
        expect(enriched.lifecycle_defect_type, `${probeCase.case_id}: cross-branch defect`).toBe("cross_branch_state_conflict");
      }
      if (hints.require_terminal_state_mismatch) {
        const defect = String(enriched.lifecycle_defect_type ?? "");
        const current = String(enriched.current_lifecycle_state ?? "");
        const expected = String(enriched.expected_lifecycle_state ?? "");
        // Either an explicit terminal-state defect or differing non-empty states counts as a mismatch.
        const hasMismatchSignal =
          defect === "premature_terminal_state" ||
          defect === "misclosed_state" ||
          defect === "orphan_intermediate_state" ||
          (current.length > 0 && expected.length > 0 && current !== expected);
        expect(hasMismatchSignal, `${probeCase.case_id}: terminal mismatch signal`).toBe(true);
      }
      if (hints.require_period_close_impact) {
        const hasPeriodCloseImpact = Array.isArray(enriched.lifecycle_ranking_basis)
          ? enriched.lifecycle_ranking_basis.includes("period_close_impact")
          : false;
        expect(hasPeriodCloseImpact || enriched.lifecycle_domain === "period_close", `${probeCase.case_id}: period close impact`).toBe(true);
      }
    }
  });
});


@ -0,0 +1,136 @@
{
"run_id": "eval--DLjm5dCSP",
"timestamp": "2026-03-26T14:59:09.726Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "dI-qZZYf2cyipx",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "2xKFo8M6aNJ1pp",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "IAo0hvXWAhLedk",
"request_count_for_case": 0
}
]
}
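
Note on the report above: the fragment rates are consistent with plain ratios over `results` (one routed case out of three gives 33.33, two unrouted give 66.67). The sketch below shows one way such values could be derived. It is an illustration only: `CaseResult`, `percent`, and `routeRates` are hypothetical names, and the report generator's actual code is not part of this diff. It also assumes that every fragment of a case shares the case-level `predicted_route_status`, which holds here because each case has exactly one fragment.

```typescript
// Hypothetical shape: only the fields needed for the rate calculation.
interface CaseResult {
  case_id: string;
  predicted_route_status: "routed" | "no_route";
  fragments_total: number;
}

// Percentage rounded to two decimals, matching the precision seen in the report.
function percent(part: number, total: number): number {
  if (total === 0) return 0; // avoid NaN on an empty run
  return Math.round((part / total) * 10000) / 100;
}

// Assumption: all fragments of a case inherit the case-level route status.
function routeRates(results: CaseResult[]): { routed: number; noRoute: number } {
  const total = results.reduce((sum, r) => sum + r.fragments_total, 0);
  const routed = results
    .filter((r) => r.predicted_route_status === "routed")
    .reduce((sum, r) => sum + r.fragments_total, 0);
  return { routed: percent(routed, total), noRoute: percent(total - routed, total) };
}

// Case-level values copied from the report above.
const sample: CaseResult[] = [
  { case_id: "BQ-001", predicted_route_status: "routed", fragments_total: 1 },
  { case_id: "BQ-002", predicted_route_status: "no_route", fragments_total: 1 },
  { case_id: "BQ-003", predicted_route_status: "no_route", fragments_total: 1 }
];

const rates = routeRates(sample);
console.log(rates); // { routed: 33.33, noRoute: 66.67 }
```

Rounding after scaling by 10000 keeps two decimals without string formatting, so the result stays a number and can be compared directly against `routed_fragment_rate` and `no_route_fragment_rate`.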


@ -0,0 +1,111 @@
{
"run_id": "eval--lulgUEKkp",
"timestamp": "2026-03-26T15:04:48.691Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "7JgPan7VWzpUAo",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "kGlyXyRKB0Ktr5",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-3t1L3QY0wE",
"timestamp": "2026-03-26T14:29:33.451Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "QKNj50dIZ-7g91",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "ScmJi_uFNmDfk1",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-4Jd4YjGjIL",
"timestamp": "2026-03-26T12:55:04.731Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "gpJGGWJgrauUlp",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "UD7e9LmcNMMab6",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,136 @@
{
"run_id": "eval-7awjIpz1KA",
"timestamp": "2026-03-26T15:06:11.446Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "l3XCZBVJI8DMb6",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "tmM6DRRQ-FM6Ef",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "xrLAKFpmR0k_fB",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,135 @@
{
"run_id": "eval-86xJU1J7RH",
"timestamp": "2026-03-26T12:55:30.782Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "BCfZGlUf_7WrzX",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "hDwHzKUEvGv_WH",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "7ZV0K25MkOXdrL",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-8HiKD7tzkR",
"timestamp": "2026-03-26T14:55:02.675Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "nAAz2ScCcfd7ML",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "fhy4j7cjoB2lTX",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-94ypeytPFD",
"timestamp": "2026-03-26T14:48:14.207Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "JWodHRnyHMbmXF",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "Pxo_T9ekuyP7k_",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-9V1JK90r5M",
"timestamp": "2026-03-26T12:55:31.352Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "v2KqGYhtmtYcgC",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "cL_lvtjUZLKwXk",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,136 @@
{
"run_id": "eval-AKJhoo1T06",
"timestamp": "2026-03-26T14:50:28.656Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "Pz3SajCDpZUJZi",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "s8NQvLo0jtmRMg",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "SmjVXdYHtAfh78",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,135 @@
{
"run_id": "eval-DBE79isARo",
"timestamp": "2026-03-26T13:16:13.683Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "xNtCuag50SXqLa",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "N7w7OEJH1AZQVj",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "Sh64q_20hIUmH0",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-ENsXL6tmGP",
"timestamp": "2026-03-26T14:54:03.188Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "233Vomq5bQ3Etj",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "1hNbLs5GT0Zb7w",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-HK1iqqUDF3",
"timestamp": "2026-03-26T14:55:02.664Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "-_6kti9x1EAUA0",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "X1KFs_uEDRAn70",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-I9rx4KE5gR",
"timestamp": "2026-03-26T15:06:12.667Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "u_fv2077U9ibdi",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "X6i-Ez8mtzYyK2",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,136 @@
{
"run_id": "eval-ILx0ihm-Lw",
"timestamp": "2026-03-26T14:26:14.249Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "gdQmHxalu1cmCy",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "GvpfTlrn6ZR1Z6",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "OUOpu8YXjUFck7",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-LYEATvWSQn",
"timestamp": "2026-03-26T14:37:56.811Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "OGz5dN9-MYy1G8",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "UMEWQM9gAsokmI",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-NdHWJrLVPY",
"timestamp": "2026-03-26T12:55:31.352Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "oc1AOWxwStd60g",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "Aipiy0hn6HL2uY",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-O360CBYk6a",
"timestamp": "2026-03-26T14:59:11.293Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "eIdR0sgRZ_tBSt",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "5tQorln1n1S3yD",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,135 @@
{
"run_id": "eval-OP8Go8cQU9",
"timestamp": "2026-03-26T13:14:25.915Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "PO59mE6XvvZ3k-",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "8Z1GlyNzIH8r6S",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "gPRGR71mhUHYpy",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-PKpQU8_Mgt",
"timestamp": "2026-03-26T14:48:14.219Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "-nxxWJd40iccfx",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "cpoY1EIOWYgaZQ",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-Pm5Ja_gY1X",
"timestamp": "2026-03-26T12:55:35.823Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "7aEzzfvNPh5iIJ",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "hEQpwfmK5kkJ1R",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-Pwz7xRnbx0",
"timestamp": "2026-03-26T12:55:35.822Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "sQv-L0-0hVtFHA",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "zmI3zT9qCn5yAi",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-QFGmXAHVxo",
"timestamp": "2026-03-26T13:16:30.497Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "kLuH1Ts0Ndt7zi",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "uPTFOfSJRHcZEH",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-QTRrUNA92U",
"timestamp": "2026-03-26T13:16:30.501Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "0jy-9_PxHtWE9o",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "FCWOVWFpJeo4Hg",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,136 @@
{
"run_id": "eval-QnXjMWuNG4",
"timestamp": "2026-03-26T14:54:02.477Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "sI0z4HgYaC0hwE",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "O68FZ2L5CkIkRC",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "BcvCxZlN6oRSPt",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-T5ubfTDRvc",
"timestamp": "2026-03-26T15:04:48.689Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "5FUopKIDNLH9x3",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "tiSeEFAbOlH4HW",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,136 @@
{
"run_id": "eval-UrYNuoIte3",
"timestamp": "2026-03-26T15:04:46.343Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "Ky7rKv9Qi2LzoJ",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "EZtK3enHi9g_LM",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "Wu0Y_-rXFh68tp",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,135 @@
{
"run_id": "eval-VBCi_Qxr5Y",
"timestamp": "2026-03-26T13:16:29.901Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "7wuLnq4BH3piOu",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "73aohob638bDuA",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "WjiJJxc9QvCpvV",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,136 @@
{
"run_id": "eval-WeSd-iJ_P_",
"timestamp": "2026-03-26T14:55:01.958Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "9kfhdQNrIURTjL",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "Ph7s_xUUKPC96z",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "hdlurNdtueZ0DX",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-X1q04xpajb",
"timestamp": "2026-03-26T13:14:26.541Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "G86usNUuaWpz69",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "9r87IyEYbgT-dl",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-ahKw2z3DNd",
"timestamp": "2026-03-26T13:16:14.290Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "R4oqbuO7AZVCD6",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "TDIxd1uZGRFF40",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,136 @@
{
"run_id": "eval-awkhM6kKyT",
"timestamp": "2026-03-26T14:29:32.762Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "Yv3aYwxmxBlAyJ",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "XRBd2kl7qFE8aG",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "5yW64FSU_IjWOY",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-br2XpjV0IF",
"timestamp": "2026-03-26T14:54:03.186Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "DIyGK1CkPnPicG",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "qrabd_Br0Ccs15",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,135 @@
{
"run_id": "eval-coE2JZYmvV",
"timestamp": "2026-03-26T12:55:03.368Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "XuSyxnPhZ7P_PW",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "4sdBIrJ3fn5aqX",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "9YHZELcCqUbAzQ",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,136 @@
{
"run_id": "eval-csS0kkZmz9",
"timestamp": "2026-03-26T17:04:54.186Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "YAW8P_gnmEHbmA",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "wq0q7oxcCBNkY6",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "vbjwJRq69VeHm5",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-dtpCo4sMdc",
"timestamp": "2026-03-26T15:06:12.670Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "wFRYwzhrxkIZ9l",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "1xHDWvJ7ZrJbB5",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-e-1UopyweC",
"timestamp": "2026-03-26T13:16:14.288Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "5mHNGyM4epF9AB",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "unZlAEc3PGwu8l",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-fNnAM4rRXw",
"timestamp": "2026-03-26T14:26:14.869Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "2SO3wpAWN_CpSC",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "9YxDVbJdguj2rx",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-fTyrcI8ATt",
"timestamp": "2026-03-26T13:14:26.535Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "Ynl9X1eX9phUX_",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "JYnY80AE-3YW0R",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,111 @@
{
"run_id": "eval-kZZ5rtfv4X",
"timestamp": "2026-03-26T14:26:14.890Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "o6eQLf5SO1Gf2p",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "HKJlZl2REXlhIa",
"request_count_for_case": 0
}
]
}


@ -0,0 +1,136 @@
{
"run_id": "eval-nCABoTHSuw",
"timestamp": "2026-03-26T14:37:55.515Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "Bcb_BEMUY56h-G",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "8hm25X41b9Fm46",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "ouqPxsmQmXYeoB",
"request_count_for_case": 0
}
]
}
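
The rate fields in the report above look consistent with plain percentages rounded to two decimals (1 of 3 gives 33.33, 2 of 3 gives 66.67), with `null` where there are no labeled cases. The sketch below reproduces that arithmetic under this assumption. The helper names `percentage`, `routedRate` and `inScopeRate` are hypothetical, not taken from the project, and the case-level count only matches the fragment-level metric because every message here has exactly one fragment.

```typescript
// Minimal subset of a per-case result, using field names from the report above.
interface CaseResult {
  message_in_scope: boolean;
  predicted_route_status: string;
}

// Assumption: rates are percentages rounded to two decimal places,
// and an empty denominator yields null (as with the unlabeled metrics).
function percentage(part: number, total: number): number | null {
  if (total === 0) {
    return null;
  }
  return Math.round((part / total) * 10000) / 100;
}

// Share of cases whose predicted route status is "routed".
function routedRate(results: CaseResult[]): number | null {
  const routed = results.filter((r) => r.predicted_route_status === "routed").length;
  return percentage(routed, results.length);
}

// Share of cases whose message was classified as in scope.
function inScopeRate(results: CaseResult[]): number | null {
  const inScope = results.filter((r) => r.message_in_scope).length;
  return percentage(inScope, results.length);
}
```

Returning `null` rather than `0` for an empty denominator keeps "not measured" distinguishable from "measured and zero", which is how the unlabeled accuracy and precision fields read in these reports.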

@ -0,0 +1,135 @@
{
"run_id": "eval-qo49IgK8zT",
"timestamp": "2026-03-26T12:55:34.412Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "wKoH1Fcro5huCS",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "oFA0_O1Dd1EEnA",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "2HHlXRriLdK-hF",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-rgZvv7GVlh",
"timestamp": "2026-03-26T14:50:29.357Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "DFhLxldnnEi813",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "THV16s3zq8ch9N",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-uno0zByXlr",
"timestamp": "2026-03-26T14:59:11.292Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "tFalCYEY3jS79i",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "hEK6K6QrsUgnnu",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,136 @@
{
"run_id": "eval-vBRt4_zi8o",
"timestamp": "2026-03-26T14:48:13.117Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 3
},
"cases_total": 3,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 33.33,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 33.33,
"routed_fragment_rate": 33.33,
"no_route_fragment_rate": 66.67,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 3,
"checks_passed": 3
},
"route_distribution": {
"store_feature_risk": 1,
"no_route": 2
},
"fallback_distribution": {
"none": 1,
"out_of_scope": 1,
"clarification": 1
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "jcgq3Vum-vmvtb",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Как вообще по ФСБУ",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 1,
"unclear_fragments": 0,
"fallback_type": "out_of_scope",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "out_of_scope",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "zb3PnxSTYYCKtT",
"request_count_for_case": 0
},
{
"case_id": "BQ-003",
"raw_question": "Покажи топ рисков за июнь 2020",
"validation_passed": true,
"message_in_scope": false,
"scope_confidence": "low",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 0,
"out_of_scope_fragments": 0,
"unclear_fragments": 1,
"fallback_type": "clarification",
"predicted_route_status": "no_route",
"expected_route_status": null,
"predicted_no_route_reason": "insufficient_specificity",
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 0,
"trace_id": "07LQK4g2WMJmJM",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-vrtzSgGK5J",
"timestamp": "2026-03-26T14:37:56.801Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "sUs-83-P-FBJC5",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "nf7HAY_H_yIRz8",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-wO0GIYPaNz",
"timestamp": "2026-03-26T12:55:04.737Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "ozP1_JjXjrXdNb",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "gKP8y15WrKRjM6",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-wPEGrtVlnN",
"timestamp": "2026-03-26T17:04:55.445Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "9RsDbvSEMVzwy9",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "gDKgk0Yj_UhrIX",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-wnexicBv1L",
"timestamp": "2026-03-26T17:04:55.449Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "LChjIRCQvUVBJI",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по счету 97",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "jBg8Projzr0MZ0",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-xwO4M4uHeu",
"timestamp": "2026-03-26T14:50:29.376Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "Eh2F-VtWbDijzY",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "4-DS8kKXsGNGSv",
"request_count_for_case": 0
}
]
}

@ -0,0 +1,111 @@
{
"run_id": "eval-ynin-rwW6N",
"timestamp": "2026-03-26T14:29:33.467Z",
"mode": "single-pass-strict",
"use_mock": true,
"prompt_version": "normalizer_v2_0_2",
"schema_version": "v2_0_2",
"dataset": {
"source": "inline_raw_questions",
"file": null,
"raw_questions_count": 2
},
"cases_total": 2,
"metrics": {
"schema_validation_pass_rate": 100,
"scope_detection_accuracy": null,
"scope_in_scope_rate": 100,
"multi_intent_detected_rate": 0,
"clarification_required_rate": 0,
"avg_fragments_per_message": 1,
"out_of_scope_fragment_rate": 0,
"routed_fragment_rate": 100,
"no_route_fragment_rate": 0,
"route_resolution_accuracy": null,
"no_route_precision": null,
"false_no_route_rate": null,
"execution_state_consistency_rate": 100,
"executable_with_soft_assumptions_rate": 100,
"soft_assumption_used_fragment_rate": 100,
"clarification_precision": null,
"clarification_recall": null,
"false_clarification_rate": null
},
"budget": {
"requests_total": 0,
"retries_used": 0
},
"clarification_eval": {
"labeled_cases": 0,
"true_positive": 0,
"false_positive": 0,
"false_negative": 0
},
"route_eval": {
"labeled_cases": 0,
"correct_cases": 0,
"expected_routed_cases": 0,
"no_route_true_positive": 0,
"no_route_false_positive": 0
},
"scope_eval": {
"labeled_cases": 0,
"correct_cases": 0
},
"execution_state_eval": {
"checks_total": 2,
"checks_passed": 2
},
"route_distribution": {
"store_feature_risk": 2
},
"fallback_distribution": {
"none": 2
},
"results": [
{
"case_id": "BQ-001",
"raw_question": "Проверь счет 60 за июнь 2020",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "491D4elziSwn2k",
"request_count_for_case": 0
},
{
"case_id": "BQ-002",
"raw_question": "Покажи риски по НДС и по закрытию",
"validation_passed": true,
"message_in_scope": true,
"scope_confidence": "high",
"contains_multiple_tasks": false,
"fragments_total": 1,
"in_scope_fragments": 1,
"out_of_scope_fragments": 0,
"unclear_fragments": 0,
"fallback_type": "none",
"predicted_route_status": "routed",
"expected_route_status": null,
"predicted_no_route_reason": null,
"expected_no_route_reason": null,
"predicted_clarification_required": false,
"expected_clarification_required": null,
"executable_with_soft_assumptions_fragments": 1,
"trace_id": "_iYyGaR7yizo7k",
"request_count_for_case": 0
}
]
}

@ -1,29 +1,52 @@
# Run Folders
# Run Folders
This folder is used to keep the artifacts of each run cleanly separated.
This folder is used to store the artifacts of each individual wave.
Format:
- `docs/runs/YYYY-MM-DD_HH-mm-ss[_label]/`
- inside:
- `reports/`
- `logs/traces/`
- `logs/assistant_sessions/`
- `manifest.json`
## Required run folder name format
Run archiving:
- `docs/runs/YYYY-MM-DD_Stage_<NN>_Wave_<NN>_<short_topic>/`
The ordering rule is strict:
- the date is always followed by `Stage`;
- `Stage` is always followed by `Wave`;
- then comes a short wave topic.
Example:
- `docs/runs/2026-03-26_Stage_04_Wave_01_Kickoff/`
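
The naming rule above can be checked mechanically. The following is a minimal sketch, not part of the project: `RUN_FOLDER_PATTERN` and `parseRunFolderName` are hypothetical names, and it assumes that `<NN>` is exactly two digits and that the topic contains only letters, digits, `_` or `-`.

```typescript
// Hypothetical pattern for YYYY-MM-DD_Stage_<NN>_Wave_<NN>_<short_topic>.
// Assumptions: <NN> is two digits; the topic is letters, digits, "_" or "-".
const RUN_FOLDER_PATTERN =
  /^(\d{4})-(\d{2})-(\d{2})_Stage_(\d{2})_Wave_(\d{2})_([A-Za-z0-9][A-Za-z0-9_-]*)$/;

interface RunFolderName {
  date: string;
  stage: number;
  wave: number;
  topic: string;
}

// Returns the parsed parts, or null when the name breaks the ordering rule.
function parseRunFolderName(name: string): RunFolderName | null {
  const match = RUN_FOLDER_PATTERN.exec(name);
  if (match === null) {
    return null;
  }
  return {
    date: `${String(match[1])}-${String(match[2])}-${String(match[3])}`,
    stage: Number(match[4]),
    wave: Number(match[5]),
    topic: String(match[6]),
  };
}
```

Under these assumptions, `parseRunFolderName("2026-03-26_Stage_04_Wave_01_Kickoff")` yields stage 4 and wave 1, while a name in the older `YYYY-MM-DD_HH-mm-ss[_label]` style returns `null`.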
## Required structure inside a run folder
- `README.md`: what was checked and why;
- `run_summary.json`: commands, results, key links;
- `artifacts/`: run reports (test/eval/acceptance/regression);
- `prompt_dialogs/`: user/system/assistant dialogs and runtime context.
## Required structure of `prompt_dialogs`
- `prompt_dialogs/index.json`
- `prompt_dialogs/<suite>/<case_id>.json`
- `prompt_dialogs/<suite>/<case_id>.md`
Minimum for each case:
- the user's question;
- the system's answer (assistant reply);
- the technical context available for analysis (debug/runtime/decomposition/grounding, if present).
## Important rule for waves
Artifacts from different waves must not be mixed in one folder.
Each wave must have its own run folder and its own set of `prompt_dialogs`.
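
The required layout could be checked mechanically along these lines. This is only a sketch: `check_run_dir` is a hypothetical helper, not part of the repository, and it assumes every `<case_id>.json` under a suite must have a matching `<case_id>.md` and vice versa.

```python
from pathlib import Path

REQUIRED_DIRS = ("artifacts", "prompt_dialogs")
REQUIRED_FILES = ("README.md", "run_summary.json", "prompt_dialogs/index.json")


def check_run_dir(run_dir: Path) -> list[str]:
    """Return a list of layout problems; an empty list means the folder passes."""
    problems = []
    for rel in REQUIRED_DIRS:
        if not (run_dir / rel).is_dir():
            problems.append(f"missing directory: {rel}/")
    for rel in REQUIRED_FILES:
        if not (run_dir / rel).is_file():
            problems.append(f"missing file: {rel}")

    dialogs = run_dir / "prompt_dialogs"
    if dialogs.is_dir():
        for suite_dir in sorted(p for p in dialogs.iterdir() if p.is_dir()):
            json_ids = {p.stem for p in suite_dir.glob("*.json")}
            md_ids = {p.stem for p in suite_dir.glob("*.md")}
            # A case present in only one of the two formats is incomplete.
            for case_id in sorted(json_ids ^ md_ids):
                ext = "md" if case_id in json_ids else "json"
                problems.append(
                    f"missing file: prompt_dialogs/{suite_dir.name}/{case_id}.{ext}"
                )
    return problems
```

Calling `check_run_dir(Path("docs/runs/2026-03-26_Stage_04_Wave_01_Kickoff"))` before archiving would list anything missing from that wave's folder. The check does not inspect file contents, so the per-case minimum (question, reply, technical context) still needs review.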
## Archiving
```bash
npm run artifacts:bundle
```
Archiving with cleanup of the source logs and generated reports:
```bash
npm run artifacts:bundle:clean
```
With a run label:
```bash
npm run artifacts:bundle:clean -- --label wave2_followup
npm run artifacts:bundle:clean -- --label stage4_wave1
```


@ -0,0 +1,302 @@
{
"suite_id": "assistant_stage3_lifecycle_probe",
"suite_version": "0.1.0",
"schema_version": "assistant_stage3_lifecycle_probe_v0_1",
"scenario_count": 9,
"case_ids": [
"S3-51-WRONG-CLOSE-TYPE",
"S3-60-PAYMENT-WITHOUT-CLOSURE",
"S3-97-STALLED-NODES",
"S3-97-EXPECTED-VS-ACTUAL",
"S3-OS-BRANCH-DIVERGENCE",
"S3-OS-TERMINAL-GAP",
"S3-VAT-CROSS-BRANCH-CONFLICT",
"S3-VAT-ACTUAL-VS-EXPECTED",
"S3-PERIOD-CLOSE-LIFECYCLE-IMPACT"
],
"cases": [
{
"case_id": "S3-51-WRONG-CLOSE-TYPE",
"scenario_tag": "51_wrong_closing_document_type",
"question_type": "direct",
"broadness_level": "medium",
"turns": [
{
"user_message": "Проверь по счёту 51 за июнь 2020, где контур закрыт не тем типом документа и какой ожидаемый завершающий переход не подтверждён."
}
],
"expected_hints": {
"expected_reply_type": "partial_coverage",
"expected_degraded_to": "partial",
"expected_problem_first": true,
"expected_problem_unit_types": [
"lifecycle_anomaly_node",
"document_conflict"
],
"expected_lifecycle_domain": "bank_settlement",
"require_current_expected_state_pair": true,
"require_missing_or_invalid_transition": true,
"require_wrong_closing_document_type": true,
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
},
"lifecycle_focus": {
"domain": "51_60",
"targets": [
"expected_vs_actual_state",
"missing_transition",
"wrong_closing_document_type"
]
}
},
{
"case_id": "S3-60-PAYMENT-WITHOUT-CLOSURE",
"scenario_tag": "60_payment_exists_but_not_closed",
"question_type": "direct",
"broadness_level": "medium",
"turns": [
{
"user_message": "По поставщикам по счёту 60 за июнь 2020 покажи, где оплата есть, но lifecycle расчёта не дошёл до ожидаемого закрывающего документа."
}
],
"expected_hints": {
"expected_reply_type": "partial_coverage",
"expected_degraded_to": "partial",
"expected_problem_first": true,
"expected_problem_unit_types": [
"lifecycle_anomaly_node",
"unresolved_settlement_cluster"
],
"expected_lifecycle_domain": "bank_settlement",
"require_current_expected_state_pair": true,
"require_missing_or_invalid_transition": true,
"require_previous_states": true,
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
},
"lifecycle_focus": {
"domain": "51_60",
"targets": [
"expected_vs_actual_state",
"missing_transition",
"resolved_previous_states"
]
}
},
{
"case_id": "S3-97-STALLED-NODES",
"scenario_tag": "97_stalled_nodes",
"question_type": "direct",
"broadness_level": "medium",
"turns": [
{
"user_message": "Проверь по счёту 97 за июнь 2020 по документам, где зависли узлы жизненного цикла: какие стадии уже пройдены и какой ожидаемый переход до списания отсутствует."
}
],
"expected_hints": {
"expected_reply_type": "partial_coverage",
"expected_degraded_to": "partial",
"expected_problem_first": true,
"expected_problem_unit_types": [
"lifecycle_anomaly_node",
"broken_chain_segment"
],
"expected_lifecycle_domain": "deferred_expense",
"require_current_expected_state_pair": true,
"require_missing_or_invalid_transition": true,
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
},
"lifecycle_focus": {
"domain": "97",
"targets": [
"expected_vs_actual_state",
"missing_transition"
]
}
},
{
"case_id": "S3-97-EXPECTED-VS-ACTUAL",
"scenario_tag": "97_expected_vs_actual_sequence",
"question_type": "direct",
"broadness_level": "medium",
"turns": [
{
"user_message": "Проверь по счёту 97 за июнь 2020 по документам, где фактическое состояние расходов будущих периодов расходится с ожидаемой последовательностью списания."
}
],
"expected_hints": {
"expected_reply_type": "partial_coverage",
"expected_degraded_to": "partial",
"expected_problem_first": true,
"expected_problem_unit_types": [
"lifecycle_anomaly_node",
"period_risk_cluster"
],
"expected_lifecycle_domain": "deferred_expense",
"require_current_expected_state_pair": true,
"require_terminal_state_mismatch": true,
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
},
"lifecycle_focus": {
"domain": "97",
"targets": [
"expected_vs_actual_state",
"terminal_state_mismatch"
]
}
},
{
"case_id": "S3-OS-BRANCH-DIVERGENCE",
"scenario_tag": "os_card_document_depreciation_divergence",
"question_type": "direct",
"broadness_level": "medium",
"turns": [
{
"user_message": "По основным средствам по счетам 01 и 02 за июнь 2020 покажи, где lifecycle объекта расходится между карточкой, документом и начислением амортизации."
}
],
"expected_hints": {
"expected_reply_type": "partial_coverage",
"expected_degraded_to": "partial",
"expected_problem_first": true,
"expected_problem_unit_types": [
"lifecycle_anomaly_node",
"cross_branch_inconsistency_cluster"
],
"expected_lifecycle_domain": "fixed_asset",
"require_current_expected_state_pair": true,
"require_cross_branch_conflict": true,
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
},
"lifecycle_focus": {
"domain": "fixed_asset",
"targets": [
"expected_vs_actual_state",
"cross_branch_lifecycle_conflict"
]
}
},
{
"case_id": "S3-OS-TERMINAL-GAP",
"scenario_tag": "os_previous_or_terminal_gap",
"question_type": "direct",
"broadness_level": "medium",
"turns": [
{
"user_message": "Где по ОС по счетам 01/02 за июнь 2020 видно, что цепочка дошла до начисления, но не подтверждён ожидаемый предыдущий или завершающий этап?"
}
],
"expected_hints": {
"expected_reply_type": "partial_coverage",
"expected_degraded_to": "partial",
"expected_problem_first": true,
"expected_problem_unit_types": [
"lifecycle_anomaly_node",
"document_conflict"
],
"expected_lifecycle_domain": "fixed_asset",
"require_current_expected_state_pair": true,
"require_terminal_state_mismatch": true,
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
},
"lifecycle_focus": {
"domain": "fixed_asset",
"targets": [
"resolved_previous_states",
"terminal_state_mismatch"
]
}
},
{
"case_id": "S3-VAT-CROSS-BRANCH-CONFLICT",
"scenario_tag": "vat_cross_branch_conflict",
"question_type": "direct",
"broadness_level": "medium",
"turns": [
{
"user_message": "По НДС за июнь 2020 по счёту 68 покажи, где lifecycle ветвей документов, проводок и регистров расходится и какая ветка выглядит несогласованной."
}
],
"expected_hints": {
"expected_reply_type": "partial_coverage",
"expected_degraded_to": "partial",
"expected_problem_first": true,
"expected_problem_unit_types": [
"lifecycle_anomaly_node",
"cross_branch_inconsistency_cluster"
],
"expected_lifecycle_domain": "vat_flow",
"require_current_expected_state_pair": true,
"require_cross_branch_conflict": true,
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
},
"lifecycle_focus": {
"domain": "vat_flow",
"targets": [
"expected_vs_actual_state",
"cross_branch_lifecycle_conflict"
]
}
},
{
"case_id": "S3-VAT-ACTUAL-VS-EXPECTED",
"scenario_tag": "vat_actual_vs_expected_state",
"question_type": "direct",
"broadness_level": "medium",
"turns": [
{
"user_message": "Где по НДС за июнь 2020 по счетам 19 и 68 есть конфликт фактического и ожидаемого состояния между документом и регистрами?"
}
],
"expected_hints": {
"expected_reply_type": "partial_coverage",
"expected_degraded_to": "partial",
"expected_problem_first": true,
"expected_problem_unit_types": [
"lifecycle_anomaly_node",
"document_conflict"
],
"expected_lifecycle_domain": "vat_flow",
"require_current_expected_state_pair": true,
"require_terminal_state_mismatch": true,
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
},
"lifecycle_focus": {
"domain": "vat_flow",
"targets": [
"expected_vs_actual_state",
"terminal_state_mismatch"
]
}
},
{
"case_id": "S3-PERIOD-CLOSE-LIFECYCLE-IMPACT",
"scenario_tag": "period_close_lifecycle_impact",
"question_type": "direct",
"broadness_level": "high",
"turns": [
{
"user_message": "Какие lifecycle-дефекты по счетам 51 и 60 за июнь 2020 сильнее всего влияют на риск закрытия периода (period close) и на каком переходе возникает разрыв?"
}
],
"expected_hints": {
"expected_reply_type": "clarification_required",
"expected_degraded_to": "clarification",
"expected_problem_first": true,
"expected_problem_unit_types": [
"period_risk_cluster",
"lifecycle_anomaly_node"
],
"expected_lifecycle_domain": "period_close",
"require_missing_or_invalid_transition": true,
"require_period_close_impact": true,
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
},
"lifecycle_focus": {
"domain": "period_close",
"targets": [
"lifecycle_impact_period_close",
"missing_transition"
]
}
}
]
}
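
A suite file of this shape repeats the same information in `scenario_count`, `case_ids`, and `cases`, so a consistency check is cheap to write. The sketch below is hypothetical: `check_suite` is not a function from the repository, and treating `case_ids` as an unordered set is an assumption.

```python
def check_suite(suite: dict) -> list[str]:
    """Return consistency problems found in a lifecycle probe suite dict."""
    problems = []
    declared = list(suite["case_ids"])
    actual = [case["case_id"] for case in suite["cases"]]

    if suite["scenario_count"] != len(actual):
        problems.append(
            f"scenario_count={suite['scenario_count']} but {len(actual)} cases present"
        )
    if len(set(actual)) != len(actual):
        problems.append("duplicate case_id in cases")
    for case_id in sorted(set(declared) - set(actual)):
        problems.append(f"declared but not defined: {case_id}")
    for case_id in sorted(set(actual) - set(declared)):
        problems.append(f"defined but not declared: {case_id}")
    for case in suite["cases"]:
        # Every case needs at least one user turn to be runnable.
        if not case.get("turns"):
            problems.append(f"no turns: {case['case_id']}")
    return problems


# Tiny illustrative suite, not the real file.
example = {
    "scenario_count": 1,
    "case_ids": ["S3-EXAMPLE"],
    "cases": [{"case_id": "S3-EXAMPLE", "turns": [{"user_message": "..."}]}],
}
assert check_suite(example) == []
```

The suite shown above declares `scenario_count` 9 with nine entries in both `case_ids` and `cases`, so under these assumptions it would pass.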


@ -0,0 +1 @@
Проверка UTF-8 кириллицы


@ -0,0 +1 @@
???????? UTF-8 ?????????