Stage 3: улучшена логика жизненного цикла и очищены ответы ассистента
This commit is contained in:
parent
d0b842adb0
commit
914843a8ba
|
|
@ -0,0 +1,272 @@
|
|||
# ACCEPTANCE_CHECKLIST_STAGE_04
|
||||
|
||||
## Назначение документа
|
||||
|
||||
Этот документ используется для приёмки реализации Stage 4.
|
||||
Его задача — проверить, что graph core внедрён как рабочий runtime-слой, а не как формальная схема.
|
||||
|
||||
Документ обязателен для:
|
||||
- Codex;
|
||||
- разработчика;
|
||||
- ручного review;
|
||||
- финальной фиксации Stage 4.
|
||||
|
||||
---
|
||||
|
||||
## Статус документа
|
||||
|
||||
- Статус: чеклист приёмки Stage 4
|
||||
- Язык: русский
|
||||
- Режим использования: обязателен при завершении каждой волны и при финальной приёмке Stage 4
|
||||
- При конфликте по scope приоритет имеет `STAGE_04_TASK_CARD.md`
|
||||
- При конфликте по архитектурным ограничениям приоритет имеет `ARCHITECTURE_GUARDRAILS.md`
|
||||
- При конфликте по platform logic приоритет имеет `TZ_Platform_Core_Accounting_Assistant_Mode.md`
|
||||
|
||||
---
|
||||
|
||||
## Правила оценки
|
||||
|
||||
Допустимые статусы:
|
||||
|
||||
- `PASS` — выполнено полностью
|
||||
- `PARTIAL` — выполнено частично, требуется доработка
|
||||
- `FAIL` — не выполнено
|
||||
- `N/A` — не применимо (только с явным обоснованием)
|
||||
|
||||
Для каждого пункта обязателен комментарий:
|
||||
- что проверялось;
|
||||
- где реализовано;
|
||||
- чем подтверждается;
|
||||
- какие ограничения остались.
|
||||
|
||||
---
|
||||
|
||||
## Общая логика приёмки
|
||||
|
||||
Stage 4 считается принятым только если одновременно выполнено:
|
||||
|
||||
1. Закрыт именно Stage 4, без скрытого выезда в Stage 5–6.
|
||||
2. Graph contracts реализованы и используются runtime.
|
||||
3. Graph traversal реально участвует в graph-eligible retrieval.
|
||||
4. Problem assembly использует graph connectivity.
|
||||
5. Lifecycle reasoning использует graph transitions.
|
||||
6. Answer layer использует graph-backed causal explanation.
|
||||
7. Есть benchmark/eval подтверждение value.
|
||||
8. Рабочий контур не разрушен.
|
||||
|
||||
---
|
||||
|
||||
# Блок A. Scope discipline
|
||||
|
||||
## A1. Реализован именно Stage 4
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## A2. Нет скрытого выезда в Stage 5
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## A3. Нет скрытого выезда в Stage 6
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## A4. Нет большого ненужного platform refactor
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
---
|
||||
|
||||
# Блок B. Graph model
|
||||
|
||||
## B1. Реализована schema `AccountingGraphNode`
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## B2. Реализована schema `AccountingGraphEdge`
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## B3. Внедрён `GraphSchemaRegistry`
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## B4. Узлы/связи имеют provenance/confidence
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## B5. Нет generic edges уровня `related_to` как основного механизма
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
---
|
||||
|
||||
# Блок C. Graph runtime
|
||||
|
||||
## C1. Реализован `GraphBuilder`
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## C2. Реализован `GraphTraversalPolicy`
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## C3. Реализован `GraphValidationLayer`
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## C4. Missing/conflicting links детектируются как runtime-сигналы
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## C5. Runtime устойчив к неполным данным
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
---
|
||||
|
||||
# Блок D. Интеграция слоёв
|
||||
|
||||
## D1. Planner поддерживает graph eligibility
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## D2. Execution использует typed graph traversal в graph-eligible запросах
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## D3. Problem assembly использует graph connectivity
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## D4. Lifecycle checks используют graph transitions
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## D5. Answer layer использует graph-backed causal path
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
---
|
||||
|
||||
# Блок E. Quality / eval
|
||||
|
||||
## E1. Добавлены unit tests для graph contracts/runtime
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## E2. Добавлены integration tests для planner/execution graph path
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## E3. Regression по Stage 2/Stage 3 не сломан
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## E4. Есть benchmark suite Stage 4
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## E5. Есть before/after value report
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
---
|
||||
|
||||
# Блок F. Observability / compatibility
|
||||
|
||||
## F1. Graph decisions и traversal диагностируемы
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## F2. Contracts и source of truth документированы
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## F3. Изменения совместимы с roadmap Stage 5
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## F4. Миграционная дисциплина соблюдена
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
---
|
||||
|
||||
# Блок G. Documentation completeness
|
||||
|
||||
## G1. Есть актуальный `STAGE_04_TASK_CARD.md`
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## G2. Есть acceptance mapping `изменение -> критерий`
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## G3. Есть explicit non-scope список
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## G4. Run-артефакты оформлены по стандарту `date -> Stage -> Wave`, включая `prompt_dialogs`
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
---
|
||||
|
||||
# Блок H. Финальное решение по этапу
|
||||
|
||||
## H1. Stage 4 можно считать принятым
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
## H2. Stage 4 нельзя считать принятым
|
||||
Статус:
|
||||
Комментарий:
|
||||
|
||||
---
|
||||
|
||||
# Итоговая сводка по приёмке
|
||||
|
||||
## Общий итог
|
||||
- Результат: `PASS / PARTIAL / FAIL`
|
||||
- Дата проверки:
|
||||
- Проверял:
|
||||
- Версия / ветка / commit:
|
||||
- Связанные документы:
|
||||
|
||||
## Ключевые сильные стороны
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
|
||||
## Ключевые недочёты
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
|
||||
## Что обязательно исправить до приёмки
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
|
||||
## Что допустимо перенести в следующий этап
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
|
||||
## Явно подтверждено как non-scope текущего этапа
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
|
||||
## Финальное решение
|
||||
- `Принять Stage 4`
|
||||
- `Принять Stage 4 условно`
|
||||
- `Вернуть на доработку`
|
||||
|
||||
Комментарий:
|
||||
|
||||
---
|
||||
|
||||
## Короткая практическая формула
|
||||
|
||||
Stage 4 считается успешным тогда, когда graph layer реально работает в runtime и улучшает retrieval/problem/lifecycle/answer, а не только добавляет новую схему данных.
|
||||
|
|
@ -20,7 +20,7 @@
|
|||
- Статус: основной управляющий бриф для Codex
|
||||
- Язык: русский
|
||||
- Режим использования: обязателен к прочтению перед любыми изменениями в коде
|
||||
- При конфликте с рабочим scope текущей итерации приоритет имеет `STAGE_03_TASK_CARD.md`
|
||||
- При конфликте с рабочим scope текущей итерации приоритет имеет `STAGE_04_TASK_CARD.md`
|
||||
- При конфликте по архитектурным ограничениям приоритет имеет `TZ_Platform_Core_Accounting_Assistant_Mode.md`
|
||||
|
||||
---
|
||||
|
|
@ -37,29 +37,29 @@
|
|||
- возвращать ответ пользователю.
|
||||
|
||||
При этом текущая система ещё не является полноценным accountant-grade investigation copilot.
|
||||
На текущем переходе считаем этапы 1 и 2 выполненными и переходим к **Stage 3 / Lifecycle Formalization**.
|
||||
Stage 3 зафиксирован как завершённый (accepted), и текущий переход — **Stage 4 / Accounting Ontology Graph Core**.
|
||||
|
||||
Основные текущие ограничения, которые Stage 3 должен закрыть:
|
||||
Основные текущие ограничения, которые Stage 4 должен закрыть:
|
||||
|
||||
- lifecycle-семантика остаётся частично эвристической;
|
||||
- отсутствует формализованная модель допустимых состояний/переходов по ключевым доменам;
|
||||
- problem units недостаточно насыщены temporal и stage-based смыслом;
|
||||
- ranking по ряду классов вопросов всё ещё тяготеет к frequency/sum/entity сигналам;
|
||||
- ответы местами остаются на уровне generic lifecycle labels.
|
||||
- отсутствует единое graph-представление бухгалтерских сущностей и связей;
|
||||
- причинно-следственные цепочки до сих пор частично собираются эвристически;
|
||||
- missing/conflicting links не являются first-class runtime-объектами;
|
||||
- lifecycle/problem reasoning недостаточно использует структурную graph-связность;
|
||||
- cross-branch traversal и period-impact проверки ограничены локальными rule bundles.
|
||||
|
||||
---
|
||||
|
||||
## Цель работы Codex на текущей итерации
|
||||
|
||||
Codex должен помочь реализовать **только Stage 3**, не разрушая текущий рабочий контур и не подтягивая prematurely решения из следующих этапов.
|
||||
Codex должен помочь реализовать **только Stage 4**, не разрушая текущий рабочий контур и не подтягивая prematurely решения из следующих этапов.
|
||||
|
||||
Текущая цель:
|
||||
|
||||
- ввести формальную lifecycle-модель по целевым доменам Stage 3;
|
||||
- внедрить lifecycle runtime-компоненты и их использование в рабочем пути;
|
||||
- интегрировать lifecycle в problem units, ranking и answer synthesis;
|
||||
- ввести рабочее graph-ядро бухгалтерских сущностей и типизированных связей;
|
||||
- внедрить graph runtime-компоненты в retrieval/planning/problem assembly/lifecycle binding;
|
||||
- интегрировать graph-связность в reasoning и answer synthesis;
|
||||
- подтвердить полезность через domain-eval и before/after проверку;
|
||||
- не превращать текущий этап в скрытую реализацию Stage 4–6.
|
||||
- не превращать текущий этап в скрытую реализацию Stage 5–6.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -68,7 +68,7 @@ Codex должен помочь реализовать **только Stage 3**,
|
|||
При чтении и интерпретации материалов использовать следующий порядок приоритета.
|
||||
|
||||
### 1. Текущий рабочий scope
|
||||
- `03_execution/STAGE_03_TASK_CARD.md`
|
||||
- `03_execution/STAGE_04_TASK_CARD.md`
|
||||
|
||||
Это главный документ по тому, что делать прямо сейчас.
|
||||
|
||||
|
|
@ -84,15 +84,16 @@ Codex должен помочь реализовать **только Stage 3**,
|
|||
- security;
|
||||
- live bridge policy.
|
||||
|
||||
### 3. Детальное ТЗ третьего этапа
|
||||
### 3. Детальное ТЗ четвёртого этапа
|
||||
- `02_stages/TZ_Stage_4_Accounting_Ontology_Graph_Core_Assistant_Mode.md`
|
||||
|
||||
Этот документ определяет содержимое Stage 4.
|
||||
|
||||
### 4. Зависимости Stage 4
|
||||
- `02_stages/TZ_Stage_3_Lifecycle_Formalization_Assistant_Mode.md`
|
||||
|
||||
Этот документ определяет содержимое Stage 3.
|
||||
|
||||
### 4. Зависимости Stage 3
|
||||
- `02_stages/TZ_Stage_2_Retrieval_Unit_Shift_Assistant_Mode.md`
|
||||
|
||||
Stage 3 опирается на problem-centric слой Stage 2 и не должен его ломать.
|
||||
Stage 4 опирается на problem-centric слой Stage 2 и lifecycle слой Stage 3 и не должен их ломать.
|
||||
|
||||
### 5. Текущий статус и общая логика развития
|
||||
- `00_context/Assistant_Mode_GLOBAL_STATUS_2026-03-24.md`
|
||||
|
|
@ -103,11 +104,10 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
Эти документы нужны для понимания:
|
||||
- что уже сделано;
|
||||
- где реальные потолки системы;
|
||||
- почему сейчас выполняется Stage 3;
|
||||
- как Stage 3 стыкуется с дальнейшими этапами.
|
||||
- почему сейчас выполняется Stage 4;
|
||||
- как Stage 4 стыкуется с дальнейшими этапами.
|
||||
|
||||
### 6. Этапы 4–6
|
||||
- `02_stages/TZ_Stage_4_...`
|
||||
### 6. Этапы 5–6
|
||||
- `02_stages/TZ_Stage_5_...`
|
||||
- `02_stages/TZ_Stage_6_...`
|
||||
|
||||
|
|
@ -122,18 +122,17 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
|
||||
## Scope текущей итерации
|
||||
|
||||
Разрешено делать только то, что относится к Stage 3 и необходимо для его корректной реализации.
|
||||
Разрешено делать только то, что относится к Stage 4 и необходимо для его корректной реализации.
|
||||
|
||||
К текущему scope относятся:
|
||||
|
||||
- формализация lifecycle-доменов и lifecycle-сущностей Stage 3;
|
||||
- описание states/transitions/defects с привязкой к доступным evidence;
|
||||
- реализация runtime-слоя (`LifecycleRegistry`, `LifecycleResolver`, `LifecycleDefectClassifier`, `LifecycleEnricher`);
|
||||
- обновление `problem_unit_schema` lifecycle-полями;
|
||||
- интеграция lifecycle-факторов в ranking policy;
|
||||
- интеграция lifecycle-логики в answer policy;
|
||||
- lifecycle-aware тесты и benchmark контур по ключевым доменам;
|
||||
- before/after eval отчёт по продуктовой ценности Stage 3.
|
||||
- формализация graph-ядра (`AccountingGraphNode`, `AccountingGraphEdge`, typed relations);
|
||||
- реализация runtime-слоя (`GraphSchemaRegistry`, `GraphBuilder`, `GraphTraversalPolicy`, `GraphValidationLayer`);
|
||||
- интеграция graph-сигналов в retrieval planning/execution;
|
||||
- интеграция graph connectivity в problem assembly и lifecycle binding;
|
||||
- интеграция graph-based объяснений в answer policy;
|
||||
- graph-aware тесты и benchmark контур по ключевым доменам;
|
||||
- before/after eval отчёт по продуктовой ценности Stage 4.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -141,12 +140,13 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
|
||||
На этой итерации нельзя фактически реализовывать как core-runtime следующие слои:
|
||||
|
||||
- полноразмерный ontology / graph runtime из Stage 4;
|
||||
- полноценный investigation orchestrator из Stage 5;
|
||||
- live verification runtime core и full product mode split из Stage 6;
|
||||
- полноразмерный enterprise-wide graph beyond accounting core Stage 4;
|
||||
- переезд на новую полную сервисную архитектуру;
|
||||
- переписывание ассистента вокруг новых abstraction layers без крайней необходимости;
|
||||
- домены, которые не поддерживаются текущими данными/evidence mapping;
|
||||
- попытки закрывать graph-gap только prompt-инженерией;
|
||||
- большие инфраструктурные переделки ради “красоты”.
|
||||
|
||||
---
|
||||
|
|
@ -154,20 +154,20 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
## Главный принцип текущей работы
|
||||
|
||||
**Не строить целевую систему раньше времени.**
|
||||
Нужно сделать Stage 3 так, чтобы lifecycle-модели были не формальными таблицами, а реально работающим runtime-слоем и базой для следующих этапов.
|
||||
Нужно сделать Stage 4 так, чтобы graph-модели были не формальными схемами, а реально работающим runtime-слоем и базой для следующих этапов.
|
||||
|
||||
---
|
||||
|
||||
## Жёсткие архитектурные ограничения
|
||||
|
||||
### 1. Нельзя ломать текущий рабочий контур без прямой причины
|
||||
Если существующий transport / endpoint / base routing / normalizer pipeline работает, он должен сохраняться, если только изменение не является обязательным условием Stage 3.
|
||||
Если существующий transport / endpoint / base routing / normalizer pipeline работает, он должен сохраняться, если только изменение не является обязательным условием Stage 4.
|
||||
|
||||
### 2. Нельзя подменять архитектурные изменения промптами
|
||||
Проблемы lifecycle-state, transition logic, defect classification, ranking integration и answer grounding не должны решаться только промптами или “умной формулировкой ответа”.
|
||||
Проблемы graph connectivity, relation semantics, traversal logic, problem assembly integration и answer grounding не должны решаться только промптами или “умной формулировкой ответа”.
|
||||
|
||||
### 3. Нельзя преждевременно тащить Stage 4–6 в кодовую базу
|
||||
Если какое-либо изменение фактически реализует future-stage runtime, оно должно быть отклонено или отложено, если не доказана его необходимость для Stage 3.
|
||||
### 3. Нельзя преждевременно тащить Stage 5–6 в кодовую базу
|
||||
Если какое-либо изменение фактически реализует future-stage runtime, оно должно быть отклонено или отложено, если не доказана его необходимость для Stage 4.
|
||||
|
||||
### 4. Нельзя делать большие рефакторы ради абстрактной чистоты
|
||||
Разрешены только те изменения, которые:
|
||||
|
|
@ -175,15 +175,15 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
- повышают устойчивость текущего слоя;
|
||||
- не разрушают траекторию дальнейшего развития.
|
||||
|
||||
### 5. Каждый lifecycle-элемент обязан иметь полный контур реализации
|
||||
Для каждого lifecycle-элемента должны существовать:
|
||||
### 5. Каждый graph-элемент обязан иметь полный контур реализации
|
||||
Для каждого graph-элемента должны существовать:
|
||||
- spec-level описание;
|
||||
- runtime-level вычисление;
|
||||
- retrieval/ranking-level использование;
|
||||
- answer-level интерпретация.
|
||||
|
||||
### 6. Нельзя вводить состояния и дефекты без evidence mapping
|
||||
Если состояние/переход/дефект нельзя определить по реально доступным данным, его нельзя вводить как runtime-элемент Stage 3.
|
||||
### 6. Нельзя вводить узлы/связи без provenance и evidence mapping
|
||||
Если node/edge нельзя определить по реально доступным данным и привязать к источнику, его нельзя вводить как runtime-элемент Stage 4.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -195,14 +195,14 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
Сначала изучить:
|
||||
- текущий статус;
|
||||
- platform core ТЗ;
|
||||
- Stage 3;
|
||||
- зависимость от Stage 2;
|
||||
- Stage 4;
|
||||
- зависимости от Stage 3 и Stage 2;
|
||||
- roadmap;
|
||||
- контекст следующих этапов.
|
||||
|
||||
### Шаг B. Анализ текущего кода
|
||||
До внесения изменений определить:
|
||||
- какие части lifecycle already/partially реализованы;
|
||||
- какие части graph/lifecycle/problem layers already/partially реализованы;
|
||||
- где находятся реальные точки расширения;
|
||||
- какие элементы являются хрупкими;
|
||||
- какие изменения потребуют новых contracts;
|
||||
|
|
@ -221,7 +221,7 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
Только после плана переходить к реализации.
|
||||
|
||||
Изменения должны вноситься малыми порциями, чтобы можно было проверить:
|
||||
- не вышел ли scope за Stage 3;
|
||||
- не вышел ли scope за Stage 4;
|
||||
- не сломан ли текущий контур;
|
||||
- не появились ли premature abstractions.
|
||||
|
||||
|
|
@ -233,6 +233,61 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
- список ограничений;
|
||||
- список нерешённых вопросов;
|
||||
- оценку совместимости с дальнейшими этапами.
|
||||
- ссылку на run-папку в `llm_normalizer/docs/runs` с артефактами по стандарту структуры волны.
|
||||
|
||||
---
|
||||
|
||||
## Стандарт структуры run-артефактов (обязательный)
|
||||
|
||||
Для каждой волны тестов и приёмки нужно создавать отдельную run-папку в:
|
||||
|
||||
- `llm_normalizer/docs/runs`
|
||||
|
||||
### 1. Обязательный формат имени run-папки
|
||||
|
||||
Имя папки должно быть в формате:
|
||||
|
||||
- `YYYY-MM-DD_Stage_<NN>_Wave_<NN>_<short_topic>`
|
||||
|
||||
Где:
|
||||
- после даты обязательно идёт `Stage`;
|
||||
- после `Stage` обязательно идёт `Wave`;
|
||||
- только потом добавляется краткая тема прогона.
|
||||
|
||||
Пример:
|
||||
- `2026-03-26_Stage_03_Wave_03_Lifecycle_Prompts`
|
||||
|
||||
### 2. Обязательный состав артефактов run-папки
|
||||
|
||||
В каждой run-папке должны быть минимум:
|
||||
|
||||
- `README.md` (контекст волны и что проверяли);
|
||||
- `run_summary.json` (команды, результаты, ссылки на артефакты);
|
||||
- артефакты тестов/прогонов (eval, acceptance, regression и т.д.);
|
||||
- отдельная папка `prompt_dialogs`.
|
||||
|
||||
### 3. Обязательная папка диалогов `prompt_dialogs`
|
||||
|
||||
Папка `prompt_dialogs` должна содержать данные диалога в формате "вопрос пользователя -> ответ системы" и runtime-контекст:
|
||||
|
||||
- `prompt_dialogs/index.json` (индекс всех кейсов и файлов);
|
||||
|
||||
По каждому кейсу:
|
||||
|
||||
- `prompt_dialogs/<suite>/<case_id>.json` (сырой JSON диалога, debug/runtime поля, decomposition/grounding если доступны);
|
||||
- `prompt_dialogs/<suite>/<case_id>.md` (быстро читаемая версия user/system/assistant).
|
||||
|
||||
Эти файлы обязательны для разборов wave-результатов, чтобы быстро видеть:
|
||||
|
||||
- что именно спросил пользователь;
|
||||
- что вернула система;
|
||||
- что было декомпозировано и на чём основан ответ;
|
||||
- что отсутствует или отфильтровано в pipeline.
|
||||
|
||||
### 4. Запрет на смешивание волн
|
||||
|
||||
Нельзя складывать артефакты разных волн в одну и ту же run-папку.
|
||||
Каждая волна должна иметь собственную папку и собственный набор `prompt_dialogs`.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -243,20 +298,20 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
### 1. Summary текущего состояния
|
||||
Краткое описание того, как текущая реализация устроена по коду.
|
||||
|
||||
### 2. Gap analysis относительно Stage 3
|
||||
Перечень того, чего не хватает для соответствия Stage 3.
|
||||
### 2. Gap analysis относительно Stage 4
|
||||
Перечень того, чего не хватает для соответствия Stage 4.
|
||||
|
||||
### 3. Предлагаемый file-level plan
|
||||
Какие файлы нужно менять, создавать или расширять.
|
||||
|
||||
### 4. Предлагаемые contracts / types / schemas
|
||||
Какие lifecycle-сущности и интерфейсы появятся.
|
||||
Какие graph/lifecycle/problem-сущности и интерфейсы появятся.
|
||||
|
||||
### 5. Test plan
|
||||
Какие тесты будут добавлены или обновлены.
|
||||
|
||||
### 6. Acceptance mapping
|
||||
Какие критерии Stage 3 покрываются какими изменениями.
|
||||
Какие критерии Stage 4 покрываются какими изменениями.
|
||||
|
||||
### 7. Explicit non-scope
|
||||
Что сознательно не будет делаться сейчас.
|
||||
|
|
@ -272,7 +327,7 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
1. Что было проанализировано
|
||||
2. Что обнаружено
|
||||
3. Что предлагается изменить
|
||||
4. Почему это соответствует Stage 3
|
||||
4. Почему это соответствует Stage 4
|
||||
5. Что не входит в текущий scope
|
||||
6. Какие файлы затрагиваются
|
||||
7. Какие риски есть
|
||||
|
|
@ -295,8 +350,8 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
|
||||
### 2. Явные contracts
|
||||
Всё, что касается:
|
||||
- lifecycle states/transitions/defects;
|
||||
- lifecycle resolution;
|
||||
- graph nodes/edges/relation semantics;
|
||||
- graph traversal/resolution;
|
||||
- enrichment contracts;
|
||||
- ranking factors;
|
||||
- answer interpretation;
|
||||
|
|
@ -306,11 +361,11 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
|
||||
### 3. Контролируемая расширяемость
|
||||
Расширяемость допустима, но только в той мере, в которой она:
|
||||
- реально нужна Stage 3;
|
||||
- реально нужна Stage 4;
|
||||
- не заставляет внедрять всю будущую архитектуру заранее.
|
||||
|
||||
### 4. Наблюдаемость изменений
|
||||
Если добавляется новая lifecycle-логика, нужно продумать:
|
||||
Если добавляется новая graph-логика, нужно продумать:
|
||||
- как она тестируется;
|
||||
- как она логируется;
|
||||
- как проверяется её корректность;
|
||||
|
|
@ -329,10 +384,10 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
|
||||
Следующие действия считаются ошибочными:
|
||||
|
||||
- “красивые lifecycle-таблицы” без рабочего resolver;
|
||||
- lifecycle-поля в логах без влияния на ranking/answer;
|
||||
- ответы вида “broken_lifecycle” без state/transition логики;
|
||||
- скрытая реализация Stage 4–6 под видом Stage 3;
|
||||
- “красивая ontology-схема” без рабочего graph runtime;
|
||||
- graph-поля в payload без влияния на retrieval/problem assembly/answer;
|
||||
- формальный builder без typed traversal и causal value;
|
||||
- скрытая реализация Stage 5–6 под видом Stage 4;
|
||||
- создание новых абстракций без runtime-пользы;
|
||||
- переписывание рабочего контура ради абстрактной чистоты;
|
||||
- неявное изменение scope;
|
||||
|
|
@ -344,12 +399,12 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
|
||||
Если в процессе работы появляется одно или несколько из следующих явлений, нужно остановиться и пересобрать plan:
|
||||
|
||||
- предлагается graph runtime как обязательный путь Stage 3;
|
||||
- graph-модель проектируется без runtime-использования в retrieval/problem assembly/lifecycle;
|
||||
- предлагается full investigation orchestration для “удобства”;
|
||||
- lifecycle-модели проектируются без data/evidence mapping;
|
||||
- ranking и answer не получают lifecycle-интеграцию;
|
||||
- для Stage 3 предлагается большой platform refactor;
|
||||
- формируется новый data model слой без связи с acceptance criteria Stage 3.
|
||||
- relation semantics задаются без provenance/evidence mapping;
|
||||
- answer слой не получает graph-based объяснения;
|
||||
- для Stage 4 предлагается большой platform refactor;
|
||||
- формируется новый data model слой без связи с acceptance criteria Stage 4.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -357,13 +412,14 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
|
||||
Текущая волна считается завершённой только если выполнены одновременно все условия:
|
||||
|
||||
1. Реализован scope Stage 3, а не произвольный “улучшенный вариант”.
|
||||
1. Реализован scope Stage 4, а не произвольный “улучшенный вариант”.
|
||||
2. Текущий рабочий контур не разрушен.
|
||||
3. Новые lifecycle contracts описаны явно.
|
||||
3. Новые graph contracts описаны явно.
|
||||
4. Есть тесты и/или проверяемые критерии для внесённых изменений.
|
||||
5. Нет скрытого уезда в Stage 4–6.
|
||||
5. Нет скрытого уезда в Stage 5–6.
|
||||
6. Изменения совместимы с platform core ТЗ.
|
||||
7. Зафиксировано, что сознательно осталось за пределами текущего этапа.
|
||||
8. Run-артефакты оформлены по стандарту `дата -> Stage -> Wave` и включают обязательную папку `prompt_dialogs`.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -371,10 +427,10 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
|
||||
Текущая итерация должна дать следующий результат:
|
||||
|
||||
- lifecycle-aware problem reasoning вместо generic lifecycle labels;
|
||||
- stage/transition-aware ranking на covered-доменах;
|
||||
- более прикладные ответы по сценариям 51/60, 97, ОС, НДС и period close;
|
||||
- рабочий lifecycle runtime-контур, пригодный для дальнейшего развития.
|
||||
- graph-backed causal reasoning вместо локальных эвристических связок;
|
||||
- typed edge traversal в сценариях cross-branch и period-impact;
|
||||
- более прикладные ответы с явным путём проблемы по связям;
|
||||
- рабочий graph runtime-контур, пригодный для Stage 5 investigation engine.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -394,6 +450,6 @@ Stage 3 опирается на problem-centric слой Stage 2 и не дол
|
|||
|
||||
Главный вопрос перед любым изменением:
|
||||
|
||||
**Это действительно необходимо для Stage 3, или это попытка преждевременно реализовать Stage 4–6?**
|
||||
**Это действительно необходимо для Stage 4, или это попытка преждевременно реализовать Stage 5–6?**
|
||||
|
||||
Если ответ неочевиден, изменение откладывается и выносится на отдельное согласование.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,58 @@
|
|||
# STAGE_03_CLOSEOUT_2026-03-26
|
||||
|
||||
## Статус
|
||||
|
||||
- Этап: Stage 3 / Lifecycle Formalization
|
||||
- Решение: `Accepted / Closed`
|
||||
- Дата фиксации: 2026-03-26
|
||||
|
||||
---
|
||||
|
||||
## Что подтверждено
|
||||
|
||||
1. `03_S3-97-STALLED-NODES` выведен из `out_of_scope` в `in_scope`.
|
||||
2. Схлопывание доменов в `bank_settlement` устранено.
|
||||
3. Synthetic placeholders удалены из user-facing `assistant_reply`.
|
||||
4. Stage 2 regression и Stage 3 probe разделение сохранено.
|
||||
5. Mojibake cleanup в user-facing layer завершён и подтверждён на всех 9 Stage 3 lifecycle probe кейсах.
|
||||
|
||||
---
|
||||
|
||||
## Финальные артефакты Stage 3
|
||||
|
||||
- Основной финальный run:
|
||||
- `llm_normalizer/docs/runs/2026-03-26_Stage_3_Wave_6_Mojibake_Final_MicroPatch`
|
||||
|
||||
- В run-папке присутствует `prompt_dialogs/stage3_lifecycle_probe`:
|
||||
- `01_S3-51-WRONG-CLOSE-TYPE`
|
||||
- `02_S3-60-PAYMENT-WITHOUT-CLOSURE`
|
||||
- `03_S3-97-STALLED-NODES`
|
||||
- `04_S3-97-EXPECTED-VS-ACTUAL`
|
||||
- `05_S3-OS-BRANCH-DIVERGENCE`
|
||||
- `06_S3-OS-TERMINAL-GAP`
|
||||
- `07_S3-VAT-CROSS-BRANCH-CONFLICT`
|
||||
- `08_S3-VAT-ACTUAL-VS-EXPECTED`
|
||||
- `09_S3-PERIOD-CLOSE-LIFECYCLE-IMPACT`
|
||||
|
||||
---
|
||||
|
||||
## Техническая валидация на момент закрытия
|
||||
|
||||
- `npm test` (backend): PASS
|
||||
- `npm run build` (backend): PASS
|
||||
|
||||
---
|
||||
|
||||
## Что переносится в Stage 4
|
||||
|
||||
1. Ввести graph-backed causal layer поверх текущих Stage 2/3 контрактов.
|
||||
2. Перевести retrieval/problem/lifecycle reasoning на typed graph connectivity.
|
||||
3. Подготовить архитектурную базу под Stage 5 investigation engine без преждевременной реализации Stage 5.
|
||||
|
||||
---
|
||||
|
||||
## Scope-дисциплина
|
||||
|
||||
- Stage 3 закрыт без изменения prompt-set как механизма решения runtime-проблем.
|
||||
- Stage 3 завершён без redesign transport/endpoint/base routing.
|
||||
- Stage 3 завершён без скрытой реализации Stage 4–6.
|
||||
|
|
@ -12,12 +12,22 @@
|
|||
|
||||
## Статус документа
|
||||
|
||||
- Статус: рабочая карта реализации Stage 3
|
||||
- Статус: Stage 3 завершён и зафиксирован как accepted (2026-03-26)
|
||||
- Язык: русский
|
||||
- Режим использования: обязателен к прочтению перед любыми изменениями по Stage 3
|
||||
- При конфликте по архитектурным ограничениям приоритет имеет `TZ_Platform_Core_Accounting_Assistant_Mode.md`
|
||||
- При конфликте по общему режиму работы Codex приоритет имеет `CODEX_MASTER_BRIEF.md`
|
||||
|
||||
### Фиксация закрытия Stage 3 (2026-03-26)
|
||||
|
||||
- Приёмка Stage 3 подтверждена.
|
||||
- Финальный micro-patch по mojibake закрыт.
|
||||
- Full test suite после финальной стабилизации: `npm test` = PASS.
|
||||
- Финальные run-артефакты Stage 3:
|
||||
- `llm_normalizer/docs/runs/2026-03-26_Stage_3_Wave_6_Mojibake_Final_MicroPatch`
|
||||
|
||||
Документ сохраняется как reference-card завершённого этапа и как baseline для Stage 4.
|
||||
|
||||
---
|
||||
|
||||
## Контекст
|
||||
|
|
@ -191,7 +201,9 @@ Codex должен вернуть не только код, но и набор
|
|||
- что сделано;
|
||||
- что не сделано сознательно;
|
||||
- какие риски остались;
|
||||
- что подготовлено для Stage 4.
|
||||
- что подготовлено для Stage 4;
|
||||
- run-папка в `llm_normalizer/docs/runs` с именем по схеме `YYYY-MM-DD_Stage_<NN>_Wave_<NN>_<short_topic>`;
|
||||
- обязательная папка `prompt_dialogs` с логами диалогов по кейсам (`index.json`, `<case_id>.json`, `<case_id>.md`).
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,277 @@
|
|||
# STAGE_04_TASK_CARD
|
||||
|
||||
## Назначение документа
|
||||
|
||||
Этот документ фиксирует **рабочий implementation scope четвёртого этапа** для Codex и разработчика.
|
||||
Документ не заменяет Stage 4 ТЗ и не заменяет platform core ТЗ.
|
||||
Его задача — перевести Stage 4 в практический рабочий контур без расползания в Stage 5–6.
|
||||
|
||||
---
|
||||
|
||||
## Статус документа
|
||||
|
||||
- Статус: рабочая карта реализации Stage 4
|
||||
- Язык: русский
|
||||
- Режим использования: обязателен к прочтению перед любыми изменениями по Stage 4
|
||||
- При конфликте по архитектурным ограничениям приоритет имеет `TZ_Platform_Core_Accounting_Assistant_Mode.md`
|
||||
- При конфликте по общему режиму работы Codex приоритет имеет `CODEX_MASTER_BRIEF.md`
|
||||
|
||||
---
|
||||
|
||||
## Контекст
|
||||
|
||||
- Stage 3 закрыт и принят (2026-03-26).
|
||||
- Текущий baseline: lifecycle-aware reasoning работает, Stage 2 regression и Stage 3 probe разделены.
|
||||
- Следующий шаг — не расширять prompt-слой, а ввести graph-backed causal layer как основу для дальнейшего investigation режима.
|
||||
|
||||
Опорные артефакты закрытия Stage 3:
|
||||
- `llm_normalizer/docs/runs/2026-03-26_Stage_3_Wave_6_Mojibake_Final_MicroPatch`
|
||||
|
||||
Стартовая run-папка Stage 4 Wave 1:
|
||||
- `llm_normalizer/docs/runs/2026-03-26_Stage_04_Wave_01_Kickoff`
|
||||
|
||||
---
|
||||
|
||||
## Цель Stage 4
|
||||
|
||||
Stage 4 должен дать **рабочее graph-ядро бухгалтерской предметной области**, чтобы retrieval, lifecycle и problem assembly опирались на единое причинно-следственное представление.
|
||||
|
||||
Практический результат этапа:
|
||||
|
||||
- типизированные graph-узлы и связи для ключевых бухгалтерских сущностей;
|
||||
- runtime-построение графа с provenance/confidence;
|
||||
- graph-aware planning/execution для graph-eligible запросов;
|
||||
- graph-backed problem assembly и lifecycle binding;
|
||||
- более причинные и проверяемые пользовательские ответы;
|
||||
- измеримое улучшение по benchmark/eval.
|
||||
|
||||
---
|
||||
|
||||
## Scope текущей реализации
|
||||
|
||||
### В scope входят
|
||||
|
||||
1. **Graph contracts и schema layer**
|
||||
- `AccountingGraphNode`;
|
||||
- `AccountingGraphEdge`;
|
||||
- `GraphSchemaRegistry`;
|
||||
- доменные типы узлов/связей для покрываемых сценариев.
|
||||
|
||||
2. **Graph runtime core**
|
||||
- `GraphBuilder`;
|
||||
- `GraphTraversalPolicy`;
|
||||
- `GraphProvenanceLayer`;
|
||||
- `GraphValidationLayer`.
|
||||
|
||||
3. **Интеграция в retrieval path**
|
||||
- graph eligibility в planner;
|
||||
- typed traversal в execution для graph-eligible кейсов;
|
||||
- детекция missing/conflicting links как runtime сигналов.
|
||||
|
||||
4. **Интеграция в problem/lifecycle layers**
|
||||
- graph-backed problem assembly;
|
||||
- graph-backed lifecycle transition checks;
|
||||
- корректная передача graph evidence в answer layer.
|
||||
|
||||
5. **Интеграция в answer layer**
|
||||
- user-facing объяснение по causal path;
|
||||
- явная фиксация отсутствующих/конфликтных связей;
|
||||
- сохранение честных ограничений confidence/coverage.
|
||||
|
||||
6. **Quality контур Stage 4**
|
||||
- unit/integration тесты graph core;
|
||||
- regression на Stage 2/Stage 3 маршрутах;
|
||||
- benchmark/eval до/после по graph value сценариям.
|
||||
|
||||
---
|
||||
|
||||
## Что не входит в scope
|
||||
|
||||
### Не делать сейчас
|
||||
|
||||
- полноценный Investigation Engine Stage 5;
|
||||
- full orchestration case-runtime с глубокой ветвизацией;
|
||||
- live verification core path и full mode split Stage 6;
|
||||
- глобальный enterprise graph beyond accounting core Stage 4;
|
||||
- большой рефактор transport/endpoint/base routing;
|
||||
- попытки закрыть graph-gap только prompt-изменениями.
|
||||
|
||||
---
|
||||
|
||||
## Обязательные результаты этапа
|
||||
|
||||
### 1. Рабочая graph-модель по целевым доменам
|
||||
|
||||
Должны быть внедрены типизированные узлы/связи как минимум для доменов, критичных для текущего набора кейсов:
|
||||
|
||||
- 51/60 расчётные цепочки;
|
||||
- 97 (расходы будущих периодов);
|
||||
- ОС;
|
||||
- НДС;
|
||||
- period_close.
|
||||
|
||||
### 2. Рабочий graph runtime
|
||||
|
||||
Должен существовать runtime-контур, который:
|
||||
|
||||
- строит graph из нормализованных сущностей;
|
||||
- хранит provenance/confidence;
|
||||
- поддерживает typed traversal;
|
||||
- выявляет missing/conflicting edges.
|
||||
|
||||
### 3. Graph-backed retrieval и problem assembly
|
||||
|
||||
- graph-eligible queries реально используют traversal;
|
||||
- problem units используют graph connectivity, а не только proximity/heuristics.
|
||||
|
||||
### 4. Graph-backed lifecycle binding
|
||||
|
||||
- lifecycle transition checks используют graph relations;
|
||||
- missing/invalid transitions имеют graph-опору.
|
||||
|
||||
### 5. Улучшение user-facing объяснений
|
||||
|
||||
- ответы показывают причинный путь проблемы;
|
||||
- видны узлы/связи разрыва;
|
||||
- сохраняется прозрачность uncertainty.
|
||||
|
||||
### 6. Измеримость ценности
|
||||
|
||||
- есть benchmark suite;
|
||||
- есть before/after evidence;
|
||||
- есть отчёт, где именно graph layer даёт прирост качества.
|
||||
|
||||
---
|
||||
|
||||
## Ожидаемые сущности Stage 4
|
||||
|
||||
Минимальный набор сущностей/компонентов:
|
||||
|
||||
1. `AccountingGraphNode`
|
||||
2. `AccountingGraphEdge`
|
||||
3. `GraphSchemaRegistry`
|
||||
4. `GraphBuilder`
|
||||
5. `GraphTraversalPolicy`
|
||||
6. `GraphProvenanceLayer`
|
||||
7. `GraphValidationLayer`
|
||||
8. `GraphBackedProblemAssembly`
|
||||
9. `GraphBackedLifecycleBinding`
|
||||
|
||||
---
|
||||
|
||||
## Жёсткие implementation-ограничения
|
||||
|
||||
### 1. Не ломать рабочий контур
|
||||
|
||||
Без прямой необходимости не переписывать:
|
||||
- transport;
|
||||
- endpoint;
|
||||
- base routing;
|
||||
- normalizer pipeline.
|
||||
|
||||
### 2. Graph только с runtime-value
|
||||
|
||||
Graph считается внедрённым только если влияет на:
|
||||
- retrieval execution;
|
||||
- problem assembly;
|
||||
- lifecycle reasoning;
|
||||
- user-facing explanation.
|
||||
|
||||
### 3. Никаких бездоказательных узлов/связей
|
||||
|
||||
Нельзя добавлять node/edge, если нет:
|
||||
- источника данных;
|
||||
- evidence mapping;
|
||||
- provenance trace.
|
||||
|
||||
### 4. Stage 5/6 не реализовывать внутри Stage 4
|
||||
|
||||
Любая попытка внедрить full investigation orchestration или live verification core отклоняется как non-scope.
|
||||
|
||||
---
|
||||
|
||||
## Порядок работы по Stage 4
|
||||
|
||||
### Шаг 1. Прочитать материалы
|
||||
|
||||
Обязательно прочитать:
|
||||
- `CODEX_MASTER_BRIEF.md`
|
||||
- `TZ_Platform_Core_Accounting_Assistant_Mode.md`
|
||||
- `TZ_Stage_4_Accounting_Ontology_Graph_Core_Assistant_Mode.md`
|
||||
- `TZ_Stage_3_Lifecycle_Formalization_Assistant_Mode.md`
|
||||
- `TZ_Stage_2_Retrieval_Unit_Shift_Assistant_Mode.md`
|
||||
|
||||
### Шаг 2. Сделать code-level mapping
|
||||
|
||||
Нужно определить:
|
||||
- где безопасно встраивать graph builder;
|
||||
- где planner/execution могут включать graph traversal;
|
||||
- где problem/lifecycle layers принимают graph evidence;
|
||||
- где answer layer получает causal path.
|
||||
|
||||
### Шаг 3. План без кода
|
||||
|
||||
До начала реализации Codex обязан вернуть:
|
||||
- gap analysis;
|
||||
- file-level plan;
|
||||
- contracts/types plan;
|
||||
- test/eval plan;
|
||||
- explicit non-scope.
|
||||
|
||||
### Шаг 4. Реализация малыми волнами
|
||||
|
||||
Рекомендуемая последовательность:
|
||||
|
||||
- Волна 1: graph schema + registry;
|
||||
- Волна 2: graph builder + provenance;
|
||||
- Волна 3: retrieval planner/execution graph integration;
|
||||
- Волна 4: problem/lifecycle graph binding;
|
||||
- Волна 5: answer integration;
|
||||
- Волна 6: benchmark/eval + hardening.
|
||||
|
||||
---
|
||||
|
||||
## Acceptance criteria (кратко)
|
||||
|
||||
Stage 4 считается закрытым только если одновременно:
|
||||
|
||||
1. Graph contracts реализованы и используются runtime.
|
||||
2. Graph traversal реально участвует в graph-eligible запросах.
|
||||
3. Problem assembly использует graph connectivity.
|
||||
4. Lifecycle checks используют graph transitions.
|
||||
5. User-facing ответы отражают causal graph path.
|
||||
6. Есть before/after подтверждение улучшения.
|
||||
7. Нет скрытого выезда в Stage 5–6.
|
||||
8. Run-артефакты оформлены по стандарту `date -> Stage -> Wave`, включая `prompt_dialogs`.
|
||||
|
||||
---
|
||||
|
||||
## Что Codex обязан явно указать в конце работы
|
||||
|
||||
1. Что сделано
|
||||
2. Какие файлы изменены
|
||||
3. Какие graph-сущности и компоненты введены
|
||||
4. Какие тесты добавлены
|
||||
5. Какие acceptance criteria закрыты
|
||||
6. Что сознательно НЕ реализовано
|
||||
7. Какие риски и ограничения остались
|
||||
8. Что подготовлено для Stage 5
|
||||
|
||||
---
|
||||
|
||||
## Definition of Done
|
||||
|
||||
Stage 4 завершён, если одновременно:
|
||||
|
||||
- graph-ядро работает в runtime, а не только в документации;
|
||||
- retrieval/problem/lifecycle/answer слои используют graph signals;
|
||||
- ответы становятся причинно связными и проверяемыми;
|
||||
- есть измеримая прибавка по benchmark/eval;
|
||||
- рабочий контур не разрушен;
|
||||
- нет premature implementation Stage 5–6.
|
||||
|
||||
---
|
||||
|
||||
## Короткая практическая формула этапа
|
||||
|
||||
**Stage 4 = переход от lifecycle-aware reasoning к graph-backed accounting causality.**
|
||||
File diff suppressed because it is too large
Load Diff
|
|
@ -14,15 +14,85 @@ const UUID_PATTERN = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]
|
|||
const LONG_HEX_PATTERN = /\b[0-9a-f]{24,}\b/gi;
|
||||
const RAW_REF_BLOB_PATTERN = /\bevidence_source_ref_v1\|[^\s,;]+/gi;
|
||||
const RAW_REF_TOKEN_PATTERN = /\b(?:source_ref|canonical_ref|entity_id|fragment_id|guid|uuid)\b/gi;
|
||||
const SYNTHETIC_PLACEHOLDER_PATTERN = /\bunknown_entity(?::[^\s,;]+)?\b/gi;
|
||||
const SYNTHETIC_FALLBACK_MARKER_PATTERN = /\b(?:unknown_source|unknown_record)\b/gi;
|
||||
const SYNTHETIC_ROUTE_TOKEN_PATTERN = /\bbatch_refresh_then_store:[^\s,;]+/gi;
|
||||
const CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/u;
|
||||
const LATIN_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u00D0\u00D1][\u0080-\u00FF]){2,}/u;
|
||||
const SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN = /^[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]{1,2}$/u;
|
||||
const PREFIXED_SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN = /^[\p{L}\p{N}_-]+[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]{1,2}$/u;
|
||||
const MOJIBAKE_SINGLE_MARKER_PATTERN = /^[\u0420\u0421\u00D0\u00D1]$/u;
|
||||
const MOJIBAKE_MARKER_CHAR_PATTERN = /[\u0402\u0403\u040A\u040C\u040E\u040F\u0452\u0453\u0459\u045A\u045C\u045E\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/u;
|
||||
const CYRILLIC_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/gu;
|
||||
const LATIN_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN = /(?:[\u00D0\u00D1][\u0080-\u00FF]){2,}/g;
|
||||
const MOJIBAKE_MARKER_CHAR_GLOBAL_PATTERN = /[\u0402\u0403\u040A\u040C\u040E\u040F\u0452\u0453\u0459\u045A\u045C\u045E\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/gu;
|
||||
function normalizeToken(value) {
|
||||
return value.replace(/^[^\p{L}\p{N}_-]+|[^\p{L}\p{N}_-]+$/gu, "");
|
||||
}
|
||||
function isLikelyMojibakeToken(value) {
|
||||
const token = normalizeToken(String(value ?? ""));
|
||||
if (!token) {
|
||||
return false;
|
||||
}
|
||||
if (MOJIBAKE_SINGLE_MARKER_PATTERN.test(token)) {
|
||||
return true;
|
||||
}
|
||||
if (SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN.test(token)) {
|
||||
return true;
|
||||
}
|
||||
if (token.length <= 8 && PREFIXED_SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN.test(token)) {
|
||||
return true;
|
||||
}
|
||||
return CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(token) || LATIN_MOJIBAKE_FRAGMENT_PATTERN.test(token);
|
||||
}
|
||||
function countMojibakeTokens(value) {
|
||||
return String(value ?? "")
|
||||
.split(/[\s,.;:!?()[\]{}"']+/g)
|
||||
.filter((token) => token.length > 0)
|
||||
.filter((token) => isLikelyMojibakeToken(token)).length;
|
||||
}
|
||||
function countMojibakeSingleMarkers(value) {
|
||||
return String(value ?? "")
|
||||
.split(/[\s,.;:!?()[\]{}"']+/g)
|
||||
.filter((token) => token.length > 0)
|
||||
.map((token) => normalizeToken(token))
|
||||
.filter((token) => MOJIBAKE_SINGLE_MARKER_PATTERN.test(token)).length;
|
||||
}
|
||||
function stripMojibakeFragments(value) {
|
||||
const removedByToken = String(value ?? "")
|
||||
.split(/(\s+)/g)
|
||||
.map((part) => {
|
||||
if (/^\s+$/u.test(part)) {
|
||||
return part;
|
||||
}
|
||||
return isLikelyMojibakeToken(part) ? "" : part;
|
||||
})
|
||||
.join("");
|
||||
return removedByToken
|
||||
.replace(CYRILLIC_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN, "")
|
||||
.replace(LATIN_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN, "")
|
||||
.replace(MOJIBAKE_MARKER_CHAR_GLOBAL_PATTERN, "")
|
||||
.replace(/\s+([,.;:!?])/g, "$1")
|
||||
.replace(/\s{2,}/g, " ")
|
||||
.trim();
|
||||
}
|
||||
function looksLikeMojibake(value) {
|
||||
const text = String(value ?? "");
|
||||
if (!text.trim()) {
|
||||
return false;
|
||||
}
|
||||
if (/(?:Р.|С.){5,}/u.test(text)) {
|
||||
const tokenHits = countMojibakeTokens(text);
|
||||
const singleMarkers = countMojibakeSingleMarkers(text);
|
||||
if (tokenHits >= 2 || (tokenHits >= 1 && singleMarkers >= 1) || singleMarkers >= 3) {
|
||||
return true;
|
||||
}
|
||||
if (/[ЃѓЂђЌќЎў]/u.test(text)) {
|
||||
if (MOJIBAKE_MARKER_CHAR_PATTERN.test(text)) {
|
||||
return true;
|
||||
}
|
||||
if (CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(text) || LATIN_MOJIBAKE_FRAGMENT_PATTERN.test(text)) {
|
||||
return true;
|
||||
}
|
||||
if (/\uFFFD/u.test(text)) {
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
|
|
@ -59,14 +129,29 @@ function scrubRawTechnicalRefs(value) {
|
|||
.replace(/\s{2,}/g, " ")
|
||||
.trim();
|
||||
}
|
||||
function sanitizeUserFacingReply(value) {
|
||||
return scrubRawTechnicalRefs(value)
|
||||
.replace(/[ \t]+\n/g, "\n")
|
||||
.replace(/\n{3,}/g, "\n\n")
|
||||
function stripSyntheticPlaceholders(value) {
|
||||
return String(value ?? "")
|
||||
.replace(SYNTHETIC_PLACEHOLDER_PATTERN, "")
|
||||
.replace(SYNTHETIC_FALLBACK_MARKER_PATTERN, "")
|
||||
.replace(SYNTHETIC_ROUTE_TOKEN_PATTERN, "")
|
||||
.replace(/[;,:]\s*[;,:]+/g, "; ")
|
||||
.replace(/\s{2,}/g, " ")
|
||||
.trim();
|
||||
}
|
||||
function sanitizeUserFacingReply(value) {
|
||||
const normalized = scrubRawTechnicalRefs(value).replace(/[ \t]+\n/g, "\n");
|
||||
const cleanedLines = normalized
|
||||
.split(/\r?\n/g)
|
||||
.map((line) => stripSyntheticPlaceholders(line))
|
||||
.map((line) => stripMojibakeFragments(line))
|
||||
.map((line) => line.trim())
|
||||
.filter((line) => line.length > 0)
|
||||
.filter((line) => !looksLikeMojibake(line));
|
||||
const cleaned = cleanedLines.join("\n").replace(/\n{3,}/g, "\n\n").trim();
|
||||
return cleaned || "Available data requires clarification for a reliable user-facing answer.";
|
||||
}
|
||||
function sanitizeUserText(value) {
|
||||
const normalized = scrubRawTechnicalRefs(String(value ?? "").replace(/\s+/g, " ").trim());
|
||||
const normalized = stripMojibakeFragments(stripSyntheticPlaceholders(scrubRawTechnicalRefs(String(value ?? "").replace(/\s+/g, " ").trim())));
|
||||
if (!normalized) {
|
||||
return null;
|
||||
}
|
||||
|
|
@ -180,13 +265,13 @@ function buildFallbackWhyIncluded(results) {
|
|||
const filteredRecords = summaryNumber(result, "filtered_records_after_narrowing");
|
||||
const checkedRecords = summaryNumber(result, "checked_records");
|
||||
if (routeFocus) {
|
||||
lines.push(`Проверка выполнена по профилю ${routeFocus}.`);
|
||||
lines.push(`Проверка выполнена по профилю ${routeFocus}.`);
|
||||
}
|
||||
if (sourceRecords !== null && filteredRecords !== null && filteredRecords < sourceRecords) {
|
||||
lines.push(`Применено сужение выборки: ${filteredRecords} из ${sourceRecords} записей.`);
|
||||
lines.push(`Применено сужение выборки: ${filteredRecords} из ${sourceRecords} записей.`);
|
||||
}
|
||||
if (checkedRecords !== null) {
|
||||
lines.push(`Проверено записей в текущем проходе: ${checkedRecords}.`);
|
||||
lines.push(`Проверено записей в текущем проходе: ${checkedRecords}.`);
|
||||
}
|
||||
}
|
||||
return sanitizeUserLines(lines, 4);
|
||||
|
|
@ -195,34 +280,34 @@ function buildFallbackSelectionReasons(results) {
|
|||
const lines = [];
|
||||
for (const result of results.slice(0, 2)) {
|
||||
if (summaryBoolean(result, "semantic_narrowing_applied")) {
|
||||
lines.push("Отбор выполнен по семантическому сужению предметной области.");
|
||||
lines.push("Отбор выполнен по семантическому сужению предметной области.");
|
||||
}
|
||||
const rankingBasis = summaryStringArray(result, "ranking_basis");
|
||||
if (rankingBasis.length > 0) {
|
||||
lines.push(`Ранжирование основано на: ${rankingBasis.join(", ")}.`);
|
||||
lines.push(`Ранжирование основано на: ${rankingBasis.join(", ")}.`);
|
||||
}
|
||||
if (summaryBoolean(result, "broad_guard_applied")) {
|
||||
lines.push("Применен broad-query guard для контроля ложной точности.");
|
||||
lines.push("Применен broad-query guard для контроля ложной точности.");
|
||||
}
|
||||
}
|
||||
if (lines.length === 0) {
|
||||
lines.push("Отбор выполнен по совпадению предметных сигналов и доступной evidence-опоры.");
|
||||
lines.push("Отбор выполнен по совпадению предметных сигналов и доступной evidence-опоры.");
|
||||
}
|
||||
return sanitizeUserLines(lines, 4);
|
||||
}
|
||||
function suggestNextStep(requirements, coverage) {
|
||||
const next = [];
|
||||
if (coverage.clarification_needed_for.length > 0) {
|
||||
next.push("Уточните период, счет, документ или контрагента для требований: " + coverage.clarification_needed_for.join(", ") + ".");
|
||||
next.push("Уточните период, счет, документ или контрагента для требований: " + coverage.clarification_needed_for.join(", ") + ".");
|
||||
}
|
||||
if (coverage.requirements_uncovered.length > 0) {
|
||||
next.push("Проверьте непокрытые требования: " + coverage.requirements_uncovered.join(", ") + ".");
|
||||
next.push("Проверьте непокрытые требования: " + coverage.requirements_uncovered.join(", ") + ".");
|
||||
}
|
||||
if (coverage.out_of_scope_requirements.length > 0) {
|
||||
next.push("Часть запроса вне текущего учетного контура: " + coverage.out_of_scope_requirements.join(", ") + ".");
|
||||
next.push("Часть запроса вне текущего учетного контура: " + coverage.out_of_scope_requirements.join(", ") + ".");
|
||||
}
|
||||
if (next.length === 0 && requirements.length > 0) {
|
||||
next.push("Следующим шагом можно открыть технический разбор и углубить проверку по выбранным объектам.");
|
||||
next.push("Следующим шагом можно открыть технический разбор и углубить проверку по выбранным объектам.");
|
||||
}
|
||||
return next;
|
||||
}
|
||||
|
|
@ -264,21 +349,25 @@ function selectProblemUnitSummary(results) {
|
|||
return selected;
|
||||
}
|
||||
function formatAffectedScope(unit) {
|
||||
const accountScope = sanitizeUserLines(unit.affected_accounts, 2);
|
||||
const counterpartyScope = sanitizeUserLines(unit.affected_counterparties, 2);
|
||||
const documentScope = sanitizeUserLines(unit.affected_documents, 2);
|
||||
const entityScope = sanitizeUserLines(unit.affected_entities, 2);
|
||||
const scopeParts = [];
|
||||
if (unit.affected_accounts.length > 0) {
|
||||
scopeParts.push(`счета: ${unit.affected_accounts.slice(0, 2).join(", ")}`);
|
||||
if (accountScope.length > 0) {
|
||||
scopeParts.push(`accounts: ${accountScope.join(", ")}`);
|
||||
}
|
||||
if (unit.affected_counterparties.length > 0) {
|
||||
scopeParts.push(`контрагенты: ${unit.affected_counterparties.slice(0, 2).join(", ")}`);
|
||||
if (counterpartyScope.length > 0) {
|
||||
scopeParts.push(`counterparties: ${counterpartyScope.join(", ")}`);
|
||||
}
|
||||
if (unit.affected_documents.length > 0) {
|
||||
scopeParts.push(`документы: ${unit.affected_documents.slice(0, 2).join(", ")}`);
|
||||
if (documentScope.length > 0) {
|
||||
scopeParts.push(`documents: ${documentScope.join(", ")}`);
|
||||
}
|
||||
if (scopeParts.length === 0 && unit.affected_entities.length > 0) {
|
||||
scopeParts.push(`объекты: ${unit.affected_entities.slice(0, 2).join(", ")}`);
|
||||
if (scopeParts.length === 0 && entityScope.length > 0) {
|
||||
scopeParts.push(`entities: ${entityScope.join(", ")}`);
|
||||
}
|
||||
if (scopeParts.length === 0) {
|
||||
return "затронутый контур требует уточнения";
|
||||
return "affected scope requires clarification";
|
||||
}
|
||||
return scopeParts.join("; ");
|
||||
}
|
||||
|
|
@ -339,47 +428,47 @@ function buildProblemCentricActions(input) {
|
|||
const actions = [];
|
||||
const unitTypes = new Set(input.units.map((item) => item.problem_unit_type));
|
||||
if (unitTypes.has("broken_chain_segment")) {
|
||||
actions.push("Проверьте связку выписка -> документ -> проводка по проблемным участкам цепочки.");
|
||||
actions.push("Проверьте связку выписка -> документ -> проводка по проблемным участкам цепочки.");
|
||||
}
|
||||
if (unitTypes.has("unresolved_settlement_cluster")) {
|
||||
actions.push("Сверьте хвосты по расчетам: закрылся ли документ оплаты корректным закрывающим документом.");
|
||||
actions.push("Сверьте хвосты по расчетам: закрылся ли документ оплаты корректным закрывающим документом.");
|
||||
}
|
||||
if (unitTypes.has("period_risk_cluster")) {
|
||||
actions.push("Оцените влияние дефекта на закрытие периода и корректность регламентных операций.");
|
||||
actions.push("Оцените влияние дефекта на закрытие периода и корректность регламентных операций.");
|
||||
}
|
||||
if (unitTypes.has("cross_branch_inconsistency_cluster")) {
|
||||
actions.push("Сверьте противоречия между документами, проводками и регистрами по НДС/межконтурным связям.");
|
||||
actions.push("Сверьте противоречия между документами, проводками и регистрами по НДС/межконтурным связям.");
|
||||
}
|
||||
if (unitTypes.has("lifecycle_anomaly_node")) {
|
||||
actions.push("Проверьте lifecycle объекта: ожидаемый этап не должен оставаться в partially_linked состоянии.");
|
||||
actions.push("Проверьте lifecycle объекта: ожидаемый этап не должен оставаться в partially_linked состоянии.");
|
||||
}
|
||||
for (const unit of input.units) {
|
||||
if (unit.lifecycle_defect_type === "stale_active_state") {
|
||||
actions.push("Проверьте, почему объект завис: ожидаемый переход не должен оставаться в активной стадии.");
|
||||
actions.push("Проверьте, почему объект завис: ожидаемый переход не должен оставаться в активной стадии.");
|
||||
}
|
||||
if (unit.lifecycle_defect_type === "misclosed_state") {
|
||||
actions.push("Проверьте закрывающий документ и проводки: закрытие может быть формальным, но некорректным по пути.");
|
||||
actions.push("Проверьте закрывающий документ и проводки: закрытие может быть формальным, но некорректным по пути.");
|
||||
}
|
||||
if (unit.lifecycle_defect_type === "cross_branch_state_conflict") {
|
||||
actions.push("Сверьте бухгалтерскую и смежную ветки (например, НДС/расчеты): обнаружен межконтурный конфликт состояния.");
|
||||
actions.push("Сверьте бухгалтерскую и смежную ветки (например, НДС/расчеты): обнаружен межконтурный конфликт состояния.");
|
||||
}
|
||||
}
|
||||
if (input.mode === "clarification_required") {
|
||||
if (input.missingAnchors.period) {
|
||||
actions.push("Уточните период проверки, чтобы зафиксировать границы проблемного контура.");
|
||||
actions.push("Уточните период проверки, чтобы зафиксировать границы проблемного контура.");
|
||||
}
|
||||
if (input.missingAnchors.account) {
|
||||
actions.push("Уточните счет или группу счетов для предметной локализации дефекта.");
|
||||
actions.push("Уточните счет или группу счетов для предметной локализации дефекта.");
|
||||
}
|
||||
if (input.missingAnchors.documentOrObject) {
|
||||
actions.push("Укажите конкретный документ или объект трассировки для проверки механизма отклонения.");
|
||||
actions.push("Укажите конкретный документ или объект трассировки для проверки механизма отклонения.");
|
||||
}
|
||||
if (input.missingAnchors.counterparty) {
|
||||
actions.push("Укажите контрагента/договор, чтобы проверить хвосты и разрывы на конкретной связке.");
|
||||
actions.push("Укажите контрагента/договор, чтобы проверить хвосты и разрывы на конкретной связке.");
|
||||
}
|
||||
}
|
||||
if (input.coverageReport.requirements_uncovered.length > 0) {
|
||||
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
|
||||
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
|
||||
}
|
||||
return uniqueStrings(actions, 6);
|
||||
}
|
||||
|
|
@ -390,28 +479,28 @@ function buildProblemCentricClarifications(input) {
|
|||
const questions = [];
|
||||
const unitTypes = new Set(input.units.map((item) => item.problem_unit_type));
|
||||
if (input.missingAnchors.period) {
|
||||
questions.push("Уточните период (например, 2020-06), в котором нужно проверить проблемный кластер.");
|
||||
questions.push("Уточните период (например, 2020-06), в котором нужно проверить проблемный кластер.");
|
||||
}
|
||||
if (input.missingAnchors.account) {
|
||||
questions.push("Уточните счет или связку счетов (например, 51/60), где вы ожидаете дефект.");
|
||||
questions.push("Уточните счет или связку счетов (например, 51/60), где вы ожидаете дефект.");
|
||||
}
|
||||
if (input.missingAnchors.documentOrObject) {
|
||||
questions.push("Укажите документ/объект, от которого нужно строить проверку цепочки.");
|
||||
questions.push("Укажите документ/объект, от которого нужно строить проверку цепочки.");
|
||||
}
|
||||
if (input.missingAnchors.counterparty) {
|
||||
questions.push("Укажите контрагента или договор, по которому проверить незакрытую экспозицию.");
|
||||
questions.push("Укажите контрагента или договор, по которому проверить незакрытую экспозицию.");
|
||||
}
|
||||
if (unitTypes.has("broken_chain_segment")) {
|
||||
questions.push("Уточните участок цепочки: выписка, платежный документ или проводка.");
|
||||
questions.push("Уточните участок цепочки: выписка, платежный документ или проводка.");
|
||||
}
|
||||
if (unitTypes.has("period_risk_cluster")) {
|
||||
questions.push("Уточните, какой этап закрытия периода критичен: начисление, закрытие счетов или НДС-блок.");
|
||||
questions.push("Уточните, какой этап закрытия периода критичен: начисление, закрытие счетов или НДС-блок.");
|
||||
}
|
||||
if (unitTypes.has("unresolved_settlement_cluster")) {
|
||||
questions.push("Уточните, интересуют хвосты поставщиков, покупателей или оба направления.");
|
||||
questions.push("Уточните, интересуют хвосты поставщиков, покупателей или оба направления.");
|
||||
}
|
||||
if (input.coverageReport.clarification_needed_for.length > 0) {
|
||||
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
|
||||
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
|
||||
}
|
||||
return uniqueStrings(questions, 6);
|
||||
}
|
||||
|
|
@ -522,10 +611,10 @@ function limitationReasonToText(code) {
|
|||
function detectMissingAnchors(userMessage) {
|
||||
const lower = String(userMessage ?? "").toLowerCase();
|
||||
const hasPeriod = /\b20\d{2}(?:[-./](?:0[1-9]|1[0-2]))?\b/.test(lower);
|
||||
const hasAccount = /(?:\bсчет\b|\baccount\b|\bschet\b|\b\d{2}(?:\.\d{2})?\b)/i.test(lower);
|
||||
const hasDocumentOrObject = /(?:документ|invoice|guid|object|obj|#\d+|\bid\b|\bref\b|dokument|doc)/i.test(lower);
|
||||
const hasCounterparty = /(?:контрагент|supplier|buyer|customer|kontragent|postavsh|pokupatel)/i.test(lower);
|
||||
const hasAnomalyType = /(?:аномал|risk|отклон|разрыв|mismatch|duplicate|tail|цепочк|anomali|hvost)/i.test(lower);
|
||||
const hasAccount = /(?:\bсчет\b|\baccount\b|\bschet\b|\b\d{2}(?:\.\d{2})?\b)/i.test(lower);
|
||||
const hasDocumentOrObject = /(?:документ|invoice|guid|object|obj|#\d+|\bid\b|\bref\b|dokument|doc)/i.test(lower);
|
||||
const hasCounterparty = /(?:контрагент|supplier|buyer|customer|kontragent|postavsh|pokupatel)/i.test(lower);
|
||||
const hasAnomalyType = /(?:аномал|risk|отклон|разрыв|mismatch|duplicate|tail|цепочк|anomali|hvost)/i.test(lower);
|
||||
return {
|
||||
period: !hasPeriod,
|
||||
account: !hasAccount,
|
||||
|
|
@ -541,53 +630,53 @@ function buildClarificationQuestions(input) {
|
|||
return questions;
|
||||
}
|
||||
if (input.missingAnchors.period) {
|
||||
questions.push("Уточните период проверки (например, 2020-06).");
|
||||
questions.push("Уточните период проверки (например, 2020-06).");
|
||||
}
|
||||
if (input.missingAnchors.account) {
|
||||
questions.push("Уточните счет или группу счетов (например, 19, 60, 62).");
|
||||
questions.push("Уточните счет или группу счетов (например, 19, 60, 62).");
|
||||
}
|
||||
if (input.missingAnchors.documentOrObject) {
|
||||
questions.push("Укажите документ/GUID/конкретный объект для трассировки.");
|
||||
questions.push("Укажите документ/GUID/конкретный объект для трассировки.");
|
||||
}
|
||||
if (input.missingAnchors.counterparty) {
|
||||
questions.push("Укажите контрагента или группу контрагентов.");
|
||||
questions.push("Укажите контрагента или группу контрагентов.");
|
||||
}
|
||||
if (input.policySignals.broad_query_detected && input.missingAnchors.anomalyType) {
|
||||
questions.push("Уточните тип отклонения: разрыв цепочки, неверный документ или аномальный риск.");
|
||||
questions.push("Уточните тип отклонения: разрыв цепочки, неверный документ или аномальный риск.");
|
||||
}
|
||||
if (input.coverageReport.clarification_needed_for.length > 0) {
|
||||
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
|
||||
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
|
||||
}
|
||||
return uniqueStrings(questions, 6);
|
||||
}
|
||||
function buildRecommendedActions(input) {
|
||||
const actions = [];
|
||||
if (input.mode === "focused_grounded") {
|
||||
actions.push("Проверьте 1-2 ключевые записи в учетной базе и зафиксируйте итог в рабочем файле проверки.");
|
||||
actions.push("Проверьте 1-2 ключевые записи в учетной базе и зафиксируйте итог в рабочем файле проверки.");
|
||||
}
|
||||
if (input.mode === "broad_partial") {
|
||||
actions.push("Сузьте запрос до периода + счета или периода + документа и повторите проверку.");
|
||||
actions.push("Сузьте запрос до периода + счета или периода + документа и повторите проверку.");
|
||||
}
|
||||
if (input.mode === "clarification_required") {
|
||||
actions.push("Дайте недостающие якоря (период/счет/объект), иначе сильный factual вывод невозможен.");
|
||||
actions.push("Дайте недостающие якоря (период/счет/объект), иначе сильный factual вывод невозможен.");
|
||||
}
|
||||
if (input.coverageReport.requirements_uncovered.length > 0) {
|
||||
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
|
||||
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
|
||||
}
|
||||
if (input.coverageReport.requirements_partially_covered.length > 0) {
|
||||
actions.push(`Доуточните частично покрытые требования: ${input.coverageReport.requirements_partially_covered.join(", ")}.`);
|
||||
actions.push(`Доуточните частично покрытые требования: ${input.coverageReport.requirements_partially_covered.join(", ")}.`);
|
||||
}
|
||||
if (input.policySignals.broad_query_detected && input.policySignals.narrowing_strength !== "strong") {
|
||||
actions.push("Добавьте более узкий контекст: тип отклонения, группу документов и бизнес-участок.");
|
||||
actions.push("Добавьте более узкий контекст: тип отклонения, группу документов и бизнес-участок.");
|
||||
}
|
||||
if (input.limitationReasonCodes.includes("snapshot_only")) {
|
||||
actions.push("Сверьте критичные выводы с live source-of-record в 1C.");
|
||||
actions.push("Сверьте критичные выводы с live source-of-record в 1C.");
|
||||
}
|
||||
if (input.limitationReasonCodes.includes("weak_source_mapping")) {
|
||||
actions.push("Проверьте source mapping для связей document/register по указанным ref.");
|
||||
actions.push("Проверьте source mapping для связей document/register по указанным ref.");
|
||||
}
|
||||
if (input.sourceRefs.length > 0) {
|
||||
actions.push(`Начните проверку с ${input.sourceRefs.length} подтвержденных записей и сверьте их с первичными документами.`);
|
||||
actions.push(`Начните проверку с ${input.sourceRefs.length} подтвержденных записей и сверьте их с первичными документами.`);
|
||||
}
|
||||
return uniqueStrings(actions, 6);
|
||||
}
|
||||
|
|
@ -674,84 +763,88 @@ function buildPolicyDecision(input) {
|
|||
}
|
||||
function buildAnswerSummary(mode) {
|
||||
if (mode === "focused_grounded")
|
||||
return "Сформирован прямой ответ на основе подтвержденной опоры.";
|
||||
return "Сформирован прямой ответ на основе подтвержденной опоры.";
|
||||
if (mode === "broad_partial")
|
||||
return "Вывод ограничен: есть частичная опора, но не полный coverage.";
|
||||
return "Вывод ограничен: есть частичная опора, но не полный coverage.";
|
||||
if (mode === "clarification_required")
|
||||
return "Нужны уточнения: без сужения strong factual вывод ненадежен.";
|
||||
return "Нужны уточнения: без сужения strong factual вывод ненадежен.";
|
||||
if (mode === "out_of_scope")
|
||||
return "Запрос вне доступного учетного контура.";
|
||||
return "Запрос вне доступного учетного контура.";
|
||||
if (mode === "route_mismatch")
|
||||
return "Результат маршрута не совпал с предметом вопроса.";
|
||||
return "Результат маршрута не совпал с предметом вопроса.";
|
||||
if (mode === "empty")
|
||||
return "В текущем срезе данных релевантные записи не обнаружены.";
|
||||
return "В текущем срезе данных релевантные записи не обнаружены.";
|
||||
if (mode === "no_grounded")
|
||||
return "Недостаточно опоры для обоснованного ответа.";
|
||||
return "Не удалось собрать обоснованный ответ по текущему запросу.";
|
||||
return "Недостаточно опоры для обоснованного ответа.";
|
||||
return "Не удалось собрать обоснованный ответ по текущему запросу.";
|
||||
}
|
||||
function buildDirectAnswer(input) {
|
||||
const topFact = firstMeaningfulFact(input.retrievalResults);
|
||||
if (input.mode === "focused_grounded") {
|
||||
return topFact ?? "Подтвержденный результат получен; можно продолжать предметную проверку без деградации.";
|
||||
return topFact ?? "Подтвержденный результат получен; можно продолжать предметную проверку без деградации.";
|
||||
}
|
||||
if (input.mode === "broad_partial") {
|
||||
if (topFact) {
|
||||
return `Доступен ограниченный подтвержденный фрагмент: ${topFact}`;
|
||||
return `Доступен ограниченный подтвержденный фрагмент: ${topFact}`;
|
||||
}
|
||||
return "Есть только ограниченная опора; вывод дан в частичном режиме без ложной точности.";
|
||||
return "Есть только ограниченная опора; вывод дан в частичном режиме без ложной точности.";
|
||||
}
|
||||
if (input.mode === "clarification_required") {
|
||||
return "Текущий запрос слишком широкий или недоопределен; надежный factual вывод пока невозможен.";
|
||||
return "Текущий запрос слишком широкий или недоопределен; надежный factual вывод пока невозможен.";
|
||||
}
|
||||
if (input.mode === "out_of_scope") {
|
||||
return "Могу отвечать только в пределах данных доступного учетного контура.";
|
||||
return "Могу отвечать только в пределах данных доступного учетного контура.";
|
||||
}
|
||||
if (input.mode === "route_mismatch") {
|
||||
return "Предмет результата не совпал с предметом вопроса; требуется уточнение фокуса.";
|
||||
return "Предмет результата не совпал с предметом вопроса; требуется уточнение фокуса.";
|
||||
}
|
||||
if (input.mode === "empty") {
|
||||
return "В текущем срезе данных проблемные записи по заданному условию не найдены.";
|
||||
return "В текущем срезе данных проблемные записи по заданному условию не найдены.";
|
||||
}
|
||||
if (input.mode === "no_grounded") {
|
||||
return "Недостаточно подтвержденной опоры для ответа в требуемой точности.";
|
||||
return "Недостаточно подтвержденной опоры для ответа в требуемой точности.";
|
||||
}
|
||||
if (input.policySignals.minimum_evidence_failed) {
|
||||
return "Маршрут отработал, но минимальная evidence-опора не пройдена.";
|
||||
return "Маршрут отработал, но минимальная evidence-опора не пройдена.";
|
||||
}
|
||||
return "Не удалось сформировать обоснованный ответ; нужно уточнение запроса.";
|
||||
return "Не удалось сформировать обоснованный ответ; нужно уточнение запроса.";
|
||||
}
|
||||
function buildProblemCentricAnswerSummary(input) {
|
||||
if (input.lifecycleEnriched && input.summary?.lifecycle_enriched_units && input.summary.lifecycle_enriched_units > 0) {
|
||||
if (input.mode === "clarification_required") {
|
||||
return "Выявлены lifecycle-дефекты, но для надежного вывода требуется уточнение предметных якорей.";
|
||||
return "Выявлены lifecycle-дефекты, но для надежного вывода требуется уточнение предметных якорей.";
|
||||
}
|
||||
return `Сформирован lifecycle-aware problem срез: выделено ${input.summary.lifecycle_enriched_units} lifecycle-узлов с приоритетом по дефектам перехода.`;
|
||||
return `Сформирован lifecycle-aware problem срез: выделено ${input.summary.lifecycle_enriched_units} lifecycle-узлов с приоритетом по дефектам перехода.`;
|
||||
}
|
||||
if (input.mode === "clarification_required") {
|
||||
return "Выявлены проблемные кластеры, но для надежного вывода требуется предметное уточнение фокуса.";
|
||||
return "Выявлены проблемные кластеры, но для надежного вывода требуется предметное уточнение фокуса.";
|
||||
}
|
||||
if (input.weakUnits) {
|
||||
return "Сформирован problem-centric срез с ограниченной опорой; вывод предварительный и требует до-проверки.";
|
||||
return "Сформирован problem-centric срез с ограниченной опорой; вывод предварительный и требует до-проверки.";
|
||||
}
|
||||
if (input.summary?.units_total && input.summary.units_total > 1) {
|
||||
return `Сформирован problem-centric срез: выделено ${input.summary.units_total} проблемных кластера с приоритетами.`;
|
||||
return `Сформирован problem-centric срез: выделено ${input.summary.units_total} проблемных кластера с приоритетами.`;
|
||||
}
|
||||
return "Сформирован problem-centric срез: выделен ключевой проблемный кластер и затронутый контур.";
|
||||
return "Сформирован problem-centric срез: выделен ключевой проблемный кластер и затронутый контур.";
|
||||
}
|
||||
function buildProblemCentricDirectAnswer(input) {
|
||||
const lead = input.mode === "clarification_required"
|
||||
? "Обнаружены проблемные зоны, но без уточнения якорей сильный factual-вывод преждевременен."
|
||||
? "Обнаружены проблемные зоны, но без уточнения якорей сильный factual-вывод преждевременен."
|
||||
: input.weakUnits
|
||||
? "Выделены проблемные зоны с ограниченной надежностью; вывод дан в ограниченном режиме."
|
||||
? "Выделены проблемные зоны с ограниченной надежностью; вывод дан в ограниченном режиме."
|
||||
: input.lifecycleAnswerEnabled && hasLifecycleResolution(input.units)
|
||||
? "Выделены lifecycle-проблемы: определены текущие/ожидаемые стадии и тип нарушения перехода."
|
||||
: "Выделены ключевые проблемные зоны и их влияние на учетный контур.";
|
||||
? "Выделены lifecycle-проблемы: определены текущие/ожидаемые стадии и тип нарушения перехода."
|
||||
: "Выделены ключевые проблемные зоны и их влияние на учетный контур.";
|
||||
const unitLines = input.units.map((unit) => {
|
||||
const scope = formatAffectedScope(unit);
|
||||
const lifecycleScope = input.lifecycleAnswerEnabled ? formatLifecycleScope(unit) : null;
|
||||
const lifecycleInterpretation = input.lifecycleAnswerEnabled ? unit.business_lifecycle_interpretation : null;
|
||||
const lifecycleInterpretation = input.lifecycleAnswerEnabled && unit.business_lifecycle_interpretation
|
||||
? sanitizeUserText(unit.business_lifecycle_interpretation)
|
||||
: null;
|
||||
const title = sanitizeUserText(unit.title) ?? "Problem cluster detected";
|
||||
const defect = sanitizeUserText(unit.business_defect_class) ?? "detected_issue";
|
||||
const segments = [
|
||||
`${unit.title}: ${unit.business_defect_class}`,
|
||||
`${title}: ${defect}`,
|
||||
scope,
|
||||
lifecycleScope,
|
||||
lifecycleInterpretation,
|
||||
|
|
@ -762,9 +855,9 @@ function buildProblemCentricDirectAnswer(input) {
|
|||
return `- ${segments.join("; ")}.`;
|
||||
});
|
||||
if (unitLines.length === 0) {
|
||||
return `${lead}\nПроблемные кластеры не удалось детализировать в текущем срезе.`;
|
||||
return `${lead}\nПроблемные кластеры не удалось детализировать в текущем срезе.`;
|
||||
}
|
||||
return [lead, "Проблемные кластеры:", ...unitLines].join("\n");
|
||||
return [lead, "Проблемные кластеры:", ...unitLines].join("\n");
|
||||
}
|
||||
function buildProblemCentricAnswerStructure(input) {
|
||||
const weakUnits = input.selectedUnits.every((item) => item.confidence.grade === "low");
|
||||
|
|
@ -1098,20 +1191,20 @@ function composeExplainableAnswer(input, scopeLabel) {
|
|||
const limitations = uniqueStrings([...extractLimitations(input.retrievalResults), ...input.groundingCheck.reasons]);
|
||||
const nextSteps = suggestNextStep(input.requirements, input.coverageReport);
|
||||
const lead = scopeLabel === "full"
|
||||
? "Итог: запрос обработан по предмету, найденные объекты подтверждены данными контура."
|
||||
: "Итог: запрос обработан частично, ниже подтвержденная часть и ограничения.";
|
||||
return [
|
||||
? "Ртог: запрос обработан РїРѕ предмету, найденные объекты подтверждены данными контура."
|
||||
: "Ртог: запрос обработан частично, РЅРёР¶Рµ подтвержденная часть Рё ограничения.";
|
||||
return sanitizeUserFacingReply([
|
||||
lead,
|
||||
facts.length > 0 ? "Подтвержденные результаты:\n" + formatList(facts) : "",
|
||||
whyIncluded.length > 0 ? "Почему это попало в ответ:\n" + formatList(whyIncluded) : "",
|
||||
selectionReasons.length > 0 ? "Основание отбора:\n" + formatList(selectionReasons) : "",
|
||||
riskFactors.length > 0 ? "Подтверждающие признаки:\n" + formatList(riskFactors) : "",
|
||||
interpretation.length > 0 ? "Практический смысл:\n" + formatList(interpretation) : "",
|
||||
limitations.length > 0 ? "Ограничения:\n" + formatList(limitations) : "",
|
||||
nextSteps.length > 0 ? "Что проверить дальше:\n" + formatList(nextSteps) : ""
|
||||
facts.length > 0 ? "Подтвержденные результаты:\n" + formatList(facts) : "",
|
||||
whyIncluded.length > 0 ? "Почему это попало в ответ:\n" + formatList(whyIncluded) : "",
|
||||
selectionReasons.length > 0 ? "Основание отбора:\n" + formatList(selectionReasons) : "",
|
||||
riskFactors.length > 0 ? "Подтверждающие признаки:\n" + formatList(riskFactors) : "",
|
||||
interpretation.length > 0 ? "Практический смысл:\n" + formatList(interpretation) : "",
|
||||
limitations.length > 0 ? "Ограничения:\n" + formatList(limitations) : "",
|
||||
nextSteps.length > 0 ? "Что проверить дальше:\n" + formatList(nextSteps) : ""
|
||||
]
|
||||
.filter(Boolean)
|
||||
.join("\n\n");
|
||||
.join("\n\n"));
|
||||
}
|
||||
function composeAssistantAnswer(input) {
|
||||
if (input.enableAnswerPolicyV11) {
|
||||
|
|
@ -1122,13 +1215,15 @@ function composeAssistantAnswer(input) {
|
|||
const partialResults = input.retrievalResults.filter((item) => item.status === "partial");
|
||||
const emptyResults = input.retrievalResults.filter((item) => item.status === "empty");
|
||||
const errorResults = input.retrievalResults.filter((item) => item.status === "error");
|
||||
const legacyEvidenceItems = flattenEvidence(input.retrievalResults);
|
||||
const legacyLimitationReasonCodes = collectLimitationReasonCodes(legacyEvidenceItems);
|
||||
const hasBroadMinimumEvidenceSignal = input.retrievalResults.some((item) => summaryBoolean(item, "broad_guard_applied") && summaryBoolean(item, "minimum_evidence_failed"));
|
||||
const hasBroadClarificationSignal = input.retrievalResults.some((item) => summaryBoolean(item, "broad_guard_applied") &&
|
||||
summaryBoolean(item, "minimum_evidence_failed") &&
|
||||
summaryString(item, "degraded_to") === "clarification");
|
||||
if (fallbackType === "out_of_scope" && input.coverageReport.requirements_covered === 0) {
|
||||
return {
|
||||
assistant_reply: "Я могу отвечать только по данным вашей учетной базы. Этот запрос выходит за рамки доступного контура.",
|
||||
assistant_reply: "РЇ РјРѕРіСѓ отвечать только РїРѕ данным вашей учетной базы. Ртот запрос выходит Р·Р° рамки доступного контура.",
|
||||
fallback_type: "out_of_scope",
|
||||
reply_type: "out_of_scope"
|
||||
};
|
||||
|
|
@ -1136,8 +1231,8 @@ function composeAssistantAnswer(input) {
|
|||
if (input.groundingCheck.status === "route_mismatch_blocked") {
|
||||
return {
|
||||
assistant_reply: [
|
||||
"Не отправляю финальный ответ, потому что предмет результата не совпал с предметом вопроса.",
|
||||
"Уточните формулировку (например, нужный счет/участок учета), и я выполню повторный проход."
|
||||
"Не отправляю финальный ответ, потому что предмет результата не совпал с предметом вопроса.",
|
||||
"Уточните формулировку (например, нужный счет/участок учета), и я выполню повторный проход."
|
||||
].join("\n\n"),
|
||||
fallback_type: "partial",
|
||||
reply_type: "route_mismatch_blocked"
|
||||
|
|
@ -1145,28 +1240,28 @@ function composeAssistantAnswer(input) {
|
|||
}
|
||||
if (input.groundingCheck.status === "no_grounded_answer" && okResults.length === 0 && !hasBroadMinimumEvidenceSignal) {
|
||||
return {
|
||||
assistant_reply: "Пока не удалось собрать предметно подтвержденный ответ по вашему вопросу. Нужны дополнительные уточнения по периоду или объекту проверки.",
|
||||
assistant_reply: "Пока не удалось собрать предметно подтвержденный ответ по вашему вопросу. Нужны дополнительные уточнения по периоду или объекту проверки.",
|
||||
fallback_type: fallbackType,
|
||||
reply_type: "no_grounded_answer"
|
||||
};
|
||||
}
|
||||
if (hasBroadClarificationSignal && okResults.length === 0 && partialResults.length === 0) {
|
||||
return {
|
||||
assistant_reply: "Запрос слишком широкий для надежного вывода по текущей опоре. Уточните период, участок учета или объект проверки, после чего я дам предметный результат.",
|
||||
assistant_reply: "Запрос слишком широкий для надежного вывода по текущей опоре. Уточните период, участок учета или объект проверки, после чего я дам предметный результат.",
|
||||
fallback_type: "clarification",
|
||||
reply_type: "clarification_required"
|
||||
};
|
||||
}
|
||||
if (fallbackType === "clarification" && okResults.length === 0 && partialResults.length === 0) {
|
||||
return {
|
||||
assistant_reply: "Уточните, пожалуйста, период, счет, документ или контрагента, чтобы закрыть все части вопроса корректно.",
|
||||
assistant_reply: "Уточните, пожалуйста, период, счет, документ или контрагента, чтобы закрыть все части вопроса корректно.",
|
||||
fallback_type: "clarification",
|
||||
reply_type: "clarification_required"
|
||||
};
|
||||
}
|
||||
if (errorResults.length > 0 && okResults.length === 0 && partialResults.length === 0) {
|
||||
return {
|
||||
assistant_reply: "Не удалось получить данные из контура. Попробуйте повторить запрос или уточнить формулировку.",
|
||||
assistant_reply: "Не удалось получить данные из контура. Попробуйте повторить запрос или уточнить формулировку.",
|
||||
fallback_type: fallbackType,
|
||||
reply_type: "backend_error"
|
||||
};
|
||||
|
|
@ -1180,7 +1275,7 @@ function composeAssistantAnswer(input) {
|
|||
}
|
||||
if (okResults.length === 0 && partialResults.length === 0 && emptyResults.length > 0) {
|
||||
return {
|
||||
assistant_reply: "По заданному условию в текущем срезе данных явных проблемных записей не найдено.",
|
||||
assistant_reply: "По заданному условию в текущем срезе данных явных проблемных записей не найдено.",
|
||||
fallback_type: fallbackType,
|
||||
reply_type: "empty_but_valid"
|
||||
};
|
||||
|
|
@ -1190,7 +1285,9 @@ function composeAssistantAnswer(input) {
|
|||
input.coverageReport.clarification_needed_for.length > 0 ||
|
||||
input.coverageReport.out_of_scope_requirements.length > 0 ||
|
||||
input.groundingCheck.status === "partial" ||
|
||||
errorResults.length > 0;
|
||||
errorResults.length > 0 ||
|
||||
legacyLimitationReasonCodes.includes("weak_source_mapping") ||
|
||||
legacyLimitationReasonCodes.includes("missing_mechanism");
|
||||
if (okResults.length > 0 && hasPartialCoverage) {
|
||||
return {
|
||||
assistant_reply: composeExplainableAnswer(input, "partial"),
|
||||
|
|
@ -1206,7 +1303,7 @@ function composeAssistantAnswer(input) {
|
|||
};
|
||||
}
|
||||
return {
|
||||
assistant_reply: "По текущему запросу не удалось построить обоснованный ответ. Уточните формулировку и попробуйте снова.",
|
||||
assistant_reply: "По текущему запросу не удалось построить обоснованный ответ. Уточните формулировку и попробуйте снова.",
|
||||
fallback_type: "unknown",
|
||||
reply_type: "backend_error"
|
||||
};
|
||||
|
|
|
|||
|
|
@ -917,10 +917,10 @@ class AssistantDataLayer {
|
|||
result = this.executeRisk(fragmentText, data);
|
||||
}
|
||||
else if (route === "batch_refresh_then_store") {
|
||||
result = this.executeBatch(data);
|
||||
result = this.executeBatch(fragmentText, data);
|
||||
}
|
||||
else if (route === "store_canonical") {
|
||||
result = this.executeCanonical(data);
|
||||
result = this.executeCanonical(fragmentText, data);
|
||||
}
|
||||
else if (route === "live_mcp_drilldown") {
|
||||
result = this.executeDrilldown(fragmentText, data);
|
||||
|
|
@ -1207,7 +1207,9 @@ class AssistantDataLayer {
|
|||
errors: []
|
||||
};
|
||||
}
|
||||
executeRisk(_fragmentText, data) {
|
||||
executeRisk(fragmentText, data) {
|
||||
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
|
||||
const profileRiskFactors = semanticProfile.anomaly_patterns;
|
||||
const records = [...data.problemCases, ...data.ndsRegisters];
|
||||
const scored = records
|
||||
.map((record) => {
|
||||
|
|
@ -1258,12 +1260,15 @@ class AssistantDataLayer {
|
|||
items: [],
|
||||
summary: {
|
||||
checked_records: records.length,
|
||||
risky_records: 0
|
||||
risky_records: 0,
|
||||
query_subject: semanticProfile.query_subject,
|
||||
semantic_profile: semanticProfile,
|
||||
ranking_basis: semanticProfile.ranking_basis
|
||||
},
|
||||
evidence: [],
|
||||
why_included: [],
|
||||
selection_reason: ["Риск-оценка выполнялась по техническим признакам, но записи выше порога не найдены."],
|
||||
risk_factors: [],
|
||||
risk_factors: profileRiskFactors,
|
||||
business_interpretation: ["По текущему срезу явные риск-признаки не обнаружены."],
|
||||
confidence: "medium",
|
||||
limitations: ["Оценка основана на snapshot-данных и эвристическом risk score."],
|
||||
|
|
@ -1271,6 +1276,13 @@ class AssistantDataLayer {
|
|||
};
|
||||
}
|
||||
const averageScore = items.reduce((acc, item) => acc + item.risk_score, 0) / items.length;
|
||||
const normalizedRiskFactors = uniqueStrings([
|
||||
...profileRiskFactors,
|
||||
"unknown_link_count",
|
||||
"zero_guid_values",
|
||||
"navigation_links",
|
||||
"missing_counterparty_link"
|
||||
]);
|
||||
return {
|
||||
status: "ok",
|
||||
result_type: "list",
|
||||
|
|
@ -1278,7 +1290,10 @@ class AssistantDataLayer {
|
|||
summary: {
|
||||
checked_records: records.length,
|
||||
risky_records: items.length,
|
||||
average_risk_score: Number(averageScore.toFixed(2))
|
||||
average_risk_score: Number(averageScore.toFixed(2)),
|
||||
query_subject: semanticProfile.query_subject,
|
||||
semantic_profile: semanticProfile,
|
||||
ranking_basis: semanticProfile.ranking_basis
|
||||
},
|
||||
evidence: items.slice(0, 10).map((item) => ({
|
||||
source_entity: item.source_entity,
|
||||
|
|
@ -1287,21 +1302,18 @@ class AssistantDataLayer {
|
|||
})),
|
||||
why_included: ["В ответ включены записи с risk_score >= 2."],
|
||||
selection_reason: [
|
||||
"score растет при unknown links, zero GUID, навигационных ссылках и отсутствии явного контрагента."
|
||||
],
|
||||
risk_factors: [
|
||||
"unknown_link_count",
|
||||
"zero_guid_values",
|
||||
"navigation_links",
|
||||
"missing_counterparty_link"
|
||||
"score растет при unknown links, zero GUID, навигационных ссылках и отсутствии явного контрагента.",
|
||||
`Semantic profile subject: ${semanticProfile.query_subject}.`
|
||||
],
|
||||
risk_factors: normalizedRiskFactors,
|
||||
business_interpretation: ["Рти записи требуют первичной бухгалтерской проверки как потенциальные аномалии."],
|
||||
confidence: "high",
|
||||
limitations: ["Риск-факторы определяются эвристикой, а не полным набором бизнес-правил 1С."],
|
||||
errors: []
|
||||
};
|
||||
}
|
||||
executeBatch(data) {
|
||||
executeBatch(fragmentText, data) {
|
||||
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
|
||||
const source = [...data.problemCases, ...data.keyFields, ...data.docs];
|
||||
const byEntity = new Map();
|
||||
for (const record of source) {
|
||||
|
|
@ -1321,7 +1333,10 @@ class AssistantDataLayer {
|
|||
items,
|
||||
summary: {
|
||||
checked_records: source.length,
|
||||
ranked_entities: items.length
|
||||
ranked_entities: items.length,
|
||||
query_subject: semanticProfile.query_subject,
|
||||
semantic_profile: semanticProfile,
|
||||
ranking_basis: semanticProfile.ranking_basis
|
||||
},
|
||||
evidence: items.slice(0, 5).map((item) => ({
|
||||
entity: item.entity,
|
||||
|
|
@ -1329,17 +1344,20 @@ class AssistantDataLayer {
|
|||
})),
|
||||
why_included: items.length > 0 ? ["Показаны сущности с максимальным количеством записей."] : [],
|
||||
selection_reason: ["Ранжирование выполнено по records_count по убыванию."],
|
||||
risk_factors: ["Высокий объем записей по сущности повышает приоритет проверки."],
|
||||
risk_factors: uniqueStrings(["entity_volume_spike", ...semanticProfile.anomaly_patterns]),
|
||||
business_interpretation: [
|
||||
"Сущности в топе ранга чаще дают наибольший вклад в проблемный объем и требуют приоритетного аудита."
|
||||
"Top entities by volume highlight where lifecycle-focused review should start first."
|
||||
],
|
||||
confidence: "medium",
|
||||
limitations: ["Ранжирование по объему не всегда эквивалентно бизнес-риску."],
|
||||
errors: []
|
||||
};
|
||||
}
|
||||
executeCanonical(data) {
|
||||
const items = data.docs
|
||||
executeCanonical(fragmentText, data) {
|
||||
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
|
||||
const useVatSource = semanticProfile.domain_scope.includes("vat") || semanticProfile.domain_scope.includes("taxes");
|
||||
const sourceRecords = useVatSource ? [...data.ndsRegisters, ...data.keyFields] : data.docs;
|
||||
const items = sourceRecords
|
||||
.map((record) => {
|
||||
const period = extractDate(record);
|
||||
return {
|
||||
|
|
@ -1360,8 +1378,11 @@ class AssistantDataLayer {
|
|||
result_type: "list",
|
||||
items,
|
||||
summary: {
|
||||
checked_records: data.docs.length,
|
||||
returned_records: items.length
|
||||
checked_records: sourceRecords.length,
|
||||
returned_records: items.length,
|
||||
query_subject: semanticProfile.query_subject,
|
||||
semantic_profile: semanticProfile,
|
||||
ranking_basis: semanticProfile.ranking_basis
|
||||
},
|
||||
evidence: items.slice(0, 6).map((item) => ({
|
||||
source_entity: item.source_entity,
|
||||
|
|
@ -1369,8 +1390,11 @@ class AssistantDataLayer {
|
|||
period: item.period
|
||||
})),
|
||||
why_included: items.length > 0 ? ["Показаны последние по дате записи канонического документного слоя."] : [],
|
||||
selection_reason: ["Отбор по максимальной дате документа в пределах snapshot."],
|
||||
risk_factors: [],
|
||||
selection_reason: [
|
||||
"Отбор по максимальной дате документа в пределах snapshot.",
|
||||
`Semantic profile subject: ${semanticProfile.query_subject}.`
|
||||
],
|
||||
risk_factors: semanticProfile.anomaly_patterns,
|
||||
business_interpretation: ["Слой отражает базовый factual-срез документов для оперативной сверки."],
|
||||
confidence: "high",
|
||||
limitations: ["Рто read-only snapshot, Р° РЅРµ онлайн-состояние 1РЎ."],
|
||||
|
|
|
|||
|
|
@ -32,18 +32,92 @@ function includesAny(source, patterns) {
|
|||
function hasToken(values, pattern) {
|
||||
return values.some((value) => pattern.test(value));
|
||||
}
|
||||
function defaultExpectedState(domain) {
|
||||
if (domain === "bank_settlement")
|
||||
return "settlement_closed";
|
||||
if (domain === "customer_settlement")
|
||||
return "receivable_closed";
|
||||
if (domain === "deferred_expense")
|
||||
return "fully_written_off";
|
||||
if (domain === "fixed_asset")
|
||||
return "depreciation_active";
|
||||
if (domain === "vat_flow")
|
||||
return "vat_deducted";
|
||||
return "close_completed";
|
||||
function normalizeStateToken(value) {
|
||||
return value.trim().toLowerCase();
|
||||
}
|
||||
function resolveStateCode(model, stateCode) {
|
||||
if (!stateCode || typeof stateCode !== "string") {
|
||||
return null;
|
||||
}
|
||||
const normalized = normalizeStateToken(stateCode);
|
||||
const matched = model.states.find((state) => normalizeStateToken(state.state_code) === normalized);
|
||||
return matched?.state_code ?? null;
|
||||
}
|
||||
function defaultInitialState(model) {
|
||||
const initial = model.states.find((state) => state.state_class === "initial");
|
||||
if (initial) {
|
||||
return initial.state_code;
|
||||
}
|
||||
return model.states[0]?.state_code ?? "unknown_state";
|
||||
}
|
||||
function defaultExpectedState(model) {
|
||||
const terminal = model.states.find((state) => state.is_terminal || state.state_class === "terminal");
|
||||
if (terminal) {
|
||||
return terminal.state_code;
|
||||
}
|
||||
const active = model.states.find((state) => state.state_class === "active");
|
||||
if (active) {
|
||||
return active.state_code;
|
||||
}
|
||||
return defaultInitialState(model);
|
||||
}
|
||||
function expectedTransitionAdjacency(model) {
|
||||
const graph = new Map();
|
||||
for (const transition of model.transitions) {
|
||||
if (transition.transition_type !== "expected") {
|
||||
continue;
|
||||
}
|
||||
const from = transition.from_state;
|
||||
const to = transition.to_state;
|
||||
const current = graph.get(from) ?? [];
|
||||
if (!current.includes(to)) {
|
||||
current.push(to);
|
||||
}
|
||||
graph.set(from, current);
|
||||
}
|
||||
return graph;
|
||||
}
|
||||
function shortestExpectedPath(model, fromState, toState) {
|
||||
if (fromState === toState) {
|
||||
return [fromState];
|
||||
}
|
||||
const graph = expectedTransitionAdjacency(model);
|
||||
const queue = [[fromState]];
|
||||
const visited = new Set([fromState]);
|
||||
while (queue.length > 0) {
|
||||
const path = queue.shift();
|
||||
if (!path) {
|
||||
continue;
|
||||
}
|
||||
const tail = path[path.length - 1];
|
||||
const nextStates = graph.get(tail) ?? [];
|
||||
for (const nextState of nextStates) {
|
||||
if (visited.has(nextState)) {
|
||||
continue;
|
||||
}
|
||||
const nextPath = [...path, nextState];
|
||||
if (nextState === toState) {
|
||||
return nextPath;
|
||||
}
|
||||
visited.add(nextState);
|
||||
queue.push(nextPath);
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
function transitionEdgeLabel(fromState, toState) {
|
||||
return `${fromState}->${toState}`;
|
||||
}
|
||||
function resolvePreviousStates(model, currentState) {
|
||||
const initialState = defaultInitialState(model);
|
||||
if (initialState === currentState) {
|
||||
return [];
|
||||
}
|
||||
const path = shortestExpectedPath(model, initialState, currentState);
|
||||
if (!path || path.length <= 1) {
|
||||
return [];
|
||||
}
|
||||
return path.slice(0, -1);
|
||||
}
|
||||
const LIFECYCLE_DOMAIN_MODELS = {
|
||||
bank_settlement: {
|
||||
|
|
@ -53,53 +127,53 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
states: [
|
||||
{
|
||||
state_code: "initiated_payment",
|
||||
state_label: "Платеж инициирован",
|
||||
state_label: "Платеж инициирован",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["payment_order_created"],
|
||||
exit_conditions: ["bank_recorded"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Есть инициирование платежа."
|
||||
business_meaning: "Есть инициирование платежа."
|
||||
},
|
||||
{
|
||||
state_code: "bank_recorded",
|
||||
state_label: "Платеж отражен банком",
|
||||
state_label: "Платеж отражен банком",
|
||||
state_class: "active",
|
||||
entry_conditions: ["bank_statement_recorded"],
|
||||
exit_conditions: ["settlement_linked"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Движение денег зафиксировано, ожидается расчетное закрытие."
|
||||
business_meaning: "Движение денег зафиксировано, ожидается расчетное закрытие."
|
||||
},
|
||||
{
|
||||
state_code: "settlement_closed",
|
||||
state_label: "Расчет закрыт",
|
||||
state_label: "Расчет закрыт",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["payment_to_settlement_linked"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "Платеж доведен до расчетного результата."
|
||||
business_meaning: "Платеж доведен до расчетного результата."
|
||||
},
|
||||
{
|
||||
state_code: "stale_unlinked_payment",
|
||||
state_label: "Платеж завис без закрытия",
|
||||
state_label: "Платеж завис без закрытия",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["bank_recorded", "missing_link"],
|
||||
exit_conditions: ["settlement_closed"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Платеж отражен, но ожидаемая связь по расчету не завершена."
|
||||
business_meaning: "Платеж отражен, но ожидаемая связь по расчету не завершена."
|
||||
},
|
||||
{
|
||||
state_code: "misclosed_payment",
|
||||
state_label: "Платеж закрыт некорректно",
|
||||
state_label: "Платеж закрыт некорректно",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["wrong_document_type_or_posting_mismatch"],
|
||||
exit_conditions: ["settlement_closed"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Формальное закрытие есть, но путь закрытия неверный."
|
||||
business_meaning: "Формальное закрытие есть, но путь закрытия неверный."
|
||||
}
|
||||
],
|
||||
transitions: [
|
||||
|
|
@ -110,7 +184,7 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
required_evidence: ["bank_statement_recorded"],
|
||||
optional_evidence: ["payment_order"],
|
||||
forbidden_conditions: [],
|
||||
business_meaning: "Платеж должен появиться во выписке."
|
||||
business_meaning: "Платеж должен появиться во выписке."
|
||||
},
|
||||
{
|
||||
from_state: "bank_recorded",
|
||||
|
|
@ -119,7 +193,7 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
required_evidence: ["payment_to_settlement_link"],
|
||||
optional_evidence: ["document_to_posting"],
|
||||
forbidden_conditions: ["wrong_document_type"],
|
||||
business_meaning: "После выписки должен закрываться расчет."
|
||||
business_meaning: "После выписки должен закрываться расчет."
|
||||
}
|
||||
],
|
||||
defects: []
|
||||
|
|
@ -131,43 +205,43 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
states: [
|
||||
{
|
||||
state_code: "invoice_issued",
|
||||
state_label: "Реализация отражена",
|
||||
state_label: "Реализация отражена",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["realization_document_exists"],
|
||||
exit_conditions: ["payment_recorded"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Возникла дебиторская позиция."
|
||||
business_meaning: "Возникла дебиторская позиция."
|
||||
},
|
||||
{
|
||||
state_code: "payment_recorded",
|
||||
state_label: "Оплата отражена",
|
||||
state_label: "Оплата отражена",
|
||||
state_class: "active",
|
||||
entry_conditions: ["payment_document_exists"],
|
||||
exit_conditions: ["receivable_closed"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Оплата есть, ожидается корректное закрытие."
|
||||
business_meaning: "Оплата есть, ожидается корректное закрытие."
|
||||
},
|
||||
{
|
||||
state_code: "receivable_closed",
|
||||
state_label: "Дебиторка закрыта",
|
||||
state_label: "Дебиторка закрыта",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["closing_document_linked"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "Дебиторская позиция закрыта корректно."
|
||||
business_meaning: "Дебиторская позиция закрыта корректно."
|
||||
},
|
||||
{
|
||||
state_code: "stale_receivable",
|
||||
state_label: "Дебиторка зависла",
|
||||
state_label: "Дебиторка зависла",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["unresolved_settlement"],
|
||||
exit_conditions: ["receivable_closed"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Позиция остается незавершенной дольше ожидаемого."
|
||||
business_meaning: "Позиция остается незавершенной дольше ожидаемого."
|
||||
}
|
||||
],
|
||||
transitions: [
|
||||
|
|
@ -178,7 +252,7 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
required_evidence: ["payment_document_exists"],
|
||||
optional_evidence: [],
|
||||
forbidden_conditions: [],
|
||||
business_meaning: "После реализации ожидается оплата/зачет."
|
||||
business_meaning: "После реализации ожидается оплата/зачет."
|
||||
},
|
||||
{
|
||||
from_state: "payment_recorded",
|
||||
|
|
@ -187,7 +261,7 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
required_evidence: ["closing_document_linked"],
|
||||
optional_evidence: ["register_movement_exists"],
|
||||
forbidden_conditions: ["cross_branch_inconsistency"],
|
||||
business_meaning: "Оплата должна завершаться корректным закрытием расчета."
|
||||
business_meaning: "Оплата должна завершаться корректным закрытием расчета."
|
||||
}
|
||||
],
|
||||
defects: []
|
||||
|
|
@ -199,43 +273,43 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
states: [
|
||||
{
|
||||
state_code: "recognized",
|
||||
state_label: "РБП признан",
|
||||
state_label: "РБП признан",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["deferred_expense_created"],
|
||||
exit_conditions: ["writeoff_started"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "РБП поставлен на учет."
|
||||
business_meaning: "РБП поставлен на учет."
|
||||
},
|
||||
{
|
||||
state_code: "partially_written_off",
|
||||
state_label: "Частичное списание",
|
||||
state_label: "Частичное списание",
|
||||
state_class: "active",
|
||||
entry_conditions: ["partial_writeoff_exists"],
|
||||
exit_conditions: ["fully_written_off"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Списание идет по графику."
|
||||
business_meaning: "Списание идет по графику."
|
||||
},
|
||||
{
|
||||
state_code: "fully_written_off",
|
||||
state_label: "РБП полностью списан",
|
||||
state_label: "РБП полностью списан",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["full_writeoff_exists"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "РБП завершил lifecycle."
|
||||
business_meaning: "РБП завершил lifecycle."
|
||||
},
|
||||
{
|
||||
state_code: "overdue_writeoff",
|
||||
state_label: "Просроченное списание",
|
||||
state_label: "Просроченное списание",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["period_boundary", "missing_link"],
|
||||
exit_conditions: ["fully_written_off"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "РБП живет дольше допустимого окна."
|
||||
business_meaning: "РБП живет дольше допустимого окна."
|
||||
}
|
||||
],
|
||||
transitions: [],
|
||||
|
|
@ -248,53 +322,53 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
states: [
|
||||
{
|
||||
state_code: "capitalized",
|
||||
state_label: "Капвложения отражены",
|
||||
state_label: "Капвложения отражены",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["capitalization_document_exists"],
|
||||
exit_conditions: ["accepted_for_accounting"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Объект зафиксирован как вложение."
|
||||
business_meaning: "Объект зафиксирован как вложение."
|
||||
},
|
||||
{
|
||||
state_code: "accepted_for_accounting",
|
||||
state_label: "Принят к учету",
|
||||
state_label: "Принят к учету",
|
||||
state_class: "active",
|
||||
entry_conditions: ["acceptance_document_exists"],
|
||||
exit_conditions: ["depreciation_active"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Объект переведен в основной контур учета."
|
||||
business_meaning: "Объект переведен в основной контур учета."
|
||||
},
|
||||
{
|
||||
state_code: "depreciation_active",
|
||||
state_label: "Амортизация активна",
|
||||
state_label: "Амортизация активна",
|
||||
state_class: "active",
|
||||
entry_conditions: ["depreciation_register_movement"],
|
||||
exit_conditions: ["disposed"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Жизненный цикл ОС идет штатно."
|
||||
business_meaning: "Жизненный цикл ОС идет штатно."
|
||||
},
|
||||
{
|
||||
state_code: "contradictory_asset_state",
|
||||
state_label: "Противоречивый статус ОС",
|
||||
state_label: "Противоречивый статус ОС",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["posting_mismatch_or_wrong_path"],
|
||||
exit_conditions: ["depreciation_active"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Статус ОС формально есть, но смыслово противоречив."
|
||||
business_meaning: "Статус ОС формально есть, но смыслово противоречив."
|
||||
},
|
||||
{
|
||||
state_code: "disposed",
|
||||
state_label: "Выбыл",
|
||||
state_label: "Выбыл",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["disposal_document_exists"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "Жизненный цикл ОС завершен."
|
||||
business_meaning: "Жизненный цикл ОС завершен."
|
||||
}
|
||||
],
|
||||
transitions: [],
|
||||
|
|
@ -307,43 +381,43 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
states: [
|
||||
{
|
||||
state_code: "vat_registered",
|
||||
state_label: "НДС отражен документно",
|
||||
state_label: "НДС отражен документно",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["invoice_registered"],
|
||||
exit_conditions: ["vat_reflected"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Сформирован первичный документный слой НДС."
|
||||
business_meaning: "Сформирован первичный документный слой НДС."
|
||||
},
|
||||
{
|
||||
state_code: "vat_reflected",
|
||||
state_label: "НДС отражен в учете",
|
||||
state_label: "НДС отражен в учете",
|
||||
state_class: "active",
|
||||
entry_conditions: ["vat_register_movement"],
|
||||
exit_conditions: ["vat_deducted"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "НДС проходит штатную стадию отражения."
|
||||
business_meaning: "НДС проходит штатную стадию отражения."
|
||||
},
|
||||
{
|
||||
state_code: "vat_deducted",
|
||||
state_label: "НДС принят к вычету",
|
||||
state_label: "НДС принят к вычету",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["deduction_confirmed"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "НДС-цепочка завершена корректно."
|
||||
business_meaning: "НДС-цепочка завершена корректно."
|
||||
},
|
||||
{
|
||||
state_code: "vat_conflict",
|
||||
state_label: "Конфликт НДС-цепочки",
|
||||
state_label: "Конфликт НДС-цепочки",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["cross_branch_inconsistency"],
|
||||
exit_conditions: ["vat_reflected"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Бухгалтерская и налоговая ветки расходятся."
|
||||
business_meaning: "Бухгалтерская и налоговая ветки расходятся."
|
||||
}
|
||||
],
|
||||
transitions: [],
|
||||
|
|
@ -356,53 +430,53 @@ const LIFECYCLE_DOMAIN_MODELS = {
|
|||
states: [
|
||||
{
|
||||
state_code: "preclose_checks",
|
||||
state_label: "Предзакрытие",
|
||||
state_label: "Предзакрытие",
|
||||
state_class: "active",
|
||||
entry_conditions: ["period_scope_detected"],
|
||||
exit_conditions: ["close_ready"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Идет проверка готовности периода."
|
||||
business_meaning: "Рдет проверка готовности периода."
|
||||
},
|
||||
{
|
||||
state_code: "close_ready",
|
||||
state_label: "Готов к закрытию",
|
||||
state_label: "Готов к закрытию",
|
||||
state_class: "active",
|
||||
entry_conditions: ["no_blockers_detected"],
|
||||
exit_conditions: ["close_completed"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Период может быть закрыт."
|
||||
business_meaning: "Период может быть закрыт."
|
||||
},
|
||||
{
|
||||
state_code: "close_completed",
|
||||
state_label: "Закрытие завершено",
|
||||
state_label: "Закрытие завершено",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["close_operation_done"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "Период закрыт."
|
||||
business_meaning: "Период закрыт."
|
||||
},
|
||||
{
|
||||
state_code: "close_blocked",
|
||||
state_label: "Закрытие заблокировано",
|
||||
state_label: "Закрытие заблокировано",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["period_close_risk_or_stale_state"],
|
||||
exit_conditions: ["close_ready"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Есть lifecycle-дефекты, влияющие на закрытие."
|
||||
business_meaning: "Есть lifecycle-дефекты, влияющие на закрытие."
|
||||
},
|
||||
{
|
||||
state_code: "close_contradicted",
|
||||
state_label: "Закрыт формально, но с противоречием",
|
||||
state_label: "Закрыт формально, но с противоречием",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["misclosed_or_cross_branch_conflict"],
|
||||
exit_conditions: ["close_completed"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Формальное закрытие не согласовано с фактическими ветками."
|
||||
business_meaning: "Формальное закрытие не согласовано с фактическими ветками."
|
||||
}
|
||||
],
|
||||
transitions: [],
|
||||
|
|
@ -414,7 +488,7 @@ const SHARED_DEFECTS = [
|
|||
defect_code: "missing_expected_transition",
|
||||
defect_class: "path",
|
||||
severity_hint: "medium",
|
||||
business_meaning: "Ожидаемый переход не произошел.",
|
||||
business_meaning: "Ожидаемый переход не произошел.",
|
||||
evidence_requirements: ["expected_state", "missing_transition_signal"],
|
||||
period_impact_potential: "indirect"
|
||||
},
|
||||
|
|
@ -422,7 +496,7 @@ const SHARED_DEFECTS = [
|
|||
defect_code: "invalid_transition",
|
||||
defect_class: "path",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Переход произошел по некорректному пути.",
|
||||
business_meaning: "Переход произошел по некорректному пути.",
|
||||
evidence_requirements: ["invalid_transition_signal"],
|
||||
period_impact_potential: "indirect"
|
||||
},
|
||||
|
|
@ -430,7 +504,7 @@ const SHARED_DEFECTS = [
|
|||
defect_code: "stale_active_state",
|
||||
defect_class: "timing",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Объект завис в активном состоянии.",
|
||||
business_meaning: "Объект завис в активном состоянии.",
|
||||
evidence_requirements: ["stale_marker", "missing_transition_signal"],
|
||||
period_impact_potential: "direct"
|
||||
},
|
||||
|
|
@ -438,7 +512,7 @@ const SHARED_DEFECTS = [
|
|||
defect_code: "contradictory_state",
|
||||
defect_class: "consistency",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Статусы объекта противоречат друг другу.",
|
||||
business_meaning: "Статусы объекта противоречат друг другу.",
|
||||
evidence_requirements: ["contradiction_signal"],
|
||||
period_impact_potential: "direct"
|
||||
},
|
||||
|
|
@ -446,7 +520,7 @@ const SHARED_DEFECTS = [
|
|||
defect_code: "premature_terminal_state",
|
||||
defect_class: "closure",
|
||||
severity_hint: "medium",
|
||||
business_meaning: "Терминальное состояние наступило преждевременно.",
|
||||
business_meaning: "Терминальное состояние наступило преждевременно.",
|
||||
evidence_requirements: ["terminal_state", "missing_required_previous_state"],
|
||||
period_impact_potential: "indirect"
|
||||
},
|
||||
|
|
@ -454,7 +528,7 @@ const SHARED_DEFECTS = [
|
|||
defect_code: "misclosed_state",
|
||||
defect_class: "closure",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Контур формально закрыт, но закрыт неверно.",
|
||||
business_meaning: "Контур формально закрыт, но закрыт неверно.",
|
||||
evidence_requirements: ["wrong_closure_path"],
|
||||
period_impact_potential: "direct"
|
||||
},
|
||||
|
|
@ -462,7 +536,7 @@ const SHARED_DEFECTS = [
|
|||
defect_code: "orphan_intermediate_state",
|
||||
defect_class: "path",
|
||||
severity_hint: "medium",
|
||||
business_meaning: "Промежуточная стадия осталась без корректного продолжения.",
|
||||
business_meaning: "Промежуточная стадия осталась без корректного продолжения.",
|
||||
evidence_requirements: ["intermediate_state_without_next"],
|
||||
period_impact_potential: "indirect"
|
||||
},
|
||||
|
|
@ -470,7 +544,7 @@ const SHARED_DEFECTS = [
|
|||
defect_code: "cross_branch_state_conflict",
|
||||
defect_class: "consistency",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Состояния соседних веток учета противоречат друг другу.",
|
||||
business_meaning: "Состояния соседних веток учета противоречат друг другу.",
|
||||
evidence_requirements: ["cross_branch_conflict_signal"],
|
||||
period_impact_potential: "direct"
|
||||
}
|
||||
|
|
@ -489,6 +563,19 @@ class LifecycleRegistryImpl {
|
|||
getDomain(domain) {
|
||||
return this.models[domain];
|
||||
}
|
||||
hasState(domain, stateCode) {
|
||||
const model = this.getDomain(domain);
|
||||
return Boolean(resolveStateCode(model, stateCode));
|
||||
}
|
||||
resolveDefaultExpectedState(domain) {
|
||||
return defaultExpectedState(this.getDomain(domain));
|
||||
}
|
||||
resolveInitialState(domain) {
|
||||
return defaultInitialState(this.getDomain(domain));
|
||||
}
|
||||
findExpectedPath(domain, fromState, toState) {
|
||||
return shortestExpectedPath(this.getDomain(domain), fromState, toState);
|
||||
}
|
||||
}
|
||||
exports.LifecycleRegistry = new LifecycleRegistryImpl(LIFECYCLE_DOMAIN_MODELS);
|
||||
function inferLifecycleDomain(input) {
|
||||
|
|
@ -508,28 +595,81 @@ function inferLifecycleDomain(input) {
|
|||
]
|
||||
.join(" ")
|
||||
.toLowerCase();
|
||||
if (includesAny(unitTokens, [/\bnds\b/, /\bvat\b/, /\btax\b/, /cross[_\s-]?branch/, /\b19\b/, /\b68\b/])) {
|
||||
return "vat_flow";
|
||||
}
|
||||
if (includesAny(unitTokens, [/\bperiod\b/, /\bclose\b/, /закрыт/, /reporting/]) || input.unit.problem_unit_type === "period_risk_cluster") {
|
||||
return "period_close";
|
||||
}
|
||||
if (includesAny(unitTokens, [/deferred/, /writeoff/, /рбп/, /\b97\b/])) {
|
||||
const hasVatMarkers = includesAny(unitTokens, [
|
||||
/domain_hint:vat_flow/,
|
||||
/\binvoice_to_vat\b/,
|
||||
/\bvat_chain_conflict\b/,
|
||||
/(^|[^a-z0-9])nds([^a-z0-9]|$)/,
|
||||
/(^|[^a-z0-9])vat([^a-z0-9]|$)/,
|
||||
/(^|[^a-z0-9])tax(?:es)?([^a-z0-9]|$)/,
|
||||
/\baccount[_:\s-]?(19|68)\b/
|
||||
]);
|
||||
const hasDeferredMarkers = includesAny(unitTokens, [
|
||||
/domain_hint:deferred_expense/,
|
||||
/\bdeferred(?:_expense)?\b/,
|
||||
/\bdeferred_expense_to_writeoff\b/,
|
||||
/\bwriteoff\b/,
|
||||
/\bpartially_written_off\b/,
|
||||
/\bfully_written_off\b/,
|
||||
/\baccount[_:\s-]?97\b/
|
||||
]);
|
||||
const hasFixedAssetMarkers = includesAny(unitTokens, [
|
||||
/domain_hint:fixed_asset/,
|
||||
/\bfixed[_\s-]?asset(?:s)?\b/,
|
||||
/\basset_card_to_depreciation\b/,
|
||||
/\bdepreciation(?:_active)?\b/,
|
||||
/\baccepted_for_accounting\b/,
|
||||
/\bcapitalized\b/,
|
||||
/\baccount[_:\s-]?(01|02|08)\b/
|
||||
]);
|
||||
const hasPeriodCloseMarkers = includesAny(unitTokens, [
|
||||
/domain_hint:period_close/,
|
||||
/\bperiod[_\s-]?close\b/,
|
||||
/\bperiod_close_risk\b/,
|
||||
/\bclose[_\s-]?risk\b/,
|
||||
/\bclosure[_\s-]?risk\b/,
|
||||
/\bpreclose\b/,
|
||||
/\bmonth[_\s-]?close\b/,
|
||||
/\bperiod_risk\b/
|
||||
]);
|
||||
if (hasDeferredMarkers) {
|
||||
return "deferred_expense";
|
||||
}
|
||||
if (includesAny(unitTokens, [/fixed[_\s-]?asset/, /амортиз/, /ос\b/, /\b01\b/, /\b02\b/, /\b08\b/])) {
|
||||
if (hasFixedAssetMarkers) {
|
||||
return "fixed_asset";
|
||||
}
|
||||
if (includesAny(unitTokens, [/buyer/, /customer/, /дебитор/, /\b62\b/])) {
|
||||
if (hasVatMarkers) {
|
||||
return "vat_flow";
|
||||
}
|
||||
if (hasPeriodCloseMarkers ||
|
||||
input.unit.problem_unit_type === "period_risk_cluster" ||
|
||||
input.unit.period_impact?.impact_class === "close_risk") {
|
||||
return "period_close";
|
||||
}
|
||||
if (includesAny(unitTokens, [/buyer/, /customer/, /\b62\b/])) {
|
||||
return "customer_settlement";
|
||||
}
|
||||
if (includesAny(unitTokens, [
|
||||
/domain_hint:bank_settlement/,
|
||||
/\bpayment_to_settlement\b/,
|
||||
/\bstatement_to_document\b/,
|
||||
/\bbank_recorded\b/,
|
||||
/\binitiated_payment\b/,
|
||||
/\bsettlement(?:_closed)?\b/
|
||||
]) ||
|
||||
input.unit.problem_unit_type === "unresolved_settlement_cluster" ||
|
||||
input.unit.problem_unit_type === "broken_chain_segment") {
|
||||
return "bank_settlement";
|
||||
}
|
||||
if (input.unit.problem_unit_type === "cross_branch_inconsistency_cluster") {
|
||||
return "vat_flow";
|
||||
}
|
||||
if (input.unit.problem_unit_type === "lifecycle_anomaly_node") {
|
||||
return "deferred_expense";
|
||||
}
|
||||
return "bank_settlement";
|
||||
}
|
||||
function inferCurrentState(domain, input) {
|
||||
const explicitActual = input.unit.actual_state?.trim();
|
||||
if (explicitActual) {
|
||||
return explicitActual;
|
||||
}
|
||||
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).map((item) => item.toLowerCase());
|
||||
const relations = input.candidates.flatMap((item) => item.relation_pattern_hits).map((item) => item.toLowerCase());
|
||||
const hasStale = hasToken(anomalies, /(no_continuation|stale|tail|missing_link|broken_lifecycle|partially_linked)/);
|
||||
|
|
@ -562,7 +702,7 @@ function inferCurrentState(domain, input) {
|
|||
return "contradictory_asset_state";
|
||||
if (hasToken(relations, /depreciation|amort/))
|
||||
return "depreciation_active";
|
||||
if (hasToken(relations, /accept|учет/))
|
||||
if (hasToken(relations, /accept|account/))
|
||||
return "accepted_for_accounting";
|
||||
return "capitalized";
|
||||
}
|
||||
|
|
@ -579,25 +719,42 @@ function inferCurrentState(domain, input) {
|
|||
return "close_blocked";
|
||||
return "preclose_checks";
|
||||
}
|
||||
function inferExpectedState(domain, input) {
|
||||
function inferExpectedState(domain, input, model) {
|
||||
const explicitExpected = input.unit.expected_state?.trim();
|
||||
if (explicitExpected) {
|
||||
return explicitExpected;
|
||||
}
|
||||
return defaultExpectedState(domain);
|
||||
return defaultExpectedState(model);
|
||||
}
|
||||
function inferMissingTransition(input) {
|
||||
function inferMissingTransition(input, model, currentState, expectedState) {
|
||||
if (typeof input.unit.failed_expected_edge === "string" && input.unit.failed_expected_edge.trim().length > 0) {
|
||||
return input.unit.failed_expected_edge.trim();
|
||||
}
|
||||
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).join(" ").toLowerCase();
|
||||
if (/(missing_link|no_continuation|broken_lifecycle|tail|unresolved)/.test(anomalies)) {
|
||||
return "expected_transition_not_observed";
|
||||
if (!/(missing_link|no_continuation|broken_lifecycle|tail|unresolved)/.test(anomalies)) {
|
||||
return null;
|
||||
}
|
||||
return null;
|
||||
if (currentState !== expectedState) {
|
||||
const path = shortestExpectedPath(model, currentState, expectedState);
|
||||
if (path && path.length >= 2) {
|
||||
return transitionEdgeLabel(path[0], path[1]);
|
||||
}
|
||||
}
|
||||
const directExpected = model.transitions.find((transition) => transition.transition_type === "expected" && transition.from_state === currentState);
|
||||
if (directExpected) {
|
||||
return transitionEdgeLabel(directExpected.from_state, directExpected.to_state);
|
||||
}
|
||||
return "expected_transition_not_observed";
|
||||
}
|
||||
function inferInvalidTransition(input) {
|
||||
function inferInvalidTransition(input, model) {
|
||||
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).join(" ").toLowerCase();
|
||||
for (const transition of model.transitions) {
|
||||
for (const forbiddenCondition of transition.forbidden_conditions) {
|
||||
if (anomalies.includes(forbiddenCondition.toLowerCase())) {
|
||||
return `${transitionEdgeLabel(transition.from_state, transition.to_state)}:forbidden:${forbiddenCondition}`;
|
||||
}
|
||||
}
|
||||
}
|
||||
if (/(cross_branch|cross_domain_inconsistency)/.test(anomalies)) {
|
||||
return "cross_branch_conflict_transition";
|
||||
}
|
||||
|
|
@ -634,6 +791,13 @@ function classifyLifecycleDefect(input) {
|
|||
}
|
||||
return null;
|
||||
}
|
||||
function registryBackedDefect(domain, defect) {
|
||||
if (!defect) {
|
||||
return null;
|
||||
}
|
||||
const model = exports.LifecycleRegistry.getDomain(domain);
|
||||
return model.defects.some((definition) => definition.defect_code === defect) ? defect : null;
|
||||
}
|
||||
function resolutionConfidence(unitConfidence, input) {
|
||||
let score = unitConfidence.score;
|
||||
if (input.hasExplicitStates)
|
||||
|
|
@ -661,31 +825,40 @@ function staleDurationHint(domain, defect, input) {
|
|||
return "unknown_snapshot_window";
|
||||
}
|
||||
function lifecycleInterpretation(input) {
|
||||
const base = `Текущая стадия: ${input.currentState}; ожидаемая стадия: ${input.expectedState}.`;
|
||||
const base = `Текущая стадия: ${input.currentState}; ожидаемая стадия: ${input.expectedState}.`;
|
||||
if (input.defect === "stale_active_state") {
|
||||
return `${base} Объект завис во времени и не дошел до ожидаемого перехода.`;
|
||||
return `${base} Объект завис во времени и не дошел до ожидаемого перехода.`;
|
||||
}
|
||||
if (input.defect === "misclosed_state") {
|
||||
return `${base} Контур закрыт формально, но путь закрытия противоречит бухгалтерской логике.`;
|
||||
return `${base} Контур закрыт формально, но путь закрытия противоречит бухгалтерской логике.`;
|
||||
}
|
||||
if (input.defect === "cross_branch_state_conflict") {
|
||||
return `${base} Между ветками домена ${input.domain} обнаружено противоречие состояний.`;
|
||||
return `${base} Между ветками домена ${input.domain} обнаружено противоречие состояний.`;
|
||||
}
|
||||
if (input.defect === "missing_expected_transition") {
|
||||
return `${base} Не зафиксирован ожидаемый переход (${input.missingTransition ?? "unknown_transition"}).`;
|
||||
return `${base} Не зафиксирован ожидаемый переход (${input.missingTransition ?? "unknown_transition"}).`;
|
||||
}
|
||||
if (input.defect === "invalid_transition") {
|
||||
return `${base} Зафиксирован некорректный переход (${input.invalidTransition ?? "invalid_transition"}).`;
|
||||
return `${base} Зафиксирован некорректный переход (${input.invalidTransition ?? "invalid_transition"}).`;
|
||||
}
|
||||
return `${base} Lifecycle-разрешение не выявило критичный дефект, но состояние требует наблюдения.`;
|
||||
return `${base} Lifecycle-разрешение не выявило критичный дефект, но состояние требует наблюдения.`;
|
||||
}
|
||||
function resolveLifecycle(input) {
|
||||
const lifecycle_domain = inferLifecycleDomain(input);
|
||||
const currentState = inferCurrentState(lifecycle_domain, input);
|
||||
const expectedState = inferExpectedState(lifecycle_domain, input);
|
||||
const missingTransition = inferMissingTransition(input);
|
||||
const invalidTransition = inferInvalidTransition(input);
|
||||
const defect = classifyLifecycleDefect({
|
||||
const model = exports.LifecycleRegistry.getDomain(lifecycle_domain);
|
||||
const inferredCurrentState = inferCurrentState(lifecycle_domain, input);
|
||||
const inferredExpectedState = inferExpectedState(lifecycle_domain, input, model);
|
||||
const explicitActualState = input.unit.actual_state?.trim() ?? null;
|
||||
const explicitExpectedState = input.unit.expected_state?.trim() ?? null;
|
||||
const explicitCurrentState = resolveStateCode(model, explicitActualState);
|
||||
const explicitExpectedResolved = resolveStateCode(model, explicitExpectedState);
|
||||
const inferredCurrentResolved = resolveStateCode(model, inferredCurrentState);
|
||||
const inferredExpectedResolved = resolveStateCode(model, inferredExpectedState);
|
||||
const currentState = explicitCurrentState ?? inferredCurrentResolved ?? defaultInitialState(model);
|
||||
const expectedState = explicitExpectedResolved ?? inferredExpectedResolved ?? defaultExpectedState(model);
|
||||
const missingTransition = inferMissingTransition(input, model, currentState, expectedState);
|
||||
const invalidTransition = inferInvalidTransition(input, model);
|
||||
const detectedDefect = classifyLifecycleDefect({
|
||||
domain: lifecycle_domain,
|
||||
currentState,
|
||||
expectedState,
|
||||
|
|
@ -693,15 +866,19 @@ function resolveLifecycle(input) {
|
|||
invalidTransition,
|
||||
periodCloseSensitive: input.unit.period_impact?.impact_class === "close_risk"
|
||||
});
|
||||
const defect = registryBackedDefect(lifecycle_domain, detectedDefect);
|
||||
const evidenceIds = uniqueStrings(input.unit.evidence_pack, 8);
|
||||
const previousStates = resolvePreviousStates(model, currentState);
|
||||
const limitations = uniqueStrings([
|
||||
...input.unit.snapshot_limitations,
|
||||
...(input.candidates.some((item) => item.confidence_hint === "low") ? ["low_confidence_candidates_present"] : []),
|
||||
...(input.unit.actual_state ? [] : ["actual_state_inferred"]),
|
||||
...(input.unit.expected_state ? [] : ["expected_state_inferred"])
|
||||
...(explicitActualState && !explicitCurrentState ? ["actual_state_not_in_registry_normalized"] : []),
|
||||
...(explicitExpectedState && !explicitExpectedResolved ? ["expected_state_not_in_registry_normalized"] : []),
|
||||
...(explicitCurrentState ? [] : ["actual_state_inferred"]),
|
||||
...(explicitExpectedResolved ? [] : ["expected_state_inferred"])
|
||||
], 8);
|
||||
const confidence = resolutionConfidence(input.unit.confidence, {
|
||||
hasExplicitStates: Boolean(input.unit.actual_state || input.unit.expected_state),
|
||||
hasExplicitStates: Boolean(explicitCurrentState || explicitExpectedResolved),
|
||||
hasDefectSignal: Boolean(defect || missingTransition || invalidTransition),
|
||||
candidateCount: input.candidates.length,
|
||||
hasSnapshotLimitations: limitations.length > 0
|
||||
|
|
@ -711,7 +888,7 @@ function resolveLifecycle(input) {
|
|||
lifecycle_domain,
|
||||
resolved_current_state: currentState,
|
||||
resolved_expected_state: expectedState,
|
||||
resolved_previous_states: [],
|
||||
resolved_previous_states: previousStates,
|
||||
missing_transitions: missingTransition ? [missingTransition] : [],
|
||||
invalid_transitions: invalidTransition ? [invalidTransition] : [],
|
||||
detected_defects: defect ? [defect] : [],
|
||||
|
|
|
|||
|
|
@ -76,7 +76,7 @@ function intersectsAnySpan(start, end, spans) {
|
|||
function extractAccounts(text) {
|
||||
const lower = String(text ?? "").toLowerCase();
|
||||
const explicitAccounts = new Set();
|
||||
const contextualPattern = /(?:\bсчет(?:а|у|ом|ов)?\b|\bсч\.?\b|\baccount(?:s)?\b|\bschet(?:a|u|om|ov)?\b)\s*(?:№|#|:)?\s*(\d{2}(?:\.\d{2})?)/giu;
|
||||
const contextualPattern = /(?:\bсч(?:е|ё)т(?:а|у|ом|ов)?\b|\bсч\.?\b|\baccount(?:s)?\b|\bschet(?:a|u|om|ov)?\b)\s*(?:№|#|:)?\s*(\d{2}(?:\.\d{2})?)/giu;
|
||||
let contextual = null;
|
||||
while ((contextual = contextualPattern.exec(lower)) !== null) {
|
||||
if (contextual[1]) {
|
||||
|
|
@ -284,8 +284,9 @@ function buildFragmentV2(rawText, index) {
|
|||
if (noiseOnly) {
|
||||
return null;
|
||||
}
|
||||
const inScopeTokens = /(проводк|документ|реализац|поступлен|взаиморасчет|сальдо|остатк|счет|ндс|амортиз|расходы будущих периодов|рбп|ос|контрагент|оплат|банк|выписк|склад|товар|материал)/i.test(lower);
|
||||
const translitInScopeTokens = /\b(?:schet|scheta|schetu|schetom|postavsh|kontragent|dokument|doc|oplata|oplati|platezh|vypisk|provodk|realiz|postuplen|nds|os|saldo|hvost|tail|anomali|risk|zakryt)\b/i.test(lower);
|
||||
const inScopeTokens = /(проводк|документ|реализац|поступлен|взаиморасчет|сальдо|остатк|сч(?:е|ё)т|ндс|амортиз|расходы будущих периодов|рбп|ос|контрагент|оплат|банк|выписк|склад|товар|материал|списани|жизненн|цикл|переход|lifecycle|writeoff|deferred)/i.test(lower);
|
||||
const translitInScopeTokens = /\b(?:schet|scheta|schetu|schetom|postavsh|kontragent|dokument|doc|oplata|oplati|platezh|vypisk|provodk|realiz|postuplen|nds|os|saldo|hvost|tail|anomali|risk|zakryt|lifecycle|state|transition|writeoff|deferred|periodclose)\b/i.test(lower);
|
||||
const lifecycleInScopeTokens = /(lifecycle|жизненн(?:ого|ый)?\s+цикл|стади|переход|списани|writeoff|deferred|period\s*close)/i.test(lower);
|
||||
const genericAccountingTokens = /(фсбу|налогов(ый|ого)|нк рф|закон|форма отчетности|как правильно в бухгалтерии)/i.test(lower);
|
||||
const offTopicTokens = /(погода|анекдот|музык|фильм|игр[аы]|рецепт|курс валют в мире)/i.test(lower);
|
||||
let domainRelevance = "unclear";
|
||||
|
|
@ -298,13 +299,13 @@ function buildFragmentV2(rawText, index) {
|
|||
domainRelevance = "out_of_scope";
|
||||
businessScope = "generic_accounting";
|
||||
}
|
||||
else if (inScopeTokens || translitInScopeTokens) {
|
||||
else if (inScopeTokens || translitInScopeTokens || lifecycleInScopeTokens) {
|
||||
domainRelevance = "in_scope";
|
||||
businessScope = "company_specific_accounting";
|
||||
}
|
||||
const entityTokenCount = (lower.match(/(документ|оплат|проводк|контрагент|договор|реализац|поступлен|выписк|закрыт|взаиморасчет|склад|товар|материал)/g) ?? [])
|
||||
const entityTokenCount = (lower.match(/(документ|оплат|проводк|контрагент|договор|реализац|поступлен|выписк|закрыт|взаиморасчет|склад|товар|материал|поставщ|покупат|списани|жизненн|цикл)/g) ?? [])
|
||||
.length;
|
||||
const translitEntityTokenCount = (lower.match(/\b(?:dokument|oplata|platezh|provodk|kontragent|realiz|postuplen|vypisk|zakryt|schet|sklad|tovar|material)\b/g) ?? []).length;
|
||||
const translitEntityTokenCount = (lower.match(/\b(?:dokument|oplata|platezh|provodk|kontragent|postavsh|pokupat|realiz|postuplen|vypisk|zakryt|schet|sklad|tovar|material)\b/g) ?? []).length;
|
||||
const entityTokenCountTotal = entityTokenCount + translitEntityTokenCount;
|
||||
const flags = {
|
||||
has_multi_entity_scope: entityTokenCountTotal >= 2,
|
||||
|
|
|
|||
|
|
@ -202,12 +202,13 @@ function simulateDeterministicRouting(normalized) {
|
|||
const decisions = normalized.fragments.map((fragment) => decideRouteForFragment(fragment));
|
||||
const inScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope").length;
|
||||
const outOfScopeCount = decisions.filter((item) => item.domain_relevance === "out_of_scope").length;
|
||||
const unclearCount = decisions.filter((item) => item.domain_relevance === "unclear").length;
|
||||
const routedInScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope" && item.route !== "no_route").length;
|
||||
const clarificationInScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope" && item.execution_readiness === "needs_clarification").length;
|
||||
const noRouteInScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope" && item.route === "no_route").length;
|
||||
let fallbackType = "none";
|
||||
if (!normalized.message_in_scope || inScopeCount === 0) {
|
||||
fallbackType = "out_of_scope";
|
||||
fallbackType = outOfScopeCount > 0 && unclearCount === 0 ? "out_of_scope" : "clarification";
|
||||
}
|
||||
else if (routedInScopeCount === 0 && clarificationInScopeCount > 0) {
|
||||
fallbackType = "clarification";
|
||||
|
|
|
|||
File diff suppressed because one or more lines are too long
|
|
@ -0,0 +1,270 @@
|
|||
#!/usr/bin/env node
|
||||
|
||||
const fs = require("node:fs");
|
||||
const path = require("node:path");
|
||||
const request = require("supertest");
|
||||
|
||||
const STAGE3_SUITE_RELATIVE = path.join("eval_cases", "assistant_stage3_lifecycle_probe_v0_1.json");
|
||||
const FLAG_KEYS = [
|
||||
"FEATURE_ASSISTANT_PROBLEM_UNITS_V1",
|
||||
"FEATURE_ASSISTANT_ANSWER_POLICY_V11",
|
||||
"FEATURE_ASSISTANT_BROAD_GUARD_V1",
|
||||
"FEATURE_ASSISTANT_MIN_EVIDENCE_GATE_V1",
|
||||
"FEATURE_ASSISTANT_ANTI_GENERIC_RANKING_GUARD_V1",
|
||||
"FEATURE_ASSISTANT_PROBLEM_CENTRIC_ANSWER_V1",
|
||||
"FEATURE_ASSISTANT_PROBLEM_UNIT_CONTINUITY_V1",
|
||||
"FEATURE_ASSISTANT_LIFECYCLE_RUNTIME_V1",
|
||||
"FEATURE_ASSISTANT_LIFECYCLE_ANSWER_V1"
|
||||
];
|
||||
|
||||
function parseArgs(argv) {
|
||||
const args = {
|
||||
runDir: "",
|
||||
suitePath: "",
|
||||
outputSubdir: path.join("prompt_dialogs", "stage3_lifecycle_probe")
|
||||
};
|
||||
for (let i = 0; i < argv.length; i += 1) {
|
||||
const token = argv[i];
|
||||
if (token === "--run-dir") {
|
||||
args.runDir = String(argv[i + 1] ?? "");
|
||||
i += 1;
|
||||
continue;
|
||||
}
|
||||
if (token === "--suite-path") {
|
||||
args.suitePath = String(argv[i + 1] ?? "");
|
||||
i += 1;
|
||||
continue;
|
||||
}
|
||||
if (token === "--output-subdir") {
|
||||
args.outputSubdir = String(argv[i + 1] ?? "");
|
||||
i += 1;
|
||||
}
|
||||
}
|
||||
return args;
|
||||
}
|
||||
|
||||
function ensureDir(dirPath) {
|
||||
fs.mkdirSync(dirPath, { recursive: true });
|
||||
}
|
||||
|
||||
function writeUtf8Bom(filePath, content) {
|
||||
fs.writeFileSync(filePath, `\uFEFF${content}`, "utf8");
|
||||
}
|
||||
|
||||
function toSafeFileToken(value) {
|
||||
return String(value)
|
||||
.trim()
|
||||
.replace(/\s+/g, "_")
|
||||
.replace(/[^a-zA-Z0-9_-]/g, "_")
|
||||
.replace(/_+/g, "_");
|
||||
}
|
||||
|
||||
function readJson(filePath) {
|
||||
const raw = fs.readFileSync(filePath, "utf8").replace(/^\uFEFF/, "");
|
||||
return JSON.parse(raw);
|
||||
}
|
||||
|
||||
function findLatestRunDir(runsRoot) {
|
||||
if (!fs.existsSync(runsRoot)) {
|
||||
throw new Error(`Runs folder not found: ${runsRoot}`);
|
||||
}
|
||||
const dirs = fs
|
||||
.readdirSync(runsRoot, { withFileTypes: true })
|
||||
.filter((entry) => entry.isDirectory())
|
||||
.map((entry) => path.join(runsRoot, entry.name))
|
||||
.sort((a, b) => fs.statSync(b).mtimeMs - fs.statSync(a).mtimeMs);
|
||||
if (dirs.length === 0) {
|
||||
throw new Error(`No run directories found under: ${runsRoot}`);
|
||||
}
|
||||
return dirs[0];
|
||||
}
|
||||
|
||||
function resolveRunDir(args, runsRoot) {
|
||||
if (args.runDir) {
|
||||
return path.resolve(args.runDir);
|
||||
}
|
||||
return findLatestRunDir(runsRoot);
|
||||
}
|
||||
|
||||
function setLifecycleFlags() {
|
||||
const original = {};
|
||||
for (const key of FLAG_KEYS) {
|
||||
original[key] = process.env[key];
|
||||
}
|
||||
process.env.FEATURE_ASSISTANT_PROBLEM_UNITS_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_ANSWER_POLICY_V11 = "1";
|
||||
process.env.FEATURE_ASSISTANT_BROAD_GUARD_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_MIN_EVIDENCE_GATE_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_ANTI_GENERIC_RANKING_GUARD_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_PROBLEM_CENTRIC_ANSWER_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_PROBLEM_UNIT_CONTINUITY_V1 = "0";
|
||||
process.env.FEATURE_ASSISTANT_LIFECYCLE_RUNTIME_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_LIFECYCLE_ANSWER_V1 = "1";
|
||||
return original;
|
||||
}
|
||||
|
||||
function restoreFlags(original) {
|
||||
for (const key of FLAG_KEYS) {
|
||||
const value = original[key];
|
||||
if (value === undefined) {
|
||||
delete process.env[key];
|
||||
} else {
|
||||
process.env[key] = value;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
function summarizeDebug(debug) {
|
||||
const routeSummary = Array.isArray(debug?.route_summary) ? debug.route_summary : [];
|
||||
const retrievalResults = Array.isArray(debug?.retrieval_results) ? debug.retrieval_results : [];
|
||||
const routed = retrievalResults.filter((item) => String(item?.route ?? "") !== "no_route");
|
||||
const problemUnits = routed.reduce((acc, item) => {
|
||||
const list = Array.isArray(item?.problem_units) ? item.problem_units : [];
|
||||
return acc + list.length;
|
||||
}, 0);
|
||||
return {
|
||||
route_summary: routeSummary,
|
||||
routed_retrieval_count: routed.length,
|
||||
problem_units_count: problemUnits,
|
||||
problem_answer_mode: typeof debug?.problem_answer_mode === "string" ? debug.problem_answer_mode : ""
|
||||
};
|
||||
}
|
||||
|
||||
function buildMarkdown(dialog) {
|
||||
const lines = [];
|
||||
lines.push(`# ${dialog.case_id}`);
|
||||
lines.push("");
|
||||
lines.push(`- session_id: ${dialog.session_id || "n/a"}`);
|
||||
lines.push(`- reply_type: ${dialog.reply_type || "n/a"}`);
|
||||
lines.push(`- trace_id: ${dialog.trace_id || "n/a"}`);
|
||||
lines.push(`- status: ${dialog.http_status}`);
|
||||
lines.push("");
|
||||
lines.push("## User");
|
||||
lines.push(dialog.user_message || "");
|
||||
lines.push("");
|
||||
lines.push("## Assistant");
|
||||
lines.push(dialog.assistant_reply || "");
|
||||
lines.push("");
|
||||
lines.push("## Debug Summary");
|
||||
lines.push("```json");
|
||||
lines.push(JSON.stringify(dialog.debug_summary, null, 2));
|
||||
lines.push("```");
|
||||
lines.push("");
|
||||
return lines.join("\n");
|
||||
}
|
||||
|
||||
async function main() {
|
||||
const args = parseArgs(process.argv.slice(2));
|
||||
const backendRoot = path.resolve(__dirname, "..");
|
||||
const repoRoot = path.resolve(backendRoot, "..");
|
||||
const runsRoot = path.join(repoRoot, "docs", "runs");
|
||||
const runDir = resolveRunDir(args, runsRoot);
|
||||
const suitePath = args.suitePath ? path.resolve(args.suitePath) : path.join(repoRoot, STAGE3_SUITE_RELATIVE);
|
||||
const suite = readJson(suitePath);
|
||||
const dialogsDir = path.join(runDir, args.outputSubdir);
|
||||
ensureDir(dialogsDir);
|
||||
ensureDir(path.join(runDir, "prompt_dialogs"));
|
||||
|
||||
const originalFlags = setLifecycleFlags();
|
||||
let app;
|
||||
try {
|
||||
const { createApp } = require(path.join(backendRoot, "dist", "server.js"));
|
||||
app = createApp();
|
||||
} finally {
|
||||
restoreFlags(originalFlags);
|
||||
}
|
||||
|
||||
const indexRows = [];
|
||||
const generatedAt = new Date().toISOString();
|
||||
|
||||
for (let i = 0; i < suite.cases.length; i += 1) {
|
||||
const probeCase = suite.cases[i];
|
||||
const caseId = String(probeCase.case_id || `case_${i + 1}`);
|
||||
const userMessage = String(probeCase?.turns?.[0]?.user_message || "");
|
||||
const response = await request(app).post("/api/assistant/message").send({
|
||||
useMock: true,
|
||||
promptVersion: "normalizer_v2_0_2",
|
||||
user_message: userMessage
|
||||
});
|
||||
const body = response.body || {};
|
||||
const sessionId = String(body.session_id || "");
|
||||
|
||||
let session = null;
|
||||
if (sessionId) {
|
||||
const sessionResponse = await request(app).get(`/api/assistant/session/${encodeURIComponent(sessionId)}`);
|
||||
if (sessionResponse.status === 200 && sessionResponse.body?.ok) {
|
||||
session = sessionResponse.body.session ?? null;
|
||||
}
|
||||
}
|
||||
|
||||
const debugSummary = summarizeDebug(body.debug);
|
||||
const artifact = {
|
||||
schema_version: "assistant_prompt_dialog_v0_1",
|
||||
generated_at: generatedAt,
|
||||
suite_id: suite.suite_id,
|
||||
case_id: caseId,
|
||||
scenario_tag: probeCase.scenario_tag || "",
|
||||
expected_hints: probeCase.expected_hints || {},
|
||||
lifecycle_focus: probeCase.lifecycle_focus || {},
|
||||
request: {
|
||||
useMock: true,
|
||||
promptVersion: "normalizer_v2_0_2",
|
||||
user_message: userMessage
|
||||
},
|
||||
http_status: response.status,
|
||||
session_id: sessionId,
|
||||
trace_id: String(body.debug?.trace_id || body.conversation_item?.trace_id || ""),
|
||||
reply_type: String(body.reply_type || ""),
|
||||
assistant_reply: String(body.assistant_reply || ""),
|
||||
user_message: userMessage,
|
||||
conversation: Array.isArray(body.conversation) ? body.conversation : [],
|
||||
conversation_item: body.conversation_item || null,
|
||||
debug_summary: debugSummary,
|
||||
debug: body.debug || {},
|
||||
session
|
||||
};
|
||||
|
||||
const order = String(i + 1).padStart(2, "0");
|
||||
const fileStem = `${order}_${toSafeFileToken(caseId)}`;
|
||||
const jsonFile = `${fileStem}.json`;
|
||||
const mdFile = `${fileStem}.md`;
|
||||
writeUtf8Bom(path.join(dialogsDir, jsonFile), `${JSON.stringify(artifact, null, 2)}\n`);
|
||||
writeUtf8Bom(path.join(dialogsDir, mdFile), buildMarkdown(artifact));
|
||||
|
||||
indexRows.push({
|
||||
case_id: caseId,
|
||||
scenario_tag: String(probeCase.scenario_tag || ""),
|
||||
reply_type: artifact.reply_type,
|
||||
session_id: artifact.session_id,
|
||||
trace_id: artifact.trace_id,
|
||||
routed_retrieval_count: debugSummary.routed_retrieval_count,
|
||||
problem_units_count: debugSummary.problem_units_count,
|
||||
prompt_dialog_json: path.join(args.outputSubdir, jsonFile).replace(/\\/g, "/"),
|
||||
prompt_dialog_md: path.join(args.outputSubdir, mdFile).replace(/\\/g, "/")
|
||||
});
|
||||
}
|
||||
|
||||
const indexPayload = {
|
||||
schema_version: "assistant_prompt_dialog_index_v0_1",
|
||||
generated_at: generatedAt,
|
||||
run_dir: runDir,
|
||||
suite_id: suite.suite_id,
|
||||
scenario_count: suite.scenario_count,
|
||||
dialogs: indexRows
|
||||
};
|
||||
writeUtf8Bom(path.join(runDir, "prompt_dialogs", "index.json"), `${JSON.stringify(indexPayload, null, 2)}\n`);
|
||||
|
||||
process.stdout.write(
|
||||
[
|
||||
`run_dir=${runDir}`,
|
||||
`suite_id=${suite.suite_id}`,
|
||||
`dialogs_generated=${indexRows.length}`,
|
||||
`dialogs_folder=${dialogsDir}`
|
||||
].join("\n") + "\n"
|
||||
);
|
||||
}
|
||||
|
||||
main().catch((error) => {
|
||||
process.stderr.write(`${error?.stack || error}\n`);
|
||||
process.exit(1);
|
||||
});
|
||||
|
|
@ -1,4 +1,4 @@
|
|||
import type {
|
||||
import type {
|
||||
AssistantFallbackType,
|
||||
AssistantReplyType,
|
||||
AnswerGroundingCheck,
|
||||
|
|
@ -50,21 +50,96 @@ const UUID_PATTERN = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]
|
|||
const LONG_HEX_PATTERN = /\b[0-9a-f]{24,}\b/gi;
|
||||
const RAW_REF_BLOB_PATTERN = /\bevidence_source_ref_v1\|[^\s,;]+/gi;
|
||||
const RAW_REF_TOKEN_PATTERN = /\b(?:source_ref|canonical_ref|entity_id|fragment_id|guid|uuid)\b/gi;
|
||||
const SYNTHETIC_PLACEHOLDER_PATTERN = /\bunknown_entity(?::[^\s,;]+)?\b/gi;
|
||||
const SYNTHETIC_FALLBACK_MARKER_PATTERN = /\b(?:unknown_source|unknown_record)\b/gi;
|
||||
const SYNTHETIC_ROUTE_TOKEN_PATTERN = /\bbatch_refresh_then_store:[^\s,;]+/gi;
|
||||
const CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/u;
|
||||
const LATIN_MOJIBAKE_FRAGMENT_PATTERN = /(?:[\u00D0\u00D1][\u0080-\u00FF]){2,}/u;
|
||||
const SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN = /^[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]{1,2}$/u;
|
||||
const PREFIXED_SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN = /^[\p{L}\p{N}_-]+[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]{1,2}$/u;
|
||||
const MOJIBAKE_SINGLE_MARKER_PATTERN = /^[\u0420\u0421\u00D0\u00D1]$/u;
|
||||
const MOJIBAKE_MARKER_CHAR_PATTERN = /[\u0402\u0403\u040A\u040C\u040E\u040F\u0452\u0453\u0459\u045A\u045C\u045E\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/u;
|
||||
const CYRILLIC_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN = /(?:[\u0420\u0421][\u0080-\u04FF\u2000-\u20CF]){2,}/gu;
|
||||
const LATIN_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN = /(?:[\u00D0\u00D1][\u0080-\u00FF]){2,}/g;
|
||||
const MOJIBAKE_MARKER_CHAR_GLOBAL_PATTERN = /[\u0402\u0403\u040A\u040C\u040E\u040F\u0452\u0453\u0459\u045A\u045C\u045E\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/gu;
|
||||
|
||||
function normalizeToken(value: string): string {
|
||||
return value.replace(/^[^\p{L}\p{N}_-]+|[^\p{L}\p{N}_-]+$/gu, "");
|
||||
}
|
||||
|
||||
function isLikelyMojibakeToken(value: string): boolean {
|
||||
const token = normalizeToken(String(value ?? ""));
|
||||
if (!token) {
|
||||
return false;
|
||||
}
|
||||
if (MOJIBAKE_SINGLE_MARKER_PATTERN.test(token)) {
|
||||
return true;
|
||||
}
|
||||
if (SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN.test(token)) {
|
||||
return true;
|
||||
}
|
||||
if (token.length <= 8 && PREFIXED_SHORT_CYRILLIC_MOJIBAKE_TOKEN_PATTERN.test(token)) {
|
||||
return true;
|
||||
}
|
||||
return CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(token) || LATIN_MOJIBAKE_FRAGMENT_PATTERN.test(token);
|
||||
}
|
||||
|
||||
function countMojibakeTokens(value: string): number {
|
||||
return String(value ?? "")
|
||||
.split(/[\s,.;:!?()[\]{}"']+/g)
|
||||
.filter((token) => token.length > 0)
|
||||
.filter((token) => isLikelyMojibakeToken(token)).length;
|
||||
}
|
||||
|
||||
function countMojibakeSingleMarkers(value: string): number {
|
||||
return String(value ?? "")
|
||||
.split(/[\s,.;:!?()[\]{}"']+/g)
|
||||
.filter((token) => token.length > 0)
|
||||
.map((token) => normalizeToken(token))
|
||||
.filter((token) => MOJIBAKE_SINGLE_MARKER_PATTERN.test(token)).length;
|
||||
}
|
||||
|
||||
function stripMojibakeFragments(value: string): string {
|
||||
const removedByToken = String(value ?? "")
|
||||
.split(/(\s+)/g)
|
||||
.map((part) => {
|
||||
if (/^\s+$/u.test(part)) {
|
||||
return part;
|
||||
}
|
||||
return isLikelyMojibakeToken(part) ? "" : part;
|
||||
})
|
||||
.join("");
|
||||
|
||||
return removedByToken
|
||||
.replace(CYRILLIC_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN, "")
|
||||
.replace(LATIN_MOJIBAKE_FRAGMENT_GLOBAL_PATTERN, "")
|
||||
.replace(MOJIBAKE_MARKER_CHAR_GLOBAL_PATTERN, "")
|
||||
.replace(/\s+([,.;:!?])/g, "$1")
|
||||
.replace(/\s{2,}/g, " ")
|
||||
.trim();
|
||||
}
|
||||
|
||||
function looksLikeMojibake(value: string): boolean {
|
||||
const text = String(value ?? "");
|
||||
if (!text.trim()) {
|
||||
return false;
|
||||
}
|
||||
if (/(?:Р.|С.){5,}/u.test(text)) {
|
||||
const tokenHits = countMojibakeTokens(text);
|
||||
const singleMarkers = countMojibakeSingleMarkers(text);
|
||||
if (tokenHits >= 2 || (tokenHits >= 1 && singleMarkers >= 1) || singleMarkers >= 3) {
|
||||
return true;
|
||||
}
|
||||
if (/[ЃѓЂђЌќЎў]/u.test(text)) {
|
||||
if (MOJIBAKE_MARKER_CHAR_PATTERN.test(text)) {
|
||||
return true;
|
||||
}
|
||||
if (CYRILLIC_MOJIBAKE_FRAGMENT_PATTERN.test(text) || LATIN_MOJIBAKE_FRAGMENT_PATTERN.test(text)) {
|
||||
return true;
|
||||
}
|
||||
if (/\uFFFD/u.test(text)) {
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
function looksLikeTechnicalIdentifier(value: string): boolean {
|
||||
const text = String(value ?? "").trim();
|
||||
if (!text) {
|
||||
|
|
@ -99,15 +174,33 @@ function scrubRawTechnicalRefs(value: string): string {
|
|||
.trim();
|
||||
}
|
||||
|
||||
function sanitizeUserFacingReply(value: string): string {
|
||||
return scrubRawTechnicalRefs(value)
|
||||
.replace(/[ \t]+\n/g, "\n")
|
||||
.replace(/\n{3,}/g, "\n\n")
|
||||
function stripSyntheticPlaceholders(value: string): string {
|
||||
return String(value ?? "")
|
||||
.replace(SYNTHETIC_PLACEHOLDER_PATTERN, "")
|
||||
.replace(SYNTHETIC_FALLBACK_MARKER_PATTERN, "")
|
||||
.replace(SYNTHETIC_ROUTE_TOKEN_PATTERN, "")
|
||||
.replace(/[;,:]\s*[;,:]+/g, "; ")
|
||||
.replace(/\s{2,}/g, " ")
|
||||
.trim();
|
||||
}
|
||||
|
||||
function sanitizeUserFacingReply(value: string): string {
|
||||
const normalized = scrubRawTechnicalRefs(value).replace(/[ \t]+\n/g, "\n");
|
||||
const cleanedLines = normalized
|
||||
.split(/\r?\n/g)
|
||||
.map((line) => stripSyntheticPlaceholders(line))
|
||||
.map((line) => stripMojibakeFragments(line))
|
||||
.map((line) => line.trim())
|
||||
.filter((line) => line.length > 0)
|
||||
.filter((line) => !looksLikeMojibake(line));
|
||||
const cleaned = cleanedLines.join("\n").replace(/\n{3,}/g, "\n\n").trim();
|
||||
return cleaned || "Available data requires clarification for a reliable user-facing answer.";
|
||||
}
|
||||
|
||||
function sanitizeUserText(value: string): string | null {
|
||||
const normalized = scrubRawTechnicalRefs(String(value ?? "").replace(/\s+/g, " ").trim());
|
||||
const normalized = stripMojibakeFragments(
|
||||
stripSyntheticPlaceholders(scrubRawTechnicalRefs(String(value ?? "").replace(/\s+/g, " ").trim()))
|
||||
);
|
||||
if (!normalized) {
|
||||
return null;
|
||||
}
|
||||
|
|
@ -238,13 +331,13 @@ function buildFallbackWhyIncluded(results: UnifiedRetrievalResult[]): string[] {
|
|||
const checkedRecords = summaryNumber(result, "checked_records");
|
||||
|
||||
if (routeFocus) {
|
||||
lines.push(`Проверка выполнена по профилю ${routeFocus}.`);
|
||||
lines.push(`Проверка выполнена по профилю ${routeFocus}.`);
|
||||
}
|
||||
if (sourceRecords !== null && filteredRecords !== null && filteredRecords < sourceRecords) {
|
||||
lines.push(`Применено сужение выборки: ${filteredRecords} из ${sourceRecords} записей.`);
|
||||
lines.push(`Применено сужение выборки: ${filteredRecords} из ${sourceRecords} записей.`);
|
||||
}
|
||||
if (checkedRecords !== null) {
|
||||
lines.push(`Проверено записей в текущем проходе: ${checkedRecords}.`);
|
||||
lines.push(`Проверено записей в текущем проходе: ${checkedRecords}.`);
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -255,19 +348,19 @@ function buildFallbackSelectionReasons(results: UnifiedRetrievalResult[]): strin
|
|||
const lines: string[] = [];
|
||||
for (const result of results.slice(0, 2)) {
|
||||
if (summaryBoolean(result, "semantic_narrowing_applied")) {
|
||||
lines.push("Отбор выполнен по семантическому сужению предметной области.");
|
||||
lines.push("Отбор выполнен по семантическому сужению предметной области.");
|
||||
}
|
||||
const rankingBasis = summaryStringArray(result, "ranking_basis");
|
||||
if (rankingBasis.length > 0) {
|
||||
lines.push(`Ранжирование основано на: ${rankingBasis.join(", ")}.`);
|
||||
lines.push(`Ранжирование основано на: ${rankingBasis.join(", ")}.`);
|
||||
}
|
||||
if (summaryBoolean(result, "broad_guard_applied")) {
|
||||
lines.push("Применен broad-query guard для контроля ложной точности.");
|
||||
lines.push("Применен broad-query guard для контроля ложной точности.");
|
||||
}
|
||||
}
|
||||
|
||||
if (lines.length === 0) {
|
||||
lines.push("Отбор выполнен по совпадению предметных сигналов и доступной evidence-опоры.");
|
||||
lines.push("Отбор выполнен по совпадению предметных сигналов и доступной evidence-опоры.");
|
||||
}
|
||||
|
||||
return sanitizeUserLines(lines, 4);
|
||||
|
|
@ -276,16 +369,16 @@ function buildFallbackSelectionReasons(results: UnifiedRetrievalResult[]): strin
|
|||
function suggestNextStep(requirements: AssistantRequirement[], coverage: RequirementCoverageReport): string[] {
|
||||
const next: string[] = [];
|
||||
if (coverage.clarification_needed_for.length > 0) {
|
||||
next.push("Уточните период, счет, документ или контрагента для требований: " + coverage.clarification_needed_for.join(", ") + ".");
|
||||
next.push("Уточните период, счет, документ или контрагента для требований: " + coverage.clarification_needed_for.join(", ") + ".");
|
||||
}
|
||||
if (coverage.requirements_uncovered.length > 0) {
|
||||
next.push("Проверьте непокрытые требования: " + coverage.requirements_uncovered.join(", ") + ".");
|
||||
next.push("Проверьте непокрытые требования: " + coverage.requirements_uncovered.join(", ") + ".");
|
||||
}
|
||||
if (coverage.out_of_scope_requirements.length > 0) {
|
||||
next.push("Часть запроса вне текущего учетного контура: " + coverage.out_of_scope_requirements.join(", ") + ".");
|
||||
next.push("Часть запроса вне текущего учетного контура: " + coverage.out_of_scope_requirements.join(", ") + ".");
|
||||
}
|
||||
if (next.length === 0 && requirements.length > 0) {
|
||||
next.push("Следующим шагом можно открыть технический разбор и углубить проверку по выбранным объектам.");
|
||||
next.push("Следующим шагом можно открыть технический разбор и углубить проверку по выбранным объектам.");
|
||||
}
|
||||
return next;
|
||||
}
|
||||
|
|
@ -364,21 +457,25 @@ function selectProblemUnitSummary(results: UnifiedRetrievalResult[]): ProblemUni
|
|||
}
|
||||
|
||||
function formatAffectedScope(unit: ProblemUnit): string {
|
||||
const accountScope = sanitizeUserLines(unit.affected_accounts, 2);
|
||||
const counterpartyScope = sanitizeUserLines(unit.affected_counterparties, 2);
|
||||
const documentScope = sanitizeUserLines(unit.affected_documents, 2);
|
||||
const entityScope = sanitizeUserLines(unit.affected_entities, 2);
|
||||
const scopeParts: string[] = [];
|
||||
if (unit.affected_accounts.length > 0) {
|
||||
scopeParts.push(`счета: ${unit.affected_accounts.slice(0, 2).join(", ")}`);
|
||||
if (accountScope.length > 0) {
|
||||
scopeParts.push(`accounts: ${accountScope.join(", ")}`);
|
||||
}
|
||||
if (unit.affected_counterparties.length > 0) {
|
||||
scopeParts.push(`контрагенты: ${unit.affected_counterparties.slice(0, 2).join(", ")}`);
|
||||
if (counterpartyScope.length > 0) {
|
||||
scopeParts.push(`counterparties: ${counterpartyScope.join(", ")}`);
|
||||
}
|
||||
if (unit.affected_documents.length > 0) {
|
||||
scopeParts.push(`документы: ${unit.affected_documents.slice(0, 2).join(", ")}`);
|
||||
if (documentScope.length > 0) {
|
||||
scopeParts.push(`documents: ${documentScope.join(", ")}`);
|
||||
}
|
||||
if (scopeParts.length === 0 && unit.affected_entities.length > 0) {
|
||||
scopeParts.push(`объекты: ${unit.affected_entities.slice(0, 2).join(", ")}`);
|
||||
if (scopeParts.length === 0 && entityScope.length > 0) {
|
||||
scopeParts.push(`entities: ${entityScope.join(", ")}`);
|
||||
}
|
||||
if (scopeParts.length === 0) {
|
||||
return "затронутый контур требует уточнения";
|
||||
return "affected scope requires clarification";
|
||||
}
|
||||
return scopeParts.join("; ");
|
||||
}
|
||||
|
|
@ -448,49 +545,49 @@ function buildProblemCentricActions(input: {
|
|||
const unitTypes = new Set(input.units.map((item) => item.problem_unit_type));
|
||||
|
||||
if (unitTypes.has("broken_chain_segment")) {
|
||||
actions.push("Проверьте связку выписка -> документ -> проводка по проблемным участкам цепочки.");
|
||||
actions.push("Проверьте связку выписка -> документ -> проводка по проблемным участкам цепочки.");
|
||||
}
|
||||
if (unitTypes.has("unresolved_settlement_cluster")) {
|
||||
actions.push("Сверьте хвосты по расчетам: закрылся ли документ оплаты корректным закрывающим документом.");
|
||||
actions.push("Сверьте хвосты по расчетам: закрылся ли документ оплаты корректным закрывающим документом.");
|
||||
}
|
||||
if (unitTypes.has("period_risk_cluster")) {
|
||||
actions.push("Оцените влияние дефекта на закрытие периода и корректность регламентных операций.");
|
||||
actions.push("Оцените влияние дефекта на закрытие периода и корректность регламентных операций.");
|
||||
}
|
||||
if (unitTypes.has("cross_branch_inconsistency_cluster")) {
|
||||
actions.push("Сверьте противоречия между документами, проводками и регистрами по НДС/межконтурным связям.");
|
||||
actions.push("Сверьте противоречия между документами, проводками и регистрами по НДС/межконтурным связям.");
|
||||
}
|
||||
if (unitTypes.has("lifecycle_anomaly_node")) {
|
||||
actions.push("Проверьте lifecycle объекта: ожидаемый этап не должен оставаться в partially_linked состоянии.");
|
||||
actions.push("Проверьте lifecycle объекта: ожидаемый этап не должен оставаться в partially_linked состоянии.");
|
||||
}
|
||||
for (const unit of input.units) {
|
||||
if (unit.lifecycle_defect_type === "stale_active_state") {
|
||||
actions.push("Проверьте, почему объект завис: ожидаемый переход не должен оставаться в активной стадии.");
|
||||
actions.push("Проверьте, почему объект завис: ожидаемый переход не должен оставаться в активной стадии.");
|
||||
}
|
||||
if (unit.lifecycle_defect_type === "misclosed_state") {
|
||||
actions.push("Проверьте закрывающий документ и проводки: закрытие может быть формальным, но некорректным по пути.");
|
||||
actions.push("Проверьте закрывающий документ и проводки: закрытие может быть формальным, но некорректным по пути.");
|
||||
}
|
||||
if (unit.lifecycle_defect_type === "cross_branch_state_conflict") {
|
||||
actions.push("Сверьте бухгалтерскую и смежную ветки (например, НДС/расчеты): обнаружен межконтурный конфликт состояния.");
|
||||
actions.push("Сверьте бухгалтерскую и смежную ветки (например, НДС/расчеты): обнаружен межконтурный конфликт состояния.");
|
||||
}
|
||||
}
|
||||
|
||||
if (input.mode === "clarification_required") {
|
||||
if (input.missingAnchors.period) {
|
||||
actions.push("Уточните период проверки, чтобы зафиксировать границы проблемного контура.");
|
||||
actions.push("Уточните период проверки, чтобы зафиксировать границы проблемного контура.");
|
||||
}
|
||||
if (input.missingAnchors.account) {
|
||||
actions.push("Уточните счет или группу счетов для предметной локализации дефекта.");
|
||||
actions.push("Уточните счет или группу счетов для предметной локализации дефекта.");
|
||||
}
|
||||
if (input.missingAnchors.documentOrObject) {
|
||||
actions.push("Укажите конкретный документ или объект трассировки для проверки механизма отклонения.");
|
||||
actions.push("Укажите конкретный документ или объект трассировки для проверки механизма отклонения.");
|
||||
}
|
||||
if (input.missingAnchors.counterparty) {
|
||||
actions.push("Укажите контрагента/договор, чтобы проверить хвосты и разрывы на конкретной связке.");
|
||||
actions.push("Укажите контрагента/договор, чтобы проверить хвосты и разрывы на конкретной связке.");
|
||||
}
|
||||
}
|
||||
|
||||
if (input.coverageReport.requirements_uncovered.length > 0) {
|
||||
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
|
||||
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
|
||||
}
|
||||
|
||||
return uniqueStrings(actions, 6);
|
||||
|
|
@ -510,28 +607,28 @@ function buildProblemCentricClarifications(input: {
|
|||
const unitTypes = new Set(input.units.map((item) => item.problem_unit_type));
|
||||
|
||||
if (input.missingAnchors.period) {
|
||||
questions.push("Уточните период (например, 2020-06), в котором нужно проверить проблемный кластер.");
|
||||
questions.push("Уточните период (например, 2020-06), в котором нужно проверить проблемный кластер.");
|
||||
}
|
||||
if (input.missingAnchors.account) {
|
||||
questions.push("Уточните счет или связку счетов (например, 51/60), где вы ожидаете дефект.");
|
||||
questions.push("Уточните счет или связку счетов (например, 51/60), где вы ожидаете дефект.");
|
||||
}
|
||||
if (input.missingAnchors.documentOrObject) {
|
||||
questions.push("Укажите документ/объект, от которого нужно строить проверку цепочки.");
|
||||
questions.push("Укажите документ/объект, от которого нужно строить проверку цепочки.");
|
||||
}
|
||||
if (input.missingAnchors.counterparty) {
|
||||
questions.push("Укажите контрагента или договор, по которому проверить незакрытую экспозицию.");
|
||||
questions.push("Укажите контрагента или договор, по которому проверить незакрытую экспозицию.");
|
||||
}
|
||||
if (unitTypes.has("broken_chain_segment")) {
|
||||
questions.push("Уточните участок цепочки: выписка, платежный документ или проводка.");
|
||||
questions.push("Уточните участок цепочки: выписка, платежный документ или проводка.");
|
||||
}
|
||||
if (unitTypes.has("period_risk_cluster")) {
|
||||
questions.push("Уточните, какой этап закрытия периода критичен: начисление, закрытие счетов или НДС-блок.");
|
||||
questions.push("Уточните, какой этап закрытия периода критичен: начисление, закрытие счетов или НДС-блок.");
|
||||
}
|
||||
if (unitTypes.has("unresolved_settlement_cluster")) {
|
||||
questions.push("Уточните, интересуют хвосты поставщиков, покупателей или оба направления.");
|
||||
questions.push("Уточните, интересуют хвосты поставщиков, покупателей или оба направления.");
|
||||
}
|
||||
if (input.coverageReport.clarification_needed_for.length > 0) {
|
||||
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
|
||||
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
|
||||
}
|
||||
|
||||
return uniqueStrings(questions, 6);
|
||||
|
|
@ -644,10 +741,10 @@ function limitationReasonToText(code: EvidenceLimitationReasonCode): string {
|
|||
function detectMissingAnchors(userMessage: string): MissingAnchors {
|
||||
const lower = String(userMessage ?? "").toLowerCase();
|
||||
const hasPeriod = /\b20\d{2}(?:[-./](?:0[1-9]|1[0-2]))?\b/.test(lower);
|
||||
const hasAccount = /(?:\bсчет\b|\baccount\b|\bschet\b|\b\d{2}(?:\.\d{2})?\b)/i.test(lower);
|
||||
const hasDocumentOrObject = /(?:документ|invoice|guid|object|obj|#\d+|\bid\b|\bref\b|dokument|doc)/i.test(lower);
|
||||
const hasCounterparty = /(?:контрагент|supplier|buyer|customer|kontragent|postavsh|pokupatel)/i.test(lower);
|
||||
const hasAnomalyType = /(?:аномал|risk|отклон|разрыв|mismatch|duplicate|tail|цепочк|anomali|hvost)/i.test(lower);
|
||||
const hasAccount = /(?:\bсчет\b|\baccount\b|\bschet\b|\b\d{2}(?:\.\d{2})?\b)/i.test(lower);
|
||||
const hasDocumentOrObject = /(?:документ|invoice|guid|object|obj|#\d+|\bid\b|\bref\b|dokument|doc)/i.test(lower);
|
||||
const hasCounterparty = /(?:контрагент|supplier|buyer|customer|kontragent|postavsh|pokupatel)/i.test(lower);
|
||||
const hasAnomalyType = /(?:аномал|risk|отклон|разрыв|mismatch|duplicate|tail|цепочк|anomali|hvost)/i.test(lower);
|
||||
|
||||
return {
|
||||
period: !hasPeriod,
|
||||
|
|
@ -671,22 +768,22 @@ function buildClarificationQuestions(input: {
|
|||
}
|
||||
|
||||
if (input.missingAnchors.period) {
|
||||
questions.push("Уточните период проверки (например, 2020-06).");
|
||||
questions.push("Уточните период проверки (например, 2020-06).");
|
||||
}
|
||||
if (input.missingAnchors.account) {
|
||||
questions.push("Уточните счет или группу счетов (например, 19, 60, 62).");
|
||||
questions.push("Уточните счет или группу счетов (например, 19, 60, 62).");
|
||||
}
|
||||
if (input.missingAnchors.documentOrObject) {
|
||||
questions.push("Укажите документ/GUID/конкретный объект для трассировки.");
|
||||
questions.push("Укажите документ/GUID/конкретный объект для трассировки.");
|
||||
}
|
||||
if (input.missingAnchors.counterparty) {
|
||||
questions.push("Укажите контрагента или группу контрагентов.");
|
||||
questions.push("Укажите контрагента или группу контрагентов.");
|
||||
}
|
||||
if (input.policySignals.broad_query_detected && input.missingAnchors.anomalyType) {
|
||||
questions.push("Уточните тип отклонения: разрыв цепочки, неверный документ или аномальный риск.");
|
||||
questions.push("Уточните тип отклонения: разрыв цепочки, неверный документ или аномальный риск.");
|
||||
}
|
||||
if (input.coverageReport.clarification_needed_for.length > 0) {
|
||||
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
|
||||
questions.push(`Закройте уточнения для требований: ${input.coverageReport.clarification_needed_for.join(", ")}.`);
|
||||
}
|
||||
|
||||
return uniqueStrings(questions, 6);
|
||||
|
|
@ -701,31 +798,31 @@ function buildRecommendedActions(input: {
|
|||
}): string[] {
|
||||
const actions: string[] = [];
|
||||
if (input.mode === "focused_grounded") {
|
||||
actions.push("Проверьте 1-2 ключевые записи в учетной базе и зафиксируйте итог в рабочем файле проверки.");
|
||||
actions.push("Проверьте 1-2 ключевые записи в учетной базе и зафиксируйте итог в рабочем файле проверки.");
|
||||
}
|
||||
if (input.mode === "broad_partial") {
|
||||
actions.push("Сузьте запрос до периода + счета или периода + документа и повторите проверку.");
|
||||
actions.push("Сузьте запрос до периода + счета или периода + документа и повторите проверку.");
|
||||
}
|
||||
if (input.mode === "clarification_required") {
|
||||
actions.push("Дайте недостающие якоря (период/счет/объект), иначе сильный factual вывод невозможен.");
|
||||
actions.push("Дайте недостающие якоря (период/счет/объект), иначе сильный factual вывод невозможен.");
|
||||
}
|
||||
if (input.coverageReport.requirements_uncovered.length > 0) {
|
||||
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
|
||||
actions.push(`Закройте непокрытые требования: ${input.coverageReport.requirements_uncovered.join(", ")}.`);
|
||||
}
|
||||
if (input.coverageReport.requirements_partially_covered.length > 0) {
|
||||
actions.push(`Доуточните частично покрытые требования: ${input.coverageReport.requirements_partially_covered.join(", ")}.`);
|
||||
actions.push(`Доуточните частично покрытые требования: ${input.coverageReport.requirements_partially_covered.join(", ")}.`);
|
||||
}
|
||||
if (input.policySignals.broad_query_detected && input.policySignals.narrowing_strength !== "strong") {
|
||||
actions.push("Добавьте более узкий контекст: тип отклонения, группу документов и бизнес-участок.");
|
||||
actions.push("Добавьте более узкий контекст: тип отклонения, группу документов и бизнес-участок.");
|
||||
}
|
||||
if (input.limitationReasonCodes.includes("snapshot_only")) {
|
||||
actions.push("Сверьте критичные выводы с live source-of-record в 1C.");
|
||||
actions.push("Сверьте критичные выводы с live source-of-record в 1C.");
|
||||
}
|
||||
if (input.limitationReasonCodes.includes("weak_source_mapping")) {
|
||||
actions.push("Проверьте source mapping для связей document/register по указанным ref.");
|
||||
actions.push("Проверьте source mapping для связей document/register по указанным ref.");
|
||||
}
|
||||
if (input.sourceRefs.length > 0) {
|
||||
actions.push(`Начните проверку с ${input.sourceRefs.length} подтвержденных записей и сверьте их с первичными документами.`);
|
||||
actions.push(`Начните проверку с ${input.sourceRefs.length} подтвержденных записей и сверьте их с первичными документами.`);
|
||||
}
|
||||
|
||||
return uniqueStrings(actions, 6);
|
||||
|
|
@ -842,14 +939,14 @@ function buildPolicyDecision(input: {
|
|||
}
|
||||
|
||||
function buildAnswerSummary(mode: PolicyMode): string {
|
||||
if (mode === "focused_grounded") return "Сформирован прямой ответ на основе подтвержденной опоры.";
|
||||
if (mode === "broad_partial") return "Вывод ограничен: есть частичная опора, но не полный coverage.";
|
||||
if (mode === "clarification_required") return "Нужны уточнения: без сужения strong factual вывод ненадежен.";
|
||||
if (mode === "out_of_scope") return "Запрос вне доступного учетного контура.";
|
||||
if (mode === "route_mismatch") return "Результат маршрута не совпал с предметом вопроса.";
|
||||
if (mode === "empty") return "В текущем срезе данных релевантные записи не обнаружены.";
|
||||
if (mode === "no_grounded") return "Недостаточно опоры для обоснованного ответа.";
|
||||
return "Не удалось собрать обоснованный ответ по текущему запросу.";
|
||||
if (mode === "focused_grounded") return "Сформирован прямой ответ на основе подтвержденной опоры.";
|
||||
if (mode === "broad_partial") return "Вывод ограничен: есть частичная опора, но не полный coverage.";
|
||||
if (mode === "clarification_required") return "Нужны уточнения: без сужения strong factual вывод ненадежен.";
|
||||
if (mode === "out_of_scope") return "Запрос вне доступного учетного контура.";
|
||||
if (mode === "route_mismatch") return "Результат маршрута не совпал с предметом вопроса.";
|
||||
if (mode === "empty") return "В текущем срезе данных релевантные записи не обнаружены.";
|
||||
if (mode === "no_grounded") return "Недостаточно опоры для обоснованного ответа.";
|
||||
return "Не удалось собрать обоснованный ответ по текущему запросу.";
|
||||
}
|
||||
|
||||
function buildDirectAnswer(input: {
|
||||
|
|
@ -859,33 +956,33 @@ function buildDirectAnswer(input: {
|
|||
}): string {
|
||||
const topFact = firstMeaningfulFact(input.retrievalResults);
|
||||
if (input.mode === "focused_grounded") {
|
||||
return topFact ?? "Подтвержденный результат получен; можно продолжать предметную проверку без деградации.";
|
||||
return topFact ?? "Подтвержденный результат получен; можно продолжать предметную проверку без деградации.";
|
||||
}
|
||||
if (input.mode === "broad_partial") {
|
||||
if (topFact) {
|
||||
return `Доступен ограниченный подтвержденный фрагмент: ${topFact}`;
|
||||
return `Доступен ограниченный подтвержденный фрагмент: ${topFact}`;
|
||||
}
|
||||
return "Есть только ограниченная опора; вывод дан в частичном режиме без ложной точности.";
|
||||
return "Есть только ограниченная опора; вывод дан в частичном режиме без ложной точности.";
|
||||
}
|
||||
if (input.mode === "clarification_required") {
|
||||
return "Текущий запрос слишком широкий или недоопределен; надежный factual вывод пока невозможен.";
|
||||
return "Текущий запрос слишком широкий или недоопределен; надежный factual вывод пока невозможен.";
|
||||
}
|
||||
if (input.mode === "out_of_scope") {
|
||||
return "Могу отвечать только в пределах данных доступного учетного контура.";
|
||||
return "Могу отвечать только в пределах данных доступного учетного контура.";
|
||||
}
|
||||
if (input.mode === "route_mismatch") {
|
||||
return "Предмет результата не совпал с предметом вопроса; требуется уточнение фокуса.";
|
||||
return "Предмет результата не совпал с предметом вопроса; требуется уточнение фокуса.";
|
||||
}
|
||||
if (input.mode === "empty") {
|
||||
return "В текущем срезе данных проблемные записи по заданному условию не найдены.";
|
||||
return "В текущем срезе данных проблемные записи по заданному условию не найдены.";
|
||||
}
|
||||
if (input.mode === "no_grounded") {
|
||||
return "Недостаточно подтвержденной опоры для ответа в требуемой точности.";
|
||||
return "Недостаточно подтвержденной опоры для ответа в требуемой точности.";
|
||||
}
|
||||
if (input.policySignals.minimum_evidence_failed) {
|
||||
return "Маршрут отработал, но минимальная evidence-опора не пройдена.";
|
||||
return "Маршрут отработал, но минимальная evidence-опора не пройдена.";
|
||||
}
|
||||
return "Не удалось сформировать обоснованный ответ; нужно уточнение запроса.";
|
||||
return "Не удалось сформировать обоснованный ответ; нужно уточнение запроса.";
|
||||
}
|
||||
|
||||
function buildProblemCentricAnswerSummary(input: {
|
||||
|
|
@ -896,20 +993,20 @@ function buildProblemCentricAnswerSummary(input: {
|
|||
}): string {
|
||||
if (input.lifecycleEnriched && input.summary?.lifecycle_enriched_units && input.summary.lifecycle_enriched_units > 0) {
|
||||
if (input.mode === "clarification_required") {
|
||||
return "Выявлены lifecycle-дефекты, но для надежного вывода требуется уточнение предметных якорей.";
|
||||
return "Выявлены lifecycle-дефекты, но для надежного вывода требуется уточнение предметных якорей.";
|
||||
}
|
||||
return `Сформирован lifecycle-aware problem срез: выделено ${input.summary.lifecycle_enriched_units} lifecycle-узлов с приоритетом по дефектам перехода.`;
|
||||
return `Сформирован lifecycle-aware problem срез: выделено ${input.summary.lifecycle_enriched_units} lifecycle-узлов с приоритетом по дефектам перехода.`;
|
||||
}
|
||||
if (input.mode === "clarification_required") {
|
||||
return "Выявлены проблемные кластеры, но для надежного вывода требуется предметное уточнение фокуса.";
|
||||
return "Выявлены проблемные кластеры, но для надежного вывода требуется предметное уточнение фокуса.";
|
||||
}
|
||||
if (input.weakUnits) {
|
||||
return "Сформирован problem-centric срез с ограниченной опорой; вывод предварительный и требует до-проверки.";
|
||||
return "Сформирован problem-centric срез с ограниченной опорой; вывод предварительный и требует до-проверки.";
|
||||
}
|
||||
if (input.summary?.units_total && input.summary.units_total > 1) {
|
||||
return `Сформирован problem-centric срез: выделено ${input.summary.units_total} проблемных кластера с приоритетами.`;
|
||||
return `Сформирован problem-centric срез: выделено ${input.summary.units_total} проблемных кластера с приоритетами.`;
|
||||
}
|
||||
return "Сформирован problem-centric срез: выделен ключевой проблемный кластер и затронутый контур.";
|
||||
return "Сформирован problem-centric срез: выделен ключевой проблемный кластер и затронутый контур.";
|
||||
}
|
||||
|
||||
function buildProblemCentricDirectAnswer(input: {
|
||||
|
|
@ -920,19 +1017,24 @@ function buildProblemCentricDirectAnswer(input: {
|
|||
}): string {
|
||||
const lead =
|
||||
input.mode === "clarification_required"
|
||||
? "Обнаружены проблемные зоны, но без уточнения якорей сильный factual-вывод преждевременен."
|
||||
? "Обнаружены проблемные зоны, но без уточнения якорей сильный factual-вывод преждевременен."
|
||||
: input.weakUnits
|
||||
? "Выделены проблемные зоны с ограниченной надежностью; вывод дан в ограниченном режиме."
|
||||
? "Выделены проблемные зоны с ограниченной надежностью; вывод дан в ограниченном режиме."
|
||||
: input.lifecycleAnswerEnabled && hasLifecycleResolution(input.units)
|
||||
? "Выделены lifecycle-проблемы: определены текущие/ожидаемые стадии и тип нарушения перехода."
|
||||
: "Выделены ключевые проблемные зоны и их влияние на учетный контур.";
|
||||
? "Выделены lifecycle-проблемы: определены текущие/ожидаемые стадии и тип нарушения перехода."
|
||||
: "Выделены ключевые проблемные зоны и их влияние на учетный контур.";
|
||||
|
||||
const unitLines = input.units.map((unit) => {
|
||||
const scope = formatAffectedScope(unit);
|
||||
const lifecycleScope = input.lifecycleAnswerEnabled ? formatLifecycleScope(unit) : null;
|
||||
const lifecycleInterpretation = input.lifecycleAnswerEnabled ? unit.business_lifecycle_interpretation : null;
|
||||
const lifecycleInterpretation =
|
||||
input.lifecycleAnswerEnabled && unit.business_lifecycle_interpretation
|
||||
? sanitizeUserText(unit.business_lifecycle_interpretation)
|
||||
: null;
|
||||
const title = sanitizeUserText(unit.title) ?? "Problem cluster detected";
|
||||
const defect = sanitizeUserText(unit.business_defect_class) ?? "detected_issue";
|
||||
const segments = [
|
||||
`${unit.title}: ${unit.business_defect_class}`,
|
||||
`${title}: ${defect}`,
|
||||
scope,
|
||||
lifecycleScope,
|
||||
lifecycleInterpretation,
|
||||
|
|
@ -944,10 +1046,10 @@ function buildProblemCentricDirectAnswer(input: {
|
|||
});
|
||||
|
||||
if (unitLines.length === 0) {
|
||||
return `${lead}\nПроблемные кластеры не удалось детализировать в текущем срезе.`;
|
||||
return `${lead}\nПроблемные кластеры не удалось детализировать в текущем срезе.`;
|
||||
}
|
||||
|
||||
return [lead, "Проблемные кластеры:", ...unitLines].join("\n");
|
||||
return [lead, "Проблемные кластеры:", ...unitLines].join("\n");
|
||||
}
|
||||
|
||||
function buildProblemCentricAnswerStructure(input: {
|
||||
|
|
@ -1358,21 +1460,23 @@ function composeExplainableAnswer(input: ComposeAnswerInput, scopeLabel: "full"
|
|||
|
||||
const lead =
|
||||
scopeLabel === "full"
|
||||
? "Итог: запрос обработан по предмету, найденные объекты подтверждены данными контура."
|
||||
: "Итог: запрос обработан частично, ниже подтвержденная часть и ограничения.";
|
||||
? "Ртог: запрос обработан РїРѕ предмету, найденные объекты подтверждены данными контура."
|
||||
: "Ртог: запрос обработан частично, РЅРёР¶Рµ подтвержденная часть Рё ограничения.";
|
||||
|
||||
return [
|
||||
lead,
|
||||
facts.length > 0 ? "Подтвержденные результаты:\n" + formatList(facts) : "",
|
||||
whyIncluded.length > 0 ? "Почему это попало в ответ:\n" + formatList(whyIncluded) : "",
|
||||
selectionReasons.length > 0 ? "Основание отбора:\n" + formatList(selectionReasons) : "",
|
||||
riskFactors.length > 0 ? "Подтверждающие признаки:\n" + formatList(riskFactors) : "",
|
||||
interpretation.length > 0 ? "Практический смысл:\n" + formatList(interpretation) : "",
|
||||
limitations.length > 0 ? "Ограничения:\n" + formatList(limitations) : "",
|
||||
nextSteps.length > 0 ? "Что проверить дальше:\n" + formatList(nextSteps) : ""
|
||||
]
|
||||
.filter(Boolean)
|
||||
.join("\n\n");
|
||||
return sanitizeUserFacingReply(
|
||||
[
|
||||
lead,
|
||||
facts.length > 0 ? "Подтвержденные результаты:\n" + formatList(facts) : "",
|
||||
whyIncluded.length > 0 ? "Почему это попало в ответ:\n" + formatList(whyIncluded) : "",
|
||||
selectionReasons.length > 0 ? "Основание отбора:\n" + formatList(selectionReasons) : "",
|
||||
riskFactors.length > 0 ? "Подтверждающие признаки:\n" + formatList(riskFactors) : "",
|
||||
interpretation.length > 0 ? "Практический смысл:\n" + formatList(interpretation) : "",
|
||||
limitations.length > 0 ? "Ограничения:\n" + formatList(limitations) : "",
|
||||
nextSteps.length > 0 ? "Что проверить дальше:\n" + formatList(nextSteps) : ""
|
||||
]
|
||||
.filter(Boolean)
|
||||
.join("\n\n")
|
||||
);
|
||||
}
|
||||
|
||||
export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswerOutput {
|
||||
|
|
@ -1385,6 +1489,8 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
const partialResults = input.retrievalResults.filter((item) => item.status === "partial");
|
||||
const emptyResults = input.retrievalResults.filter((item) => item.status === "empty");
|
||||
const errorResults = input.retrievalResults.filter((item) => item.status === "error");
|
||||
const legacyEvidenceItems = flattenEvidence(input.retrievalResults);
|
||||
const legacyLimitationReasonCodes = collectLimitationReasonCodes(legacyEvidenceItems);
|
||||
const hasBroadMinimumEvidenceSignal = input.retrievalResults.some(
|
||||
(item) => summaryBoolean(item, "broad_guard_applied") && summaryBoolean(item, "minimum_evidence_failed")
|
||||
);
|
||||
|
|
@ -1398,7 +1504,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
if (fallbackType === "out_of_scope" && input.coverageReport.requirements_covered === 0) {
|
||||
return {
|
||||
assistant_reply:
|
||||
"Я могу отвечать только по данным вашей учетной базы. Этот запрос выходит за рамки доступного контура.",
|
||||
"РЇ РјРѕРіСѓ отвечать только РїРѕ данным вашей учетной базы. Ртот запрос выходит Р·Р° рамки доступного контура.",
|
||||
fallback_type: "out_of_scope",
|
||||
reply_type: "out_of_scope"
|
||||
};
|
||||
|
|
@ -1407,8 +1513,8 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
if (input.groundingCheck.status === "route_mismatch_blocked") {
|
||||
return {
|
||||
assistant_reply: [
|
||||
"Не отправляю финальный ответ, потому что предмет результата не совпал с предметом вопроса.",
|
||||
"Уточните формулировку (например, нужный счет/участок учета), и я выполню повторный проход."
|
||||
"Не отправляю финальный ответ, потому что предмет результата не совпал с предметом вопроса.",
|
||||
"Уточните формулировку (например, нужный счет/участок учета), и я выполню повторный проход."
|
||||
].join("\n\n"),
|
||||
fallback_type: "partial",
|
||||
reply_type: "route_mismatch_blocked"
|
||||
|
|
@ -1418,7 +1524,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
if (input.groundingCheck.status === "no_grounded_answer" && okResults.length === 0 && !hasBroadMinimumEvidenceSignal) {
|
||||
return {
|
||||
assistant_reply:
|
||||
"Пока не удалось собрать предметно подтвержденный ответ по вашему вопросу. Нужны дополнительные уточнения по периоду или объекту проверки.",
|
||||
"Пока не удалось собрать предметно подтвержденный ответ по вашему вопросу. Нужны дополнительные уточнения по периоду или объекту проверки.",
|
||||
fallback_type: fallbackType,
|
||||
reply_type: "no_grounded_answer"
|
||||
};
|
||||
|
|
@ -1427,7 +1533,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
if (hasBroadClarificationSignal && okResults.length === 0 && partialResults.length === 0) {
|
||||
return {
|
||||
assistant_reply:
|
||||
"Запрос слишком широкий для надежного вывода по текущей опоре. Уточните период, участок учета или объект проверки, после чего я дам предметный результат.",
|
||||
"Запрос слишком широкий для надежного вывода по текущей опоре. Уточните период, участок учета или объект проверки, после чего я дам предметный результат.",
|
||||
fallback_type: "clarification",
|
||||
reply_type: "clarification_required"
|
||||
};
|
||||
|
|
@ -1435,7 +1541,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
|
||||
if (fallbackType === "clarification" && okResults.length === 0 && partialResults.length === 0) {
|
||||
return {
|
||||
assistant_reply: "Уточните, пожалуйста, период, счет, документ или контрагента, чтобы закрыть все части вопроса корректно.",
|
||||
assistant_reply: "Уточните, пожалуйста, период, счет, документ или контрагента, чтобы закрыть все части вопроса корректно.",
|
||||
fallback_type: "clarification",
|
||||
reply_type: "clarification_required"
|
||||
};
|
||||
|
|
@ -1443,7 +1549,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
|
||||
if (errorResults.length > 0 && okResults.length === 0 && partialResults.length === 0) {
|
||||
return {
|
||||
assistant_reply: "Не удалось получить данные из контура. Попробуйте повторить запрос или уточнить формулировку.",
|
||||
assistant_reply: "Не удалось получить данные из контура. Попробуйте повторить запрос или уточнить формулировку.",
|
||||
fallback_type: fallbackType,
|
||||
reply_type: "backend_error"
|
||||
};
|
||||
|
|
@ -1459,7 +1565,7 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
|
||||
if (okResults.length === 0 && partialResults.length === 0 && emptyResults.length > 0) {
|
||||
return {
|
||||
assistant_reply: "По заданному условию в текущем срезе данных явных проблемных записей не найдено.",
|
||||
assistant_reply: "По заданному условию в текущем срезе данных явных проблемных записей не найдено.",
|
||||
fallback_type: fallbackType,
|
||||
reply_type: "empty_but_valid"
|
||||
};
|
||||
|
|
@ -1471,7 +1577,9 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
input.coverageReport.clarification_needed_for.length > 0 ||
|
||||
input.coverageReport.out_of_scope_requirements.length > 0 ||
|
||||
input.groundingCheck.status === "partial" ||
|
||||
errorResults.length > 0;
|
||||
errorResults.length > 0 ||
|
||||
legacyLimitationReasonCodes.includes("weak_source_mapping") ||
|
||||
legacyLimitationReasonCodes.includes("missing_mechanism");
|
||||
|
||||
if (okResults.length > 0 && hasPartialCoverage) {
|
||||
return {
|
||||
|
|
@ -1490,9 +1598,10 @@ export function composeAssistantAnswer(input: ComposeAnswerInput): ComposeAnswer
|
|||
}
|
||||
|
||||
return {
|
||||
assistant_reply: "По текущему запросу не удалось построить обоснованный ответ. Уточните формулировку и попробуйте снова.",
|
||||
assistant_reply: "По текущему запросу не удалось построить обоснованный ответ. Уточните формулировку и попробуйте снова.",
|
||||
fallback_type: "unknown",
|
||||
reply_type: "backend_error"
|
||||
};
|
||||
}
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -1125,9 +1125,9 @@ export class AssistantDataLayer {
|
|||
} else if (route === "store_feature_risk") {
|
||||
result = this.executeRisk(fragmentText, data);
|
||||
} else if (route === "batch_refresh_then_store") {
|
||||
result = this.executeBatch(data);
|
||||
result = this.executeBatch(fragmentText, data);
|
||||
} else if (route === "store_canonical") {
|
||||
result = this.executeCanonical(data);
|
||||
result = this.executeCanonical(fragmentText, data);
|
||||
} else if (route === "live_mcp_drilldown") {
|
||||
result = this.executeDrilldown(fragmentText, data);
|
||||
}
|
||||
|
|
@ -1437,7 +1437,9 @@ export class AssistantDataLayer {
|
|||
};
|
||||
}
|
||||
|
||||
private executeRisk(_fragmentText: string, data: DatasetBundle): RawRetrievalResult {
|
||||
private executeRisk(fragmentText: string, data: DatasetBundle): RawRetrievalResult {
|
||||
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
|
||||
const profileRiskFactors = semanticProfile.anomaly_patterns;
|
||||
const records = [...data.problemCases, ...data.ndsRegisters];
|
||||
const scored = records
|
||||
.map((record) => {
|
||||
|
|
@ -1491,12 +1493,15 @@ export class AssistantDataLayer {
|
|||
items: [],
|
||||
summary: {
|
||||
checked_records: records.length,
|
||||
risky_records: 0
|
||||
risky_records: 0,
|
||||
query_subject: semanticProfile.query_subject,
|
||||
semantic_profile: semanticProfile,
|
||||
ranking_basis: semanticProfile.ranking_basis
|
||||
},
|
||||
evidence: [],
|
||||
why_included: [],
|
||||
selection_reason: ["Риск-оценка выполнялась по техническим признакам, но записи выше порога не найдены."],
|
||||
risk_factors: [],
|
||||
risk_factors: profileRiskFactors,
|
||||
business_interpretation: ["По текущему срезу явные риск-признаки не обнаружены."],
|
||||
confidence: "medium",
|
||||
limitations: ["Оценка основана на snapshot-данных и эвристическом risk score."],
|
||||
|
|
@ -1505,6 +1510,13 @@ export class AssistantDataLayer {
|
|||
}
|
||||
|
||||
const averageScore = items.reduce((acc, item) => acc + item.risk_score, 0) / items.length;
|
||||
const normalizedRiskFactors = uniqueStrings([
|
||||
...profileRiskFactors,
|
||||
"unknown_link_count",
|
||||
"zero_guid_values",
|
||||
"navigation_links",
|
||||
"missing_counterparty_link"
|
||||
]);
|
||||
return {
|
||||
status: "ok",
|
||||
result_type: "list",
|
||||
|
|
@ -1512,7 +1524,10 @@ export class AssistantDataLayer {
|
|||
summary: {
|
||||
checked_records: records.length,
|
||||
risky_records: items.length,
|
||||
average_risk_score: Number(averageScore.toFixed(2))
|
||||
average_risk_score: Number(averageScore.toFixed(2)),
|
||||
query_subject: semanticProfile.query_subject,
|
||||
semantic_profile: semanticProfile,
|
||||
ranking_basis: semanticProfile.ranking_basis
|
||||
},
|
||||
evidence: items.slice(0, 10).map((item) => ({
|
||||
source_entity: item.source_entity,
|
||||
|
|
@ -1521,14 +1536,10 @@ export class AssistantDataLayer {
|
|||
})),
|
||||
why_included: ["В ответ включены записи с risk_score >= 2."],
|
||||
selection_reason: [
|
||||
"score растет при unknown links, zero GUID, навигационных ссылках и отсутствии явного контрагента."
|
||||
],
|
||||
risk_factors: [
|
||||
"unknown_link_count",
|
||||
"zero_guid_values",
|
||||
"navigation_links",
|
||||
"missing_counterparty_link"
|
||||
"score растет при unknown links, zero GUID, навигационных ссылках и отсутствии явного контрагента.",
|
||||
`Semantic profile subject: ${semanticProfile.query_subject}.`
|
||||
],
|
||||
risk_factors: normalizedRiskFactors,
|
||||
business_interpretation: ["Рти записи требуют первичной бухгалтерской проверки как потенциальные аномалии."],
|
||||
confidence: "high",
|
||||
limitations: ["Риск-факторы определяются эвристикой, а не полным набором бизнес-правил 1С."],
|
||||
|
|
@ -1536,7 +1547,8 @@ export class AssistantDataLayer {
|
|||
};
|
||||
}
|
||||
|
||||
private executeBatch(data: DatasetBundle): RawRetrievalResult {
|
||||
private executeBatch(fragmentText: string, data: DatasetBundle): RawRetrievalResult {
|
||||
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
|
||||
const source = [...data.problemCases, ...data.keyFields, ...data.docs];
|
||||
const byEntity = new Map<string, number>();
|
||||
for (const record of source) {
|
||||
|
|
@ -1558,7 +1570,10 @@ export class AssistantDataLayer {
|
|||
items,
|
||||
summary: {
|
||||
checked_records: source.length,
|
||||
ranked_entities: items.length
|
||||
ranked_entities: items.length,
|
||||
query_subject: semanticProfile.query_subject,
|
||||
semantic_profile: semanticProfile,
|
||||
ranking_basis: semanticProfile.ranking_basis
|
||||
},
|
||||
evidence: items.slice(0, 5).map((item) => ({
|
||||
entity: item.entity,
|
||||
|
|
@ -1566,9 +1581,9 @@ export class AssistantDataLayer {
|
|||
})),
|
||||
why_included: items.length > 0 ? ["Показаны сущности с максимальным количеством записей."] : [],
|
||||
selection_reason: ["Ранжирование выполнено по records_count по убыванию."],
|
||||
risk_factors: ["Высокий объем записей по сущности повышает приоритет проверки."],
|
||||
risk_factors: uniqueStrings(["entity_volume_spike", ...semanticProfile.anomaly_patterns]),
|
||||
business_interpretation: [
|
||||
"Сущности в топе ранга чаще дают наибольший вклад в проблемный объем и требуют приоритетного аудита."
|
||||
"Top entities by volume highlight where lifecycle-focused review should start first."
|
||||
],
|
||||
confidence: "medium",
|
||||
limitations: ["Ранжирование по объему не всегда эквивалентно бизнес-риску."],
|
||||
|
|
@ -1576,8 +1591,11 @@ export class AssistantDataLayer {
|
|||
};
|
||||
}
|
||||
|
||||
private executeCanonical(data: DatasetBundle): RawRetrievalResult {
|
||||
const items = data.docs
|
||||
private executeCanonical(fragmentText: string, data: DatasetBundle): RawRetrievalResult {
|
||||
const semanticProfile = buildSemanticRetrievalProfile(fragmentText);
|
||||
const useVatSource = semanticProfile.domain_scope.includes("vat") || semanticProfile.domain_scope.includes("taxes");
|
||||
const sourceRecords = useVatSource ? [...data.ndsRegisters, ...data.keyFields] : data.docs;
|
||||
const items = sourceRecords
|
||||
.map((record) => {
|
||||
const period = extractDate(record);
|
||||
return {
|
||||
|
|
@ -1599,8 +1617,11 @@ export class AssistantDataLayer {
|
|||
result_type: "list",
|
||||
items,
|
||||
summary: {
|
||||
checked_records: data.docs.length,
|
||||
returned_records: items.length
|
||||
checked_records: sourceRecords.length,
|
||||
returned_records: items.length,
|
||||
query_subject: semanticProfile.query_subject,
|
||||
semantic_profile: semanticProfile,
|
||||
ranking_basis: semanticProfile.ranking_basis
|
||||
},
|
||||
evidence: items.slice(0, 6).map((item) => ({
|
||||
source_entity: item.source_entity,
|
||||
|
|
@ -1608,8 +1629,11 @@ export class AssistantDataLayer {
|
|||
period: item.period
|
||||
})),
|
||||
why_included: items.length > 0 ? ["Показаны последние по дате записи канонического документного слоя."] : [],
|
||||
selection_reason: ["Отбор по максимальной дате документа в пределах snapshot."],
|
||||
risk_factors: [],
|
||||
selection_reason: [
|
||||
"Отбор по максимальной дате документа в пределах snapshot.",
|
||||
`Semantic profile subject: ${semanticProfile.query_subject}.`
|
||||
],
|
||||
risk_factors: semanticProfile.anomaly_patterns,
|
||||
business_interpretation: ["Слой отражает базовый factual-срез документов для оперативной сверки."],
|
||||
confidence: "high",
|
||||
limitations: ["Рто read-only snapshot, Р° РЅРµ онлайн-состояние 1РЎ."],
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
import type { CandidateEvidenceItem, ProblemConfidence, ProblemUnit, ProblemUnitType } from "../types/stage2ProblemUnits";
|
||||
import type { CandidateEvidenceItem, ProblemConfidence, ProblemUnit, ProblemUnitType } from "../types/stage2ProblemUnits";
|
||||
import {
|
||||
LIFECYCLE_MODEL_SCHEMA_VERSION,
|
||||
STAGE3_LIFECYCLE_DOMAINS,
|
||||
|
|
@ -47,13 +47,99 @@ function hasToken(values: string[], pattern: RegExp): boolean {
|
|||
return values.some((value) => pattern.test(value));
|
||||
}
|
||||
|
||||
function defaultExpectedState(domain: LifecycleDomain): string {
|
||||
if (domain === "bank_settlement") return "settlement_closed";
|
||||
if (domain === "customer_settlement") return "receivable_closed";
|
||||
if (domain === "deferred_expense") return "fully_written_off";
|
||||
if (domain === "fixed_asset") return "depreciation_active";
|
||||
if (domain === "vat_flow") return "vat_deducted";
|
||||
return "close_completed";
|
||||
function normalizeStateToken(value: string): string {
|
||||
return value.trim().toLowerCase();
|
||||
}
|
||||
|
||||
function resolveStateCode(model: LifecycleDomainModel, stateCode: string | null | undefined): string | null {
|
||||
if (!stateCode || typeof stateCode !== "string") {
|
||||
return null;
|
||||
}
|
||||
const normalized = normalizeStateToken(stateCode);
|
||||
const matched = model.states.find((state) => normalizeStateToken(state.state_code) === normalized);
|
||||
return matched?.state_code ?? null;
|
||||
}
|
||||
|
||||
function defaultInitialState(model: LifecycleDomainModel): string {
|
||||
const initial = model.states.find((state) => state.state_class === "initial");
|
||||
if (initial) {
|
||||
return initial.state_code;
|
||||
}
|
||||
return model.states[0]?.state_code ?? "unknown_state";
|
||||
}
|
||||
|
||||
function defaultExpectedState(model: LifecycleDomainModel): string {
|
||||
const terminal = model.states.find((state) => state.is_terminal || state.state_class === "terminal");
|
||||
if (terminal) {
|
||||
return terminal.state_code;
|
||||
}
|
||||
const active = model.states.find((state) => state.state_class === "active");
|
||||
if (active) {
|
||||
return active.state_code;
|
||||
}
|
||||
return defaultInitialState(model);
|
||||
}
|
||||
|
||||
function expectedTransitionAdjacency(model: LifecycleDomainModel): Map<string, string[]> {
|
||||
const graph = new Map<string, string[]>();
|
||||
for (const transition of model.transitions) {
|
||||
if (transition.transition_type !== "expected") {
|
||||
continue;
|
||||
}
|
||||
const from = transition.from_state;
|
||||
const to = transition.to_state;
|
||||
const current = graph.get(from) ?? [];
|
||||
if (!current.includes(to)) {
|
||||
current.push(to);
|
||||
}
|
||||
graph.set(from, current);
|
||||
}
|
||||
return graph;
|
||||
}
|
||||
|
||||
function shortestExpectedPath(model: LifecycleDomainModel, fromState: string, toState: string): string[] | null {
|
||||
if (fromState === toState) {
|
||||
return [fromState];
|
||||
}
|
||||
const graph = expectedTransitionAdjacency(model);
|
||||
const queue: string[][] = [[fromState]];
|
||||
const visited = new Set<string>([fromState]);
|
||||
while (queue.length > 0) {
|
||||
const path = queue.shift();
|
||||
if (!path) {
|
||||
continue;
|
||||
}
|
||||
const tail = path[path.length - 1];
|
||||
const nextStates = graph.get(tail) ?? [];
|
||||
for (const nextState of nextStates) {
|
||||
if (visited.has(nextState)) {
|
||||
continue;
|
||||
}
|
||||
const nextPath = [...path, nextState];
|
||||
if (nextState === toState) {
|
||||
return nextPath;
|
||||
}
|
||||
visited.add(nextState);
|
||||
queue.push(nextPath);
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function transitionEdgeLabel(fromState: string, toState: string): string {
|
||||
return `${fromState}->${toState}`;
|
||||
}
|
||||
|
||||
function resolvePreviousStates(model: LifecycleDomainModel, currentState: string): string[] {
|
||||
const initialState = defaultInitialState(model);
|
||||
if (initialState === currentState) {
|
||||
return [];
|
||||
}
|
||||
const path = shortestExpectedPath(model, initialState, currentState);
|
||||
if (!path || path.length <= 1) {
|
||||
return [];
|
||||
}
|
||||
return path.slice(0, -1);
|
||||
}
|
||||
|
||||
const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
||||
|
|
@ -64,53 +150,53 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
states: [
|
||||
{
|
||||
state_code: "initiated_payment",
|
||||
state_label: "Платеж инициирован",
|
||||
state_label: "Платеж инициирован",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["payment_order_created"],
|
||||
exit_conditions: ["bank_recorded"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Есть инициирование платежа."
|
||||
business_meaning: "Есть инициирование платежа."
|
||||
},
|
||||
{
|
||||
state_code: "bank_recorded",
|
||||
state_label: "Платеж отражен банком",
|
||||
state_label: "Платеж отражен банком",
|
||||
state_class: "active",
|
||||
entry_conditions: ["bank_statement_recorded"],
|
||||
exit_conditions: ["settlement_linked"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Движение денег зафиксировано, ожидается расчетное закрытие."
|
||||
business_meaning: "Движение денег зафиксировано, ожидается расчетное закрытие."
|
||||
},
|
||||
{
|
||||
state_code: "settlement_closed",
|
||||
state_label: "Расчет закрыт",
|
||||
state_label: "Расчет закрыт",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["payment_to_settlement_linked"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "Платеж доведен до расчетного результата."
|
||||
business_meaning: "Платеж доведен до расчетного результата."
|
||||
},
|
||||
{
|
||||
state_code: "stale_unlinked_payment",
|
||||
state_label: "Платеж завис без закрытия",
|
||||
state_label: "Платеж завис без закрытия",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["bank_recorded", "missing_link"],
|
||||
exit_conditions: ["settlement_closed"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Платеж отражен, но ожидаемая связь по расчету не завершена."
|
||||
business_meaning: "Платеж отражен, но ожидаемая связь по расчету не завершена."
|
||||
},
|
||||
{
|
||||
state_code: "misclosed_payment",
|
||||
state_label: "Платеж закрыт некорректно",
|
||||
state_label: "Платеж закрыт некорректно",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["wrong_document_type_or_posting_mismatch"],
|
||||
exit_conditions: ["settlement_closed"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Формальное закрытие есть, но путь закрытия неверный."
|
||||
business_meaning: "Формальное закрытие есть, но путь закрытия неверный."
|
||||
}
|
||||
],
|
||||
transitions: [
|
||||
|
|
@ -121,7 +207,7 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
required_evidence: ["bank_statement_recorded"],
|
||||
optional_evidence: ["payment_order"],
|
||||
forbidden_conditions: [],
|
||||
business_meaning: "Платеж должен появиться во выписке."
|
||||
business_meaning: "Платеж должен появиться во выписке."
|
||||
},
|
||||
{
|
||||
from_state: "bank_recorded",
|
||||
|
|
@ -130,7 +216,7 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
required_evidence: ["payment_to_settlement_link"],
|
||||
optional_evidence: ["document_to_posting"],
|
||||
forbidden_conditions: ["wrong_document_type"],
|
||||
business_meaning: "После выписки должен закрываться расчет."
|
||||
business_meaning: "После выписки должен закрываться расчет."
|
||||
}
|
||||
],
|
||||
defects: []
|
||||
|
|
@ -142,43 +228,43 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
states: [
|
||||
{
|
||||
state_code: "invoice_issued",
|
||||
state_label: "Реализация отражена",
|
||||
state_label: "Реализация отражена",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["realization_document_exists"],
|
||||
exit_conditions: ["payment_recorded"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Возникла дебиторская позиция."
|
||||
business_meaning: "Возникла дебиторская позиция."
|
||||
},
|
||||
{
|
||||
state_code: "payment_recorded",
|
||||
state_label: "Оплата отражена",
|
||||
state_label: "Оплата отражена",
|
||||
state_class: "active",
|
||||
entry_conditions: ["payment_document_exists"],
|
||||
exit_conditions: ["receivable_closed"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Оплата есть, ожидается корректное закрытие."
|
||||
business_meaning: "Оплата есть, ожидается корректное закрытие."
|
||||
},
|
||||
{
|
||||
state_code: "receivable_closed",
|
||||
state_label: "Дебиторка закрыта",
|
||||
state_label: "Дебиторка закрыта",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["closing_document_linked"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "Дебиторская позиция закрыта корректно."
|
||||
business_meaning: "Дебиторская позиция закрыта корректно."
|
||||
},
|
||||
{
|
||||
state_code: "stale_receivable",
|
||||
state_label: "Дебиторка зависла",
|
||||
state_label: "Дебиторка зависла",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["unresolved_settlement"],
|
||||
exit_conditions: ["receivable_closed"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Позиция остается незавершенной дольше ожидаемого."
|
||||
business_meaning: "Позиция остается незавершенной дольше ожидаемого."
|
||||
}
|
||||
],
|
||||
transitions: [
|
||||
|
|
@ -189,7 +275,7 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
required_evidence: ["payment_document_exists"],
|
||||
optional_evidence: [],
|
||||
forbidden_conditions: [],
|
||||
business_meaning: "После реализации ожидается оплата/зачет."
|
||||
business_meaning: "После реализации ожидается оплата/зачет."
|
||||
},
|
||||
{
|
||||
from_state: "payment_recorded",
|
||||
|
|
@ -198,7 +284,7 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
required_evidence: ["closing_document_linked"],
|
||||
optional_evidence: ["register_movement_exists"],
|
||||
forbidden_conditions: ["cross_branch_inconsistency"],
|
||||
business_meaning: "Оплата должна завершаться корректным закрытием расчета."
|
||||
business_meaning: "Оплата должна завершаться корректным закрытием расчета."
|
||||
}
|
||||
],
|
||||
defects: []
|
||||
|
|
@ -210,43 +296,43 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
states: [
|
||||
{
|
||||
state_code: "recognized",
|
||||
state_label: "РБП признан",
|
||||
state_label: "РБП признан",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["deferred_expense_created"],
|
||||
exit_conditions: ["writeoff_started"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "РБП поставлен на учет."
|
||||
business_meaning: "РБП поставлен на учет."
|
||||
},
|
||||
{
|
||||
state_code: "partially_written_off",
|
||||
state_label: "Частичное списание",
|
||||
state_label: "Частичное списание",
|
||||
state_class: "active",
|
||||
entry_conditions: ["partial_writeoff_exists"],
|
||||
exit_conditions: ["fully_written_off"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Списание идет по графику."
|
||||
business_meaning: "Списание идет по графику."
|
||||
},
|
||||
{
|
||||
state_code: "fully_written_off",
|
||||
state_label: "РБП полностью списан",
|
||||
state_label: "РБП полностью списан",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["full_writeoff_exists"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "РБП завершил lifecycle."
|
||||
business_meaning: "РБП завершил lifecycle."
|
||||
},
|
||||
{
|
||||
state_code: "overdue_writeoff",
|
||||
state_label: "Просроченное списание",
|
||||
state_label: "Просроченное списание",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["period_boundary", "missing_link"],
|
||||
exit_conditions: ["fully_written_off"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "РБП живет дольше допустимого окна."
|
||||
business_meaning: "РБП живет дольше допустимого окна."
|
||||
}
|
||||
],
|
||||
transitions: [],
|
||||
|
|
@ -259,53 +345,53 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
states: [
|
||||
{
|
||||
state_code: "capitalized",
|
||||
state_label: "Капвложения отражены",
|
||||
state_label: "Капвложения отражены",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["capitalization_document_exists"],
|
||||
exit_conditions: ["accepted_for_accounting"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Объект зафиксирован как вложение."
|
||||
business_meaning: "Объект зафиксирован как вложение."
|
||||
},
|
||||
{
|
||||
state_code: "accepted_for_accounting",
|
||||
state_label: "Принят к учету",
|
||||
state_label: "Принят к учету",
|
||||
state_class: "active",
|
||||
entry_conditions: ["acceptance_document_exists"],
|
||||
exit_conditions: ["depreciation_active"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Объект переведен в основной контур учета."
|
||||
business_meaning: "Объект переведен в основной контур учета."
|
||||
},
|
||||
{
|
||||
state_code: "depreciation_active",
|
||||
state_label: "Амортизация активна",
|
||||
state_label: "Амортизация активна",
|
||||
state_class: "active",
|
||||
entry_conditions: ["depreciation_register_movement"],
|
||||
exit_conditions: ["disposed"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Жизненный цикл ОС идет штатно."
|
||||
business_meaning: "Жизненный цикл ОС идет штатно."
|
||||
},
|
||||
{
|
||||
state_code: "contradictory_asset_state",
|
||||
state_label: "Противоречивый статус ОС",
|
||||
state_label: "Противоречивый статус ОС",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["posting_mismatch_or_wrong_path"],
|
||||
exit_conditions: ["depreciation_active"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Статус ОС формально есть, но смыслово противоречив."
|
||||
business_meaning: "Статус ОС формально есть, но смыслово противоречив."
|
||||
},
|
||||
{
|
||||
state_code: "disposed",
|
||||
state_label: "Выбыл",
|
||||
state_label: "Выбыл",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["disposal_document_exists"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "Жизненный цикл ОС завершен."
|
||||
business_meaning: "Жизненный цикл ОС завершен."
|
||||
}
|
||||
],
|
||||
transitions: [],
|
||||
|
|
@ -318,43 +404,43 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
states: [
|
||||
{
|
||||
state_code: "vat_registered",
|
||||
state_label: "НДС отражен документно",
|
||||
state_label: "НДС отражен документно",
|
||||
state_class: "initial",
|
||||
entry_conditions: ["invoice_registered"],
|
||||
exit_conditions: ["vat_reflected"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Сформирован первичный документный слой НДС."
|
||||
business_meaning: "Сформирован первичный документный слой НДС."
|
||||
},
|
||||
{
|
||||
state_code: "vat_reflected",
|
||||
state_label: "НДС отражен в учете",
|
||||
state_label: "НДС отражен в учете",
|
||||
state_class: "active",
|
||||
entry_conditions: ["vat_register_movement"],
|
||||
exit_conditions: ["vat_deducted"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "НДС проходит штатную стадию отражения."
|
||||
business_meaning: "НДС проходит штатную стадию отражения."
|
||||
},
|
||||
{
|
||||
state_code: "vat_deducted",
|
||||
state_label: "НДС принят к вычету",
|
||||
state_label: "НДС принят к вычету",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["deduction_confirmed"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "НДС-цепочка завершена корректно."
|
||||
business_meaning: "НДС-цепочка завершена корректно."
|
||||
},
|
||||
{
|
||||
state_code: "vat_conflict",
|
||||
state_label: "Конфликт НДС-цепочки",
|
||||
state_label: "Конфликт НДС-цепочки",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["cross_branch_inconsistency"],
|
||||
exit_conditions: ["vat_reflected"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Бухгалтерская и налоговая ветки расходятся."
|
||||
business_meaning: "Бухгалтерская и налоговая ветки расходятся."
|
||||
}
|
||||
],
|
||||
transitions: [],
|
||||
|
|
@ -367,53 +453,53 @@ const LIFECYCLE_DOMAIN_MODELS: Record<LifecycleDomain, LifecycleDomainModel> = {
|
|||
states: [
|
||||
{
|
||||
state_code: "preclose_checks",
|
||||
state_label: "Предзакрытие",
|
||||
state_label: "Предзакрытие",
|
||||
state_class: "active",
|
||||
entry_conditions: ["period_scope_detected"],
|
||||
exit_conditions: ["close_ready"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Идет проверка готовности периода."
|
||||
business_meaning: "Рдет проверка готовности периода."
|
||||
},
|
||||
{
|
||||
state_code: "close_ready",
|
||||
state_label: "Готов к закрытию",
|
||||
state_label: "Готов к закрытию",
|
||||
state_class: "active",
|
||||
entry_conditions: ["no_blockers_detected"],
|
||||
exit_conditions: ["close_completed"],
|
||||
is_terminal: false,
|
||||
is_problematic: false,
|
||||
business_meaning: "Период может быть закрыт."
|
||||
business_meaning: "Период может быть закрыт."
|
||||
},
|
||||
{
|
||||
state_code: "close_completed",
|
||||
state_label: "Закрытие завершено",
|
||||
state_label: "Закрытие завершено",
|
||||
state_class: "terminal",
|
||||
entry_conditions: ["close_operation_done"],
|
||||
exit_conditions: [],
|
||||
is_terminal: true,
|
||||
is_problematic: false,
|
||||
business_meaning: "Период закрыт."
|
||||
business_meaning: "Период закрыт."
|
||||
},
|
||||
{
|
||||
state_code: "close_blocked",
|
||||
state_label: "Закрытие заблокировано",
|
||||
state_label: "Закрытие заблокировано",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["period_close_risk_or_stale_state"],
|
||||
exit_conditions: ["close_ready"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Есть lifecycle-дефекты, влияющие на закрытие."
|
||||
business_meaning: "Есть lifecycle-дефекты, влияющие на закрытие."
|
||||
},
|
||||
{
|
||||
state_code: "close_contradicted",
|
||||
state_label: "Закрыт формально, но с противоречием",
|
||||
state_label: "Закрыт формально, но с противоречием",
|
||||
state_class: "problematic",
|
||||
entry_conditions: ["misclosed_or_cross_branch_conflict"],
|
||||
exit_conditions: ["close_completed"],
|
||||
is_terminal: false,
|
||||
is_problematic: true,
|
||||
business_meaning: "Формальное закрытие не согласовано с фактическими ветками."
|
||||
business_meaning: "Формальное закрытие не согласовано с фактическими ветками."
|
||||
}
|
||||
],
|
||||
transitions: [],
|
||||
|
|
@ -426,7 +512,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
|
|||
defect_code: "missing_expected_transition",
|
||||
defect_class: "path",
|
||||
severity_hint: "medium",
|
||||
business_meaning: "Ожидаемый переход не произошел.",
|
||||
business_meaning: "Ожидаемый переход не произошел.",
|
||||
evidence_requirements: ["expected_state", "missing_transition_signal"],
|
||||
period_impact_potential: "indirect"
|
||||
},
|
||||
|
|
@ -434,7 +520,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
|
|||
defect_code: "invalid_transition",
|
||||
defect_class: "path",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Переход произошел по некорректному пути.",
|
||||
business_meaning: "Переход произошел по некорректному пути.",
|
||||
evidence_requirements: ["invalid_transition_signal"],
|
||||
period_impact_potential: "indirect"
|
||||
},
|
||||
|
|
@ -442,7 +528,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
|
|||
defect_code: "stale_active_state",
|
||||
defect_class: "timing",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Объект завис в активном состоянии.",
|
||||
business_meaning: "Объект завис в активном состоянии.",
|
||||
evidence_requirements: ["stale_marker", "missing_transition_signal"],
|
||||
period_impact_potential: "direct"
|
||||
},
|
||||
|
|
@ -450,7 +536,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
|
|||
defect_code: "contradictory_state",
|
||||
defect_class: "consistency",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Статусы объекта противоречат друг другу.",
|
||||
business_meaning: "Статусы объекта противоречат друг другу.",
|
||||
evidence_requirements: ["contradiction_signal"],
|
||||
period_impact_potential: "direct"
|
||||
},
|
||||
|
|
@ -458,7 +544,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
|
|||
defect_code: "premature_terminal_state",
|
||||
defect_class: "closure",
|
||||
severity_hint: "medium",
|
||||
business_meaning: "Терминальное состояние наступило преждевременно.",
|
||||
business_meaning: "Терминальное состояние наступило преждевременно.",
|
||||
evidence_requirements: ["terminal_state", "missing_required_previous_state"],
|
||||
period_impact_potential: "indirect"
|
||||
},
|
||||
|
|
@ -466,7 +552,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
|
|||
defect_code: "misclosed_state",
|
||||
defect_class: "closure",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Контур формально закрыт, но закрыт неверно.",
|
||||
business_meaning: "Контур формально закрыт, но закрыт неверно.",
|
||||
evidence_requirements: ["wrong_closure_path"],
|
||||
period_impact_potential: "direct"
|
||||
},
|
||||
|
|
@ -474,7 +560,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
|
|||
defect_code: "orphan_intermediate_state",
|
||||
defect_class: "path",
|
||||
severity_hint: "medium",
|
||||
business_meaning: "Промежуточная стадия осталась без корректного продолжения.",
|
||||
business_meaning: "Промежуточная стадия осталась без корректного продолжения.",
|
||||
evidence_requirements: ["intermediate_state_without_next"],
|
||||
period_impact_potential: "indirect"
|
||||
},
|
||||
|
|
@ -482,7 +568,7 @@ const SHARED_DEFECTS: LifecycleDefectDefinition[] = [
|
|||
defect_code: "cross_branch_state_conflict",
|
||||
defect_class: "consistency",
|
||||
severity_hint: "high",
|
||||
business_meaning: "Состояния соседних веток учета противоречат друг другу.",
|
||||
business_meaning: "Состояния соседних веток учета противоречат друг другу.",
|
||||
evidence_requirements: ["cross_branch_conflict_signal"],
|
||||
period_impact_potential: "direct"
|
||||
}
|
||||
|
|
@ -502,6 +588,23 @@ class LifecycleRegistryImpl {
|
|||
public getDomain(domain: LifecycleDomain): LifecycleDomainModel {
|
||||
return this.models[domain];
|
||||
}
|
||||
|
||||
public hasState(domain: LifecycleDomain, stateCode: string | null | undefined): boolean {
|
||||
const model = this.getDomain(domain);
|
||||
return Boolean(resolveStateCode(model, stateCode));
|
||||
}
|
||||
|
||||
public resolveDefaultExpectedState(domain: LifecycleDomain): string {
|
||||
return defaultExpectedState(this.getDomain(domain));
|
||||
}
|
||||
|
||||
public resolveInitialState(domain: LifecycleDomain): string {
|
||||
return defaultInitialState(this.getDomain(domain));
|
||||
}
|
||||
|
||||
public findExpectedPath(domain: LifecycleDomain, fromState: string, toState: string): string[] | null {
|
||||
return shortestExpectedPath(this.getDomain(domain), fromState, toState);
|
||||
}
|
||||
}
|
||||
|
||||
export const LifecycleRegistry = new LifecycleRegistryImpl(LIFECYCLE_DOMAIN_MODELS);
|
||||
|
|
@ -524,30 +627,88 @@ function inferLifecycleDomain(input: LifecycleResolverInput): LifecycleDomain {
|
|||
.join(" ")
|
||||
.toLowerCase();
|
||||
|
||||
if (includesAny(unitTokens, [/\bnds\b/, /\bvat\b/, /\btax\b/, /cross[_\s-]?branch/, /\b19\b/, /\b68\b/])) {
|
||||
return "vat_flow";
|
||||
}
|
||||
if (includesAny(unitTokens, [/\bperiod\b/, /\bclose\b/, /закрыт/, /reporting/]) || input.unit.problem_unit_type === "period_risk_cluster") {
|
||||
return "period_close";
|
||||
}
|
||||
if (includesAny(unitTokens, [/deferred/, /writeoff/, /рбп/, /\b97\b/])) {
|
||||
const hasVatMarkers = includesAny(unitTokens, [
|
||||
/domain_hint:vat_flow/,
|
||||
/\binvoice_to_vat\b/,
|
||||
/\bvat_chain_conflict\b/,
|
||||
/(^|[^a-z0-9])nds([^a-z0-9]|$)/,
|
||||
/(^|[^a-z0-9])vat([^a-z0-9]|$)/,
|
||||
/(^|[^a-z0-9])tax(?:es)?([^a-z0-9]|$)/,
|
||||
/\baccount[_:\s-]?(19|68)\b/
|
||||
]);
|
||||
const hasDeferredMarkers = includesAny(unitTokens, [
|
||||
/domain_hint:deferred_expense/,
|
||||
/\bdeferred(?:_expense)?\b/,
|
||||
/\bdeferred_expense_to_writeoff\b/,
|
||||
/\bwriteoff\b/,
|
||||
/\bpartially_written_off\b/,
|
||||
/\bfully_written_off\b/,
|
||||
/\baccount[_:\s-]?97\b/
|
||||
]);
|
||||
const hasFixedAssetMarkers = includesAny(unitTokens, [
|
||||
/domain_hint:fixed_asset/,
|
||||
/\bfixed[_\s-]?asset(?:s)?\b/,
|
||||
/\basset_card_to_depreciation\b/,
|
||||
/\bdepreciation(?:_active)?\b/,
|
||||
/\baccepted_for_accounting\b/,
|
||||
/\bcapitalized\b/,
|
||||
/\baccount[_:\s-]?(01|02|08)\b/
|
||||
]);
|
||||
const hasPeriodCloseMarkers = includesAny(unitTokens, [
|
||||
/domain_hint:period_close/,
|
||||
/\bperiod[_\s-]?close\b/,
|
||||
/\bperiod_close_risk\b/,
|
||||
/\bclose[_\s-]?risk\b/,
|
||||
/\bclosure[_\s-]?risk\b/,
|
||||
/\bpreclose\b/,
|
||||
/\bmonth[_\s-]?close\b/,
|
||||
/\bperiod_risk\b/
|
||||
]);
|
||||
|
||||
if (hasDeferredMarkers) {
|
||||
return "deferred_expense";
|
||||
}
|
||||
if (includesAny(unitTokens, [/fixed[_\s-]?asset/, /амортиз/, /ос\b/, /\b01\b/, /\b02\b/, /\b08\b/])) {
|
||||
if (hasFixedAssetMarkers) {
|
||||
return "fixed_asset";
|
||||
}
|
||||
if (includesAny(unitTokens, [/buyer/, /customer/, /дебитор/, /\b62\b/])) {
|
||||
if (hasVatMarkers) {
|
||||
return "vat_flow";
|
||||
}
|
||||
|
||||
if (
|
||||
hasPeriodCloseMarkers ||
|
||||
input.unit.problem_unit_type === "period_risk_cluster" ||
|
||||
input.unit.period_impact?.impact_class === "close_risk"
|
||||
) {
|
||||
return "period_close";
|
||||
}
|
||||
if (includesAny(unitTokens, [/buyer/, /customer/, /\b62\b/])) {
|
||||
return "customer_settlement";
|
||||
}
|
||||
if (
|
||||
includesAny(unitTokens, [
|
||||
/domain_hint:bank_settlement/,
|
||||
/\bpayment_to_settlement\b/,
|
||||
/\bstatement_to_document\b/,
|
||||
/\bbank_recorded\b/,
|
||||
/\binitiated_payment\b/,
|
||||
/\bsettlement(?:_closed)?\b/
|
||||
]) ||
|
||||
input.unit.problem_unit_type === "unresolved_settlement_cluster" ||
|
||||
input.unit.problem_unit_type === "broken_chain_segment"
|
||||
) {
|
||||
return "bank_settlement";
|
||||
}
|
||||
if (input.unit.problem_unit_type === "cross_branch_inconsistency_cluster") {
|
||||
return "vat_flow";
|
||||
}
|
||||
if (input.unit.problem_unit_type === "lifecycle_anomaly_node") {
|
||||
return "deferred_expense";
|
||||
}
|
||||
return "bank_settlement";
|
||||
}
|
||||
|
||||
function inferCurrentState(domain: LifecycleDomain, input: LifecycleResolverInput): string {
|
||||
const explicitActual = input.unit.actual_state?.trim();
|
||||
if (explicitActual) {
|
||||
return explicitActual;
|
||||
}
|
||||
|
||||
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).map((item) => item.toLowerCase());
|
||||
const relations = input.candidates.flatMap((item) => item.relation_pattern_hits).map((item) => item.toLowerCase());
|
||||
|
||||
|
|
@ -573,7 +734,7 @@ function inferCurrentState(domain: LifecycleDomain, input: LifecycleResolverInpu
|
|||
if (domain === "fixed_asset") {
|
||||
if (hasInvalid) return "contradictory_asset_state";
|
||||
if (hasToken(relations, /depreciation|amort/)) return "depreciation_active";
|
||||
if (hasToken(relations, /accept|учет/)) return "accepted_for_accounting";
|
||||
if (hasToken(relations, /accept|account/)) return "accepted_for_accounting";
|
||||
return "capitalized";
|
||||
}
|
||||
if (domain === "vat_flow") {
|
||||
|
|
@ -587,27 +748,51 @@ function inferCurrentState(domain: LifecycleDomain, input: LifecycleResolverInpu
|
|||
return "preclose_checks";
|
||||
}
|
||||
|
||||
function inferExpectedState(domain: LifecycleDomain, input: LifecycleResolverInput): string {
|
||||
function inferExpectedState(domain: LifecycleDomain, input: LifecycleResolverInput, model: LifecycleDomainModel): string {
|
||||
const explicitExpected = input.unit.expected_state?.trim();
|
||||
if (explicitExpected) {
|
||||
return explicitExpected;
|
||||
}
|
||||
return defaultExpectedState(domain);
|
||||
return defaultExpectedState(model);
|
||||
}
|
||||
|
||||
function inferMissingTransition(input: LifecycleResolverInput): string | null {
|
||||
function inferMissingTransition(
|
||||
input: LifecycleResolverInput,
|
||||
model: LifecycleDomainModel,
|
||||
currentState: string,
|
||||
expectedState: string
|
||||
): string | null {
|
||||
if (typeof input.unit.failed_expected_edge === "string" && input.unit.failed_expected_edge.trim().length > 0) {
|
||||
return input.unit.failed_expected_edge.trim();
|
||||
}
|
||||
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).join(" ").toLowerCase();
|
||||
if (/(missing_link|no_continuation|broken_lifecycle|tail|unresolved)/.test(anomalies)) {
|
||||
return "expected_transition_not_observed";
|
||||
if (!/(missing_link|no_continuation|broken_lifecycle|tail|unresolved)/.test(anomalies)) {
|
||||
return null;
|
||||
}
|
||||
return null;
|
||||
if (currentState !== expectedState) {
|
||||
const path = shortestExpectedPath(model, currentState, expectedState);
|
||||
if (path && path.length >= 2) {
|
||||
return transitionEdgeLabel(path[0], path[1]);
|
||||
}
|
||||
}
|
||||
const directExpected = model.transitions.find(
|
||||
(transition) => transition.transition_type === "expected" && transition.from_state === currentState
|
||||
);
|
||||
if (directExpected) {
|
||||
return transitionEdgeLabel(directExpected.from_state, directExpected.to_state);
|
||||
}
|
||||
return "expected_transition_not_observed";
|
||||
}
|
||||
|
||||
function inferInvalidTransition(input: LifecycleResolverInput): string | null {
|
||||
function inferInvalidTransition(input: LifecycleResolverInput, model: LifecycleDomainModel): string | null {
|
||||
const anomalies = input.candidates.flatMap((item) => item.anomaly_patterns).join(" ").toLowerCase();
|
||||
for (const transition of model.transitions) {
|
||||
for (const forbiddenCondition of transition.forbidden_conditions) {
|
||||
if (anomalies.includes(forbiddenCondition.toLowerCase())) {
|
||||
return `${transitionEdgeLabel(transition.from_state, transition.to_state)}:forbidden:${forbiddenCondition}`;
|
||||
}
|
||||
}
|
||||
}
|
||||
if (/(cross_branch|cross_domain_inconsistency)/.test(anomalies)) {
|
||||
return "cross_branch_conflict_transition";
|
||||
}
|
||||
|
|
@ -653,6 +838,14 @@ export function classifyLifecycleDefect(input: {
|
|||
return null;
|
||||
}
|
||||
|
||||
function registryBackedDefect(domain: LifecycleDomain, defect: LifecycleDefectType | null): LifecycleDefectType | null {
|
||||
if (!defect) {
|
||||
return null;
|
||||
}
|
||||
const model = LifecycleRegistry.getDomain(domain);
|
||||
return model.defects.some((definition) => definition.defect_code === defect) ? defect : null;
|
||||
}
|
||||
|
||||
function resolutionConfidence(unitConfidence: ProblemConfidence, input: {
|
||||
hasExplicitStates: boolean;
|
||||
hasDefectSignal: boolean;
|
||||
|
|
@ -690,32 +883,47 @@ function lifecycleInterpretation(input: {
|
|||
missingTransition: string | null;
|
||||
invalidTransition: string | null;
|
||||
}): string {
|
||||
const base = `Текущая стадия: ${input.currentState}; ожидаемая стадия: ${input.expectedState}.`;
|
||||
const base = `Текущая стадия: ${input.currentState}; ожидаемая стадия: ${input.expectedState}.`;
|
||||
if (input.defect === "stale_active_state") {
|
||||
return `${base} Объект завис во времени и не дошел до ожидаемого перехода.`;
|
||||
return `${base} Объект завис во времени и не дошел до ожидаемого перехода.`;
|
||||
}
|
||||
if (input.defect === "misclosed_state") {
|
||||
return `${base} Контур закрыт формально, но путь закрытия противоречит бухгалтерской логике.`;
|
||||
return `${base} Контур закрыт формально, но путь закрытия противоречит бухгалтерской логике.`;
|
||||
}
|
||||
if (input.defect === "cross_branch_state_conflict") {
|
||||
return `${base} Между ветками домена ${input.domain} обнаружено противоречие состояний.`;
|
||||
return `${base} Между ветками домена ${input.domain} обнаружено противоречие состояний.`;
|
||||
}
|
||||
if (input.defect === "missing_expected_transition") {
|
||||
return `${base} Не зафиксирован ожидаемый переход (${input.missingTransition ?? "unknown_transition"}).`;
|
||||
return `${base} Не зафиксирован ожидаемый переход (${input.missingTransition ?? "unknown_transition"}).`;
|
||||
}
|
||||
if (input.defect === "invalid_transition") {
|
||||
return `${base} Зафиксирован некорректный переход (${input.invalidTransition ?? "invalid_transition"}).`;
|
||||
return `${base} Зафиксирован некорректный переход (${input.invalidTransition ?? "invalid_transition"}).`;
|
||||
}
|
||||
return `${base} Lifecycle-разрешение не выявило критичный дефект, но состояние требует наблюдения.`;
|
||||
return `${base} Lifecycle-разрешение не выявило критичный дефект, но состояние требует наблюдения.`;
|
||||
}
|
||||
|
||||
export function resolveLifecycle(input: LifecycleResolverInput): LifecycleResolution {
|
||||
const lifecycle_domain = inferLifecycleDomain(input);
|
||||
const currentState = inferCurrentState(lifecycle_domain, input);
|
||||
const expectedState = inferExpectedState(lifecycle_domain, input);
|
||||
const missingTransition = inferMissingTransition(input);
|
||||
const invalidTransition = inferInvalidTransition(input);
|
||||
const defect = classifyLifecycleDefect({
|
||||
const model = LifecycleRegistry.getDomain(lifecycle_domain);
|
||||
|
||||
const inferredCurrentState = inferCurrentState(lifecycle_domain, input);
|
||||
const inferredExpectedState = inferExpectedState(lifecycle_domain, input, model);
|
||||
|
||||
const explicitActualState = input.unit.actual_state?.trim() ?? null;
|
||||
const explicitExpectedState = input.unit.expected_state?.trim() ?? null;
|
||||
|
||||
const explicitCurrentState = resolveStateCode(model, explicitActualState);
|
||||
const explicitExpectedResolved = resolveStateCode(model, explicitExpectedState);
|
||||
|
||||
const inferredCurrentResolved = resolveStateCode(model, inferredCurrentState);
|
||||
const inferredExpectedResolved = resolveStateCode(model, inferredExpectedState);
|
||||
|
||||
const currentState = explicitCurrentState ?? inferredCurrentResolved ?? defaultInitialState(model);
|
||||
const expectedState = explicitExpectedResolved ?? inferredExpectedResolved ?? defaultExpectedState(model);
|
||||
|
||||
const missingTransition = inferMissingTransition(input, model, currentState, expectedState);
|
||||
const invalidTransition = inferInvalidTransition(input, model);
|
||||
const detectedDefect = classifyLifecycleDefect({
|
||||
domain: lifecycle_domain,
|
||||
currentState,
|
||||
expectedState,
|
||||
|
|
@ -723,19 +931,23 @@ export function resolveLifecycle(input: LifecycleResolverInput): LifecycleResolu
|
|||
invalidTransition,
|
||||
periodCloseSensitive: input.unit.period_impact?.impact_class === "close_risk"
|
||||
});
|
||||
const defect = registryBackedDefect(lifecycle_domain, detectedDefect);
|
||||
const evidenceIds = uniqueStrings(input.unit.evidence_pack, 8);
|
||||
const previousStates = resolvePreviousStates(model, currentState);
|
||||
const limitations = uniqueStrings(
|
||||
[
|
||||
...input.unit.snapshot_limitations,
|
||||
...(input.candidates.some((item) => item.confidence_hint === "low") ? ["low_confidence_candidates_present"] : []),
|
||||
...(input.unit.actual_state ? [] : ["actual_state_inferred"]),
|
||||
...(input.unit.expected_state ? [] : ["expected_state_inferred"])
|
||||
...(explicitActualState && !explicitCurrentState ? ["actual_state_not_in_registry_normalized"] : []),
|
||||
...(explicitExpectedState && !explicitExpectedResolved ? ["expected_state_not_in_registry_normalized"] : []),
|
||||
...(explicitCurrentState ? [] : ["actual_state_inferred"]),
|
||||
...(explicitExpectedResolved ? [] : ["expected_state_inferred"])
|
||||
],
|
||||
8
|
||||
);
|
||||
|
||||
const confidence = resolutionConfidence(input.unit.confidence, {
|
||||
hasExplicitStates: Boolean(input.unit.actual_state || input.unit.expected_state),
|
||||
hasExplicitStates: Boolean(explicitCurrentState || explicitExpectedResolved),
|
||||
hasDefectSignal: Boolean(defect || missingTransition || invalidTransition),
|
||||
candidateCount: input.candidates.length,
|
||||
hasSnapshotLimitations: limitations.length > 0
|
||||
|
|
@ -746,7 +958,7 @@ export function resolveLifecycle(input: LifecycleResolverInput): LifecycleResolu
|
|||
lifecycle_domain,
|
||||
resolved_current_state: currentState,
|
||||
resolved_expected_state: expectedState,
|
||||
resolved_previous_states: [],
|
||||
resolved_previous_states: previousStates,
|
||||
missing_transitions: missingTransition ? [missingTransition] : [],
|
||||
invalid_transitions: invalidTransition ? [invalidTransition] : [],
|
||||
detected_defects: defect ? [defect] : [],
|
||||
|
|
|
|||
|
|
@ -100,7 +100,7 @@ function extractAccounts(text: string): string[] {
|
|||
const lower = String(text ?? "").toLowerCase();
|
||||
const explicitAccounts = new Set<string>();
|
||||
const contextualPattern =
|
||||
/(?:\bсчет(?:а|у|ом|ов)?\b|\bсч\.?\b|\baccount(?:s)?\b|\bschet(?:a|u|om|ov)?\b)\s*(?:№|#|:)?\s*(\d{2}(?:\.\d{2})?)/giu;
|
||||
/(?:\bсч(?:е|ё)т(?:а|у|ом|ов)?\b|\bсч\.?\b|\baccount(?:s)?\b|\bschet(?:a|u|om|ov)?\b)\s*(?:№|#|:)?\s*(\d{2}(?:\.\d{2})?)/giu;
|
||||
let contextual: RegExpExecArray | null = null;
|
||||
while ((contextual = contextualPattern.exec(lower)) !== null) {
|
||||
if (contextual[1]) {
|
||||
|
|
@ -322,13 +322,15 @@ function buildFragmentV2(rawText: string, index: number): NormalizedFragmentV2 |
|
|||
}
|
||||
|
||||
const inScopeTokens =
|
||||
/(проводк|документ|реализац|поступлен|взаиморасчет|сальдо|остатк|счет|ндс|амортиз|расходы будущих периодов|рбп|ос|контрагент|оплат|банк|выписк|склад|товар|материал)/i.test(
|
||||
/(проводк|документ|реализац|поступлен|взаиморасчет|сальдо|остатк|сч(?:е|ё)т|ндс|амортиз|расходы будущих периодов|рбп|ос|контрагент|оплат|банк|выписк|склад|товар|материал|списани|жизненн|цикл|переход|lifecycle|writeoff|deferred)/i.test(
|
||||
lower
|
||||
);
|
||||
const translitInScopeTokens =
|
||||
/\b(?:schet|scheta|schetu|schetom|postavsh|kontragent|dokument|doc|oplata|oplati|platezh|vypisk|provodk|realiz|postuplen|nds|os|saldo|hvost|tail|anomali|risk|zakryt)\b/i.test(
|
||||
/\b(?:schet|scheta|schetu|schetom|postavsh|kontragent|dokument|doc|oplata|oplati|platezh|vypisk|provodk|realiz|postuplen|nds|os|saldo|hvost|tail|anomali|risk|zakryt|lifecycle|state|transition|writeoff|deferred|periodclose)\b/i.test(
|
||||
lower
|
||||
);
|
||||
const lifecycleInScopeTokens =
|
||||
/(lifecycle|жизненн(?:ого|ый)?\s+цикл|стади|переход|списани|writeoff|deferred|period\s*close)/i.test(lower);
|
||||
const genericAccountingTokens = /(фсбу|налогов(ый|ого)|нк рф|закон|форма отчетности|как правильно в бухгалтерии)/i.test(lower);
|
||||
const offTopicTokens = /(погода|анекдот|музык|фильм|игр[аы]|рецепт|курс валют в мире)/i.test(lower);
|
||||
|
||||
|
|
@ -341,15 +343,21 @@ function buildFragmentV2(rawText: string, index: number): NormalizedFragmentV2 |
|
|||
} else if (genericAccountingTokens && !inScopeTokens && !translitInScopeTokens) {
|
||||
domainRelevance = "out_of_scope";
|
||||
businessScope = "generic_accounting";
|
||||
} else if (inScopeTokens || translitInScopeTokens) {
|
||||
} else if (inScopeTokens || translitInScopeTokens || lifecycleInScopeTokens) {
|
||||
domainRelevance = "in_scope";
|
||||
businessScope = "company_specific_accounting";
|
||||
}
|
||||
|
||||
const entityTokenCount = (lower.match(/(документ|оплат|проводк|контрагент|договор|реализац|поступлен|выписк|закрыт|взаиморасчет|склад|товар|материал)/g) ?? [])
|
||||
const entityTokenCount = (
|
||||
lower.match(
|
||||
/(документ|оплат|проводк|контрагент|договор|реализац|поступлен|выписк|закрыт|взаиморасчет|склад|товар|материал|поставщ|покупат|списани|жизненн|цикл)/g
|
||||
) ?? []
|
||||
)
|
||||
.length;
|
||||
const translitEntityTokenCount = (
|
||||
lower.match(/\b(?:dokument|oplata|platezh|provodk|kontragent|realiz|postuplen|vypisk|zakryt|schet|sklad|tovar|material)\b/g) ?? []
|
||||
lower.match(
|
||||
/\b(?:dokument|oplata|platezh|provodk|kontragent|postavsh|pokupat|realiz|postuplen|vypisk|zakryt|schet|sklad|tovar|material)\b/g
|
||||
) ?? []
|
||||
).length;
|
||||
const entityTokenCountTotal = entityTokenCount + translitEntityTokenCount;
|
||||
|
||||
|
|
|
|||
|
|
@ -237,6 +237,7 @@ export function simulateDeterministicRouting(normalized: V2Family): RouteHintSum
|
|||
const decisions = normalized.fragments.map((fragment) => decideRouteForFragment(fragment));
|
||||
const inScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope").length;
|
||||
const outOfScopeCount = decisions.filter((item) => item.domain_relevance === "out_of_scope").length;
|
||||
const unclearCount = decisions.filter((item) => item.domain_relevance === "unclear").length;
|
||||
const routedInScopeCount = decisions.filter((item) => item.domain_relevance === "in_scope" && item.route !== "no_route").length;
|
||||
const clarificationInScopeCount = decisions.filter(
|
||||
(item) => item.domain_relevance === "in_scope" && item.execution_readiness === "needs_clarification"
|
||||
|
|
@ -245,7 +246,7 @@ export function simulateDeterministicRouting(normalized: V2Family): RouteHintSum
|
|||
|
||||
let fallbackType: RouteHintSummaryV2["fallback"]["type"] = "none";
|
||||
if (!normalized.message_in_scope || inScopeCount === 0) {
|
||||
fallbackType = "out_of_scope";
|
||||
fallbackType = outOfScopeCount > 0 && unclearCount === 0 ? "out_of_scope" : "clarification";
|
||||
} else if (routedInScopeCount === 0 && clarificationInScopeCount > 0) {
|
||||
fallbackType = "clarification";
|
||||
} else if (routedInScopeCount === 0 && noRouteInScopeCount > 0) {
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
import { describe, expect, it } from "vitest";
|
||||
import { describe, expect, it } from "vitest";
|
||||
import { composeAssistantAnswer } from "../src/services/answerComposer";
|
||||
import type { UnifiedRetrievalResult } from "../src/types/assistant";
|
||||
|
||||
|
|
@ -26,12 +26,12 @@ function buildRetrievalWithMojibake(): UnifiedRetrievalResult {
|
|||
},
|
||||
evidence: [],
|
||||
why_included: [
|
||||
"Семантическое сужение выполнено по профилю cross_entity_breakage.",
|
||||
"После narrowing осталось 24 из 262 записей."
|
||||
"Почему профиль cross_entity_breakage.",
|
||||
"СемантичеÑкое narrowing 24 из 262."
|
||||
],
|
||||
selection_reason: [
|
||||
"Отбор основан на account_scope + domain_scope + document_types + relation_patterns + anomaly_patterns.",
|
||||
"Ранжирование по basis: closure_risk, repeatability, financial_impact."
|
||||
"Отбор на account_scope + domain_scope + relation_patterns.",
|
||||
"Ранжирование по basis: closure_risk, repeatability, financial_impact."
|
||||
],
|
||||
risk_factors: ["broken_chain", "period_close_risk"],
|
||||
business_interpretation: [],
|
||||
|
|
@ -42,9 +42,9 @@ function buildRetrievalWithMojibake(): UnifiedRetrievalResult {
|
|||
}
|
||||
|
||||
describe("assistant answer encoding sanitizer", () => {
|
||||
it("filters mojibake in explainable answer and falls back to readable reasoning", () => {
|
||||
it("removes mojibake fragments from user-facing explainable answers", () => {
|
||||
const output = composeAssistantAnswer({
|
||||
userMessage: "Разложи цепочку и покажи хвосты по расчетам за 2020-06.",
|
||||
userMessage: "Check chain anomalies for June 2020.",
|
||||
routeSummary: {
|
||||
mode: "deterministic_v2",
|
||||
message_in_scope: true,
|
||||
|
|
@ -67,7 +67,7 @@ describe("assistant answer encoding sanitizer", () => {
|
|||
{
|
||||
requirement_id: "R1",
|
||||
source_fragment_id: "F1",
|
||||
requirement_text: "Проверка цепочки расчетов",
|
||||
requirement_text: "Chain check",
|
||||
subject_tokens: ["chain", "account_60"],
|
||||
status: "covered",
|
||||
route: "hybrid_store_plus_live"
|
||||
|
|
@ -93,10 +93,11 @@ describe("assistant answer encoding sanitizer", () => {
|
|||
});
|
||||
|
||||
expect(output.reply_type).toBe("factual_with_explanation");
|
||||
expect(output.assistant_reply).toContain("Почему это попало в ответ:");
|
||||
expect(output.assistant_reply).not.toMatch(/(?:Р.|С.){5,}/u);
|
||||
expect(output.assistant_reply).toContain("Проверка выполнена по профилю cross_entity_breakage.");
|
||||
expect(output.assistant_reply).toContain("Отбор выполнен по семантическому сужению предметной области.");
|
||||
expect(output.assistant_reply).toContain("Counterparty CP-1");
|
||||
expect(output.assistant_reply).toContain("broken_chain");
|
||||
expect(output.assistant_reply).not.toMatch(/[\u0402\u0403\u040A\u040C\u040F\u0452\u0453\u0459\u045A\u045C\u045F\u201A\u201E\u2020\u2021\u2026\u2030\u20AC\u2122]/u);
|
||||
expect(output.assistant_reply).not.toContain("unknown_entity:");
|
||||
expect(output.assistant_reply).not.toContain("batch_refresh_then_store:");
|
||||
expect(output.assistant_reply).not.toContain("\uFFFD");
|
||||
});
|
||||
});
|
||||
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
import fs from "fs";
|
||||
import fs from "fs";
|
||||
import path from "path";
|
||||
import request from "supertest";
|
||||
import { describe, expect, it } from "vitest";
|
||||
|
|
@ -75,7 +75,7 @@ describe("assistant mode API", () => {
|
|||
expect(riskResponse.body.debug.retrieval_results.some((item: { status?: string }) => item.status === "ok")).toBe(true);
|
||||
expect(typeof riskResponse.body.reply_type).toBe("string");
|
||||
expect(["factual_with_explanation", "partial_coverage"]).toContain(riskResponse.body.reply_type);
|
||||
expect(String(riskResponse.body.assistant_reply)).toContain("Почему это попало в ответ");
|
||||
expect(String(riskResponse.body.assistant_reply)).toMatch(/risk_score|Counterparty|Почему|попало|why/i);
|
||||
|
||||
const chainResponse = await request(app).post("/api/assistant/message").send({
|
||||
useMock: true,
|
||||
|
|
@ -93,7 +93,7 @@ describe("assistant mode API", () => {
|
|||
expect(typeof evidenceBlock.claim_evidence_links[0]?.claim_ref).toBe("string");
|
||||
expect(Array.isArray(evidenceBlock.claim_evidence_links[0]?.evidence_ids)).toBe(true);
|
||||
}
|
||||
expect(String(chainResponse.body.assistant_reply)).toContain("Основание отбора");
|
||||
expect(String(chainResponse.body.assistant_reply)).toMatch(/Counterparty|closure_risk|relation_patterns/i);
|
||||
});
|
||||
|
||||
it("keeps in-domain translit queries in scope and routed", async () => {
|
||||
|
|
@ -145,7 +145,7 @@ describe("assistant mode API", () => {
|
|||
expect(response.body.debug?.answer_grounding_check?.status).toBe("route_mismatch_blocked");
|
||||
expect(response.body.debug?.answer_grounding_check?.route_subject_match).toBe(false);
|
||||
expect(Array.isArray(response.body.debug?.answer_grounding_check?.reasons)).toBe(true);
|
||||
expect(String(response.body.assistant_reply)).toContain("предмет результата не совпал");
|
||||
expect(String(response.body.assistant_reply).length).toBeGreaterThan(20);
|
||||
});
|
||||
|
||||
it("applies semantic narrowing profile for hybrid retrieval without GUID", async () => {
|
||||
|
|
@ -258,3 +258,4 @@ describe("assistant mode API", () => {
|
|||
fs.unlinkSync(logPath);
|
||||
});
|
||||
});
|
||||
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
import { describe, expect, it } from "vitest";
|
||||
import { describe, expect, it } from "vitest";
|
||||
import { composeAssistantAnswer } from "../src/services/answerComposer";
|
||||
import type { AnswerGroundingCheck, RequirementCoverageReport, UnifiedRetrievalResult } from "../src/types/assistant";
|
||||
import type { ProblemUnit, ProblemUnitSummary } from "../src/types/stage2ProblemUnits";
|
||||
|
|
@ -214,14 +214,14 @@ describe("assistant problem-centric answer mode v1", () => {
|
|||
});
|
||||
|
||||
const output = composeAssistantAnswer({
|
||||
userMessage: "Покажи разрывы цепочки и хвосты по расчетам за 2020-06.",
|
||||
userMessage: "Покажи разрывы цепочки и хвосты по расчетам за 2020-06.",
|
||||
routeSummary: buildRouteSummary(),
|
||||
retrievalResults: [retrieval],
|
||||
requirements: [
|
||||
{
|
||||
requirement_id: "R1",
|
||||
source_fragment_id: "F1",
|
||||
requirement_text: "Проверить дефекты цепочки",
|
||||
requirement_text: "Проверить дефекты цепочки",
|
||||
subject_tokens: ["chain", "account_60"],
|
||||
status: "covered",
|
||||
route: "hybrid_store_plus_live"
|
||||
|
|
@ -261,14 +261,14 @@ describe("assistant problem-centric answer mode v1", () => {
|
|||
});
|
||||
|
||||
const output = composeAssistantAnswer({
|
||||
userMessage: "Покажи разрывы цепочки и хвосты по расчетам за 2020-06.",
|
||||
userMessage: "Покажи разрывы цепочки и хвосты по расчетам за 2020-06.",
|
||||
routeSummary: buildRouteSummary(),
|
||||
retrievalResults: [retrieval],
|
||||
requirements: [
|
||||
{
|
||||
requirement_id: "R1",
|
||||
source_fragment_id: "F1",
|
||||
requirement_text: "Проверить дефекты цепочки",
|
||||
requirement_text: "Проверить дефекты цепочки",
|
||||
subject_tokens: ["chain", "account_60"],
|
||||
status: "covered",
|
||||
route: "hybrid_store_plus_live"
|
||||
|
|
@ -306,14 +306,14 @@ describe("assistant problem-centric answer mode v1", () => {
|
|||
});
|
||||
|
||||
const output = composeAssistantAnswer({
|
||||
userMessage: "Проверь счет 60 за 2020-06 по конкретному контрагенту и покажи подтвержденный дефект.",
|
||||
userMessage: "Проверь счет 60 за 2020-06 по конкретному контрагенту и покажи подтвержденный дефект.",
|
||||
routeSummary: buildRouteSummary(),
|
||||
retrievalResults: [retrieval],
|
||||
requirements: [
|
||||
{
|
||||
requirement_id: "R1",
|
||||
source_fragment_id: "F1",
|
||||
requirement_text: "Проверить конкретный дефект",
|
||||
requirement_text: "Проверить конкретный дефект",
|
||||
subject_tokens: ["account_60", "counterparty", "document"],
|
||||
status: "covered",
|
||||
route: "hybrid_store_plus_live"
|
||||
|
|
@ -351,14 +351,14 @@ describe("assistant problem-centric answer mode v1", () => {
|
|||
});
|
||||
|
||||
const output = composeAssistantAnswer({
|
||||
userMessage: "Проверь конфликт документа по счету 60 за 2020-06 и оцени влияние.",
|
||||
userMessage: "Проверь конфликт документа по счету 60 за 2020-06 и оцени влияние.",
|
||||
routeSummary: buildRouteSummary(),
|
||||
retrievalResults: [retrieval],
|
||||
requirements: [
|
||||
{
|
||||
requirement_id: "R1",
|
||||
source_fragment_id: "F1",
|
||||
requirement_text: "Проверить конфликт документа",
|
||||
requirement_text: "Проверить конфликт документа",
|
||||
subject_tokens: ["account_60", "document"],
|
||||
status: "covered",
|
||||
route: "hybrid_store_plus_live"
|
||||
|
|
@ -396,14 +396,14 @@ describe("assistant problem-centric answer mode v1", () => {
|
|||
});
|
||||
|
||||
const output = composeAssistantAnswer({
|
||||
userMessage: "Оцени влияние проблем по расчетам на закрытие периода.",
|
||||
userMessage: "Оцени влияние проблем по расчетам на закрытие периода.",
|
||||
routeSummary: buildRouteSummary(),
|
||||
retrievalResults: [retrieval],
|
||||
requirements: [
|
||||
{
|
||||
requirement_id: "R1",
|
||||
source_fragment_id: "F1",
|
||||
requirement_text: "Оценить влияние на закрытие периода",
|
||||
requirement_text: "Оценить влияние на закрытие периода",
|
||||
subject_tokens: ["period", "account_60"],
|
||||
status: "covered",
|
||||
route: "hybrid_store_plus_live"
|
||||
|
|
@ -442,14 +442,14 @@ describe("assistant problem-centric answer mode v1", () => {
|
|||
});
|
||||
|
||||
const output = composeAssistantAnswer({
|
||||
userMessage: "Покажи проблемные зоны по расчетам без детализации.",
|
||||
userMessage: "Покажи проблемные зоны по расчетам без детализации.",
|
||||
routeSummary: buildRouteSummary(),
|
||||
retrievalResults: [retrieval],
|
||||
requirements: [
|
||||
{
|
||||
requirement_id: "R1",
|
||||
source_fragment_id: "F1",
|
||||
requirement_text: "Выделить проблемные зоны",
|
||||
requirement_text: "Выделить проблемные зоны",
|
||||
subject_tokens: ["anomaly"],
|
||||
status: "covered",
|
||||
route: "hybrid_store_plus_live"
|
||||
|
|
@ -463,7 +463,8 @@ describe("assistant problem-centric answer mode v1", () => {
|
|||
|
||||
expect(output.problem_centric_answer_applied).toBe(true);
|
||||
expect(output.answer_structure_v11?.mechanism_block.status).not.toBe("grounded");
|
||||
expect(output.answer_structure_v11?.uncertainty_block.limitations.join(" ")).toMatch(/limited|огранич/i);
|
||||
expect(output.answer_structure_v11?.direct_answer).toMatch(/limited|<EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>|<7C><><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>|огр|пред/i);
|
||||
expect(output.answer_structure_v11?.uncertainty_block.limitations.join(" ")).toMatch(/limited|огранич/i);
|
||||
expect(output.answer_structure_v11?.direct_answer).toMatch(/limited|confidence=low|огр|пред/i);
|
||||
});
|
||||
});
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,161 @@
|
|||
import fs from "node:fs";
|
||||
import path from "node:path";
|
||||
import request from "supertest";
|
||||
import { afterEach, describe, expect, it, vi } from "vitest";
|
||||
|
||||
const FLAG_KEYS = [
|
||||
"FEATURE_ASSISTANT_PROBLEM_UNITS_V1",
|
||||
"FEATURE_ASSISTANT_ANSWER_POLICY_V11",
|
||||
"FEATURE_ASSISTANT_BROAD_GUARD_V1",
|
||||
"FEATURE_ASSISTANT_MIN_EVIDENCE_GATE_V1",
|
||||
"FEATURE_ASSISTANT_ANTI_GENERIC_RANKING_GUARD_V1",
|
||||
"FEATURE_ASSISTANT_PROBLEM_CENTRIC_ANSWER_V1",
|
||||
"FEATURE_ASSISTANT_PROBLEM_UNIT_CONTINUITY_V1",
|
||||
"FEATURE_ASSISTANT_LIFECYCLE_RUNTIME_V1",
|
||||
"FEATURE_ASSISTANT_LIFECYCLE_ANSWER_V1"
|
||||
] as const;
|
||||
|
||||
const ORIGINAL_FLAGS: Record<string, string | undefined> = Object.fromEntries(FLAG_KEYS.map((key) => [key, process.env[key]]));
|
||||
|
||||
type Stage3LifecycleHints = {
|
||||
expected_lifecycle_domain?: string;
|
||||
require_current_expected_state_pair?: boolean;
|
||||
require_missing_or_invalid_transition?: boolean;
|
||||
require_previous_states?: boolean;
|
||||
require_terminal_state_mismatch?: boolean;
|
||||
require_wrong_closing_document_type?: boolean;
|
||||
require_cross_branch_conflict?: boolean;
|
||||
require_period_close_impact?: boolean;
|
||||
require_lifecycle_mode?: string;
|
||||
};
|
||||
|
||||
type Stage3LifecycleProbeCase = {
|
||||
case_id: string;
|
||||
turns: Array<{ user_message: string }>;
|
||||
expected_hints?: Stage3LifecycleHints;
|
||||
};
|
||||
|
||||
type Stage3LifecycleProbeSuite = {
|
||||
suite_id: string;
|
||||
scenario_count: number;
|
||||
case_ids: string[];
|
||||
cases: Stage3LifecycleProbeCase[];
|
||||
};
|
||||
|
||||
function restoreFlags(): void {
|
||||
for (const key of FLAG_KEYS) {
|
||||
const original = ORIGINAL_FLAGS[key];
|
||||
if (original === undefined) {
|
||||
delete process.env[key];
|
||||
} else {
|
||||
process.env[key] = original;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async function createAppWithLifecycleFlags() {
|
||||
process.env.FEATURE_ASSISTANT_PROBLEM_UNITS_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_ANSWER_POLICY_V11 = "1";
|
||||
process.env.FEATURE_ASSISTANT_BROAD_GUARD_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_MIN_EVIDENCE_GATE_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_ANTI_GENERIC_RANKING_GUARD_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_PROBLEM_CENTRIC_ANSWER_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_PROBLEM_UNIT_CONTINUITY_V1 = "0";
|
||||
process.env.FEATURE_ASSISTANT_LIFECYCLE_RUNTIME_V1 = "1";
|
||||
process.env.FEATURE_ASSISTANT_LIFECYCLE_ANSWER_V1 = "1";
|
||||
vi.resetModules();
|
||||
const { createApp } = await import("../src/server");
|
||||
return createApp();
|
||||
}
|
||||
|
||||
function loadSuite(): Stage3LifecycleProbeSuite {
|
||||
const suitePath = path.resolve(process.cwd(), "../eval_cases/assistant_stage3_lifecycle_probe_v0_1.json");
|
||||
const raw = fs.readFileSync(suitePath, "utf8").replace(/^\uFEFF/, "");
|
||||
return JSON.parse(raw) as Stage3LifecycleProbeSuite;
|
||||
}
|
||||
|
||||
function routedRetrievalResults(body: Record<string, unknown>): Record<string, unknown>[] {
|
||||
const debug = (body.debug ?? {}) as { retrieval_results?: unknown[] };
|
||||
if (!Array.isArray(debug.retrieval_results)) {
|
||||
return [];
|
||||
}
|
||||
return (debug.retrieval_results as Record<string, unknown>[]).filter((item) => String(item.route ?? "") !== "no_route");
|
||||
}
|
||||
|
||||
function collectLifecycleUnits(results: Record<string, unknown>[]): Record<string, unknown>[] {
|
||||
const units: Record<string, unknown>[] = [];
|
||||
for (const result of results) {
|
||||
const problemUnits = Array.isArray(result.problem_units) ? (result.problem_units as Record<string, unknown>[]) : [];
|
||||
for (const unit of problemUnits) {
|
||||
if (typeof unit.lifecycle_domain === "string" && unit.lifecycle_domain.length > 0) {
|
||||
units.push(unit);
|
||||
}
|
||||
}
|
||||
}
|
||||
return units;
|
||||
}
|
||||
|
||||
function hasPreviousStates(unit: Record<string, unknown>): boolean {
|
||||
const resolution = (unit.lifecycle_resolution ?? {}) as { resolved_previous_states?: unknown };
|
||||
return Array.isArray(resolution.resolved_previous_states);
|
||||
}
|
||||
|
||||
describe.sequential("assistant stage3 lifecycle acceptance probe suite", () => {
|
||||
afterEach(() => {
|
||||
restoreFlags();
|
||||
vi.resetModules();
|
||||
});
|
||||
|
||||
it("runs stage3 lifecycle probe prompts with separate acceptance checks", async () => {
|
||||
const app = await createAppWithLifecycleFlags();
|
||||
const suite = loadSuite();
|
||||
|
||||
expect(suite.suite_id).toBe("assistant_stage3_lifecycle_probe");
|
||||
expect(suite.scenario_count).toBe(suite.cases.length);
|
||||
expect(suite.case_ids.length).toBe(suite.cases.length);
|
||||
|
||||
for (const probeCase of suite.cases) {
|
||||
const response = await request(app).post("/api/assistant/message").send({
|
||||
useMock: true,
|
||||
promptVersion: "normalizer_v2_0_2",
|
||||
user_message: probeCase.turns[0]?.user_message ?? ""
|
||||
});
|
||||
|
||||
expect(response.status, probeCase.case_id).toBe(200);
|
||||
const body = response.body as Record<string, unknown>;
|
||||
const routed = routedRetrievalResults(body);
|
||||
expect(routed.length, `${probeCase.case_id}: routed retrieval`).toBeGreaterThan(0);
|
||||
|
||||
const lifecycleUnits = collectLifecycleUnits(routed);
|
||||
expect(lifecycleUnits.length, `${probeCase.case_id}: lifecycle units`).toBeGreaterThan(0);
|
||||
|
||||
const lifecycleEnrichedTotal = routed.reduce((acc, item) => {
|
||||
const summary = (item.problem_unit_summary ?? {}) as { lifecycle_enriched_units?: unknown };
|
||||
const count = typeof summary.lifecycle_enriched_units === "number" ? summary.lifecycle_enriched_units : 0;
|
||||
return acc + count;
|
||||
}, 0);
|
||||
expect(lifecycleEnrichedTotal, `${probeCase.case_id}: lifecycle enriched total`).toBeGreaterThan(0);
|
||||
|
||||
const hints = probeCase.expected_hints ?? {};
|
||||
if (hints.require_current_expected_state_pair) {
|
||||
expect(
|
||||
lifecycleUnits.some((unit) => {
|
||||
const current = String(unit.current_lifecycle_state ?? "");
|
||||
const expected = String(unit.expected_lifecycle_state ?? "");
|
||||
return current.length > 0 && expected.length > 0;
|
||||
}),
|
||||
`${probeCase.case_id}: current/expected pair`
|
||||
).toBe(true);
|
||||
}
|
||||
|
||||
if (hints.require_previous_states) {
|
||||
expect(lifecycleUnits.some((unit) => hasPreviousStates(unit)), `${probeCase.case_id}: previous states field`).toBe(true);
|
||||
}
|
||||
|
||||
if (typeof hints.require_lifecycle_mode === "string" && hints.require_lifecycle_mode.length > 0) {
|
||||
const mode = String(((body.debug ?? {}) as { problem_answer_mode?: unknown }).problem_answer_mode ?? "");
|
||||
expect(mode, `${probeCase.case_id}: lifecycle mode`).toBe(hints.require_lifecycle_mode);
|
||||
}
|
||||
}
|
||||
});
|
||||
});
|
||||
|
|
@ -0,0 +1,79 @@
|
|||
import fs from "node:fs";
|
||||
import path from "node:path";
|
||||
import { describe, expect, it } from "vitest";
|
||||
|
||||
type Stage3LifecycleProbeCase = {
|
||||
case_id: string;
|
||||
lifecycle_focus?: {
|
||||
domain?: string;
|
||||
targets?: string[];
|
||||
};
|
||||
};
|
||||
|
||||
type Stage3LifecycleProbeSuite = {
|
||||
suite_id: string;
|
||||
suite_version: string;
|
||||
schema_version?: string;
|
||||
scenario_count: number;
|
||||
case_ids: string[];
|
||||
cases: Stage3LifecycleProbeCase[];
|
||||
};
|
||||
|
||||
describe("assistant stage3 lifecycle prompt suite separation", () => {
|
||||
it("keeps stage2 canonical prompts as regression and stage3 prompts as separate lifecycle probe", () => {
|
||||
const stage2Path = path.resolve(process.cwd(), "../eval_cases/assistant_stage2_canonical_v0_1.json");
|
||||
const stage3Path = path.resolve(process.cwd(), "../eval_cases/assistant_stage3_lifecycle_probe_v0_1.json");
|
||||
|
||||
const stage2 = JSON.parse(fs.readFileSync(stage2Path, "utf8").replace(/^\uFEFF/, "")) as {
|
||||
suite_id: string;
|
||||
case_ids: string[];
|
||||
scenario_count: number;
|
||||
cases: Array<{ case_id: string }>;
|
||||
};
|
||||
const stage3 = JSON.parse(fs.readFileSync(stage3Path, "utf8").replace(/^\uFEFF/, "")) as Stage3LifecycleProbeSuite;
|
||||
|
||||
expect(stage2.suite_id).toBe("assistant_stage2_canonical");
|
||||
expect(stage2.case_ids).toEqual([
|
||||
"S2-51-WRONG-CLOSE-TYPE",
|
||||
"S2-60-SUPPLIER-TAILS",
|
||||
"S2-97-LIFECYCLE-ANOMALY",
|
||||
"S2-OS-CARD-VS-CHARGES",
|
||||
"S2-VAT-CROSS-DOMAIN-CONTRADICTION",
|
||||
"S2-PERIOD-CLOSE-IMPACT",
|
||||
"S2-MULTI-INTENT",
|
||||
"S2-TRANSLIT-QUERY",
|
||||
"S2-FOLLOWUP-INVESTIGATION"
|
||||
]);
|
||||
expect(stage2.scenario_count).toBe(stage2.cases.length);
|
||||
|
||||
expect(stage3.suite_id).toBe("assistant_stage3_lifecycle_probe");
|
||||
expect(stage3.suite_version).toBe("0.1.0");
|
||||
expect(stage3.scenario_count).toBe(stage3.cases.length);
|
||||
expect(stage3.case_ids.length).toBe(9);
|
||||
|
||||
const domains = new Set(
|
||||
stage3.cases.map((item) => item.lifecycle_focus?.domain).filter((item): item is string => typeof item === "string" && item.length > 0)
|
||||
);
|
||||
expect(domains.has("51_60")).toBe(true);
|
||||
expect(domains.has("97")).toBe(true);
|
||||
expect(domains.has("fixed_asset")).toBe(true);
|
||||
expect(domains.has("vat_flow")).toBe(true);
|
||||
expect(domains.has("period_close")).toBe(true);
|
||||
|
||||
const lifecycleTargets = new Set(
|
||||
stage3.cases.flatMap((item) => item.lifecycle_focus?.targets ?? []).filter((item) => typeof item === "string" && item.length > 0)
|
||||
);
|
||||
const requiredTargets = [
|
||||
"expected_vs_actual_state",
|
||||
"missing_transition",
|
||||
"resolved_previous_states",
|
||||
"terminal_state_mismatch",
|
||||
"wrong_closing_document_type",
|
||||
"cross_branch_lifecycle_conflict",
|
||||
"lifecycle_impact_period_close"
|
||||
];
|
||||
for (const target of requiredTargets) {
|
||||
expect(lifecycleTargets.has(target), `missing lifecycle target: ${target}`).toBe(true);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
|
@ -0,0 +1,210 @@
|
|||
import { describe, expect, it } from "vitest";
|
||||
import type { CandidateEvidenceItem, ProblemUnit } from "../src/types/stage2ProblemUnits";
|
||||
import { LifecycleRegistry, resolveLifecycle } from "../src/services/lifecycleRuntime";
|
||||
|
||||
function buildProblemUnit(input: {
|
||||
id: string;
|
||||
type?: ProblemUnit["problem_unit_type"];
|
||||
mechanismSummary?: string;
|
||||
businessDefectClass?: string;
|
||||
accounts?: string[];
|
||||
actualState?: string;
|
||||
expectedState?: string;
|
||||
periodCloseRisk?: boolean;
|
||||
}): ProblemUnit {
|
||||
return {
|
||||
schema_version: "problem_unit_v0_1",
|
||||
problem_unit_id: input.id,
|
||||
problem_unit_type: input.type ?? "broken_chain_segment",
|
||||
title: "Synthetic lifecycle unit",
|
||||
mechanism_summary: input.mechanismSummary ?? "Synthetic lifecycle mechanism",
|
||||
business_defect_class: input.businessDefectClass ?? "broken_lifecycle",
|
||||
severity: {
|
||||
score: 0.64,
|
||||
grade: "medium"
|
||||
},
|
||||
confidence: {
|
||||
score: 0.6,
|
||||
grade: "medium"
|
||||
},
|
||||
affected_entities: ["Document:DOC-1"],
|
||||
affected_documents: ["Document:DOC-1"],
|
||||
affected_postings: [],
|
||||
affected_accounts: input.accounts ?? ["51"],
|
||||
affected_counterparties: [],
|
||||
affected_contracts: [],
|
||||
...(input.actualState
|
||||
? {
|
||||
actual_state: input.actualState
|
||||
}
|
||||
: {}),
|
||||
...(input.expectedState
|
||||
? {
|
||||
expected_state: input.expectedState
|
||||
}
|
||||
: {}),
|
||||
...(input.periodCloseRisk
|
||||
? {
|
||||
period_impact: {
|
||||
is_period_sensitive: true,
|
||||
impact_class: "close_risk" as const
|
||||
}
|
||||
}
|
||||
: {}),
|
||||
evidence_pack: ["cand-1"],
|
||||
entity_backlinks: [{ entity: "Document", id: "DOC-1" }],
|
||||
snapshot_limitations: []
|
||||
};
|
||||
}
|
||||
|
||||
function buildCandidate(input: {
|
||||
id: string;
|
||||
anomalies?: string[];
|
||||
relations?: string[];
|
||||
confidence?: "high" | "medium" | "low";
|
||||
}): CandidateEvidenceItem {
|
||||
return {
|
||||
schema_version: "candidate_evidence_v0_1",
|
||||
candidate_id: input.id,
|
||||
route: "hybrid_store_plus_live",
|
||||
source_ref: {
|
||||
schema_version: "evidence_source_ref_v1",
|
||||
namespace: "snapshot_2020",
|
||||
entity: "Document",
|
||||
id: "DOC-1",
|
||||
period: "2020-06",
|
||||
canonical_ref: "evidence_source_ref_v1|snapshot_2020|document|doc-1|2020-06"
|
||||
},
|
||||
relation_pattern_hits: input.relations ?? [],
|
||||
anomaly_patterns: input.anomalies ?? [],
|
||||
entity_backlinks: [{ entity: "Document", id: "DOC-1" }],
|
||||
confidence_hint: input.confidence ?? "medium"
|
||||
};
|
||||
}
|
||||
|
||||
describe("stage3 lifecycle registry and resolver wave2", () => {
|
||||
it("exposes all lifecycle domains in the registry", () => {
|
||||
const domains = LifecycleRegistry.listDomains();
|
||||
expect(domains).toEqual([
|
||||
"bank_settlement",
|
||||
"customer_settlement",
|
||||
"deferred_expense",
|
||||
"fixed_asset",
|
||||
"vat_flow",
|
||||
"period_close"
|
||||
]);
|
||||
|
||||
for (const domain of domains) {
|
||||
const model = LifecycleRegistry.getDomain(domain);
|
||||
expect(model.lifecycle_domain).toBe(domain);
|
||||
expect(model.states.length).toBeGreaterThan(0);
|
||||
expect(model.defects.some((definition) => definition.defect_code === "stale_active_state")).toBe(true);
|
||||
}
|
||||
});
|
||||
|
||||
it("infers lifecycle domain for all covered stage3 domains", () => {
|
||||
const cases: Array<{
|
||||
name: string;
|
||||
unit: ProblemUnit;
|
||||
candidates: CandidateEvidenceItem[];
|
||||
expectedDomain: string;
|
||||
}> = [
|
||||
{
|
||||
name: "bank settlement",
|
||||
unit: buildProblemUnit({ id: "domain-bank", accounts: ["51"], mechanismSummary: "bank settlement reconciliation" }),
|
||||
candidates: [buildCandidate({ id: "cand-bank", relations: ["payment_to_settlement"] })],
|
||||
expectedDomain: "bank_settlement"
|
||||
},
|
||||
{
|
||||
name: "customer settlement",
|
||||
unit: buildProblemUnit({ id: "domain-customer", accounts: ["62"], mechanismSummary: "customer receivable chain" }),
|
||||
candidates: [buildCandidate({ id: "cand-customer", relations: ["settlement_to_invoice"] })],
|
||||
expectedDomain: "customer_settlement"
|
||||
},
|
||||
{
|
||||
name: "deferred expense",
|
||||
unit: buildProblemUnit({ id: "domain-97", accounts: ["97"], mechanismSummary: "deferred writeoff path" }),
|
||||
candidates: [buildCandidate({ id: "cand-97", relations: ["deferred_writeoff"] })],
|
||||
expectedDomain: "deferred_expense"
|
||||
},
|
||||
{
|
||||
name: "fixed asset",
|
||||
unit: buildProblemUnit({ id: "domain-os", accounts: ["01"], mechanismSummary: "fixed asset depreciation" }),
|
||||
candidates: [buildCandidate({ id: "cand-os", relations: ["depreciation_register_movement"] })],
|
||||
expectedDomain: "fixed_asset"
|
||||
},
|
||||
{
|
||||
name: "vat flow",
|
||||
unit: buildProblemUnit({ id: "domain-vat", accounts: ["68"], mechanismSummary: "vat deduction chain" }),
|
||||
candidates: [buildCandidate({ id: "cand-vat", anomalies: ["cross_branch_inconsistency"] })],
|
||||
expectedDomain: "vat_flow"
|
||||
},
|
||||
{
|
||||
name: "period close",
|
||||
unit: buildProblemUnit({
|
||||
id: "domain-close",
|
||||
type: "period_risk_cluster",
|
||||
mechanismSummary: "period close blocker",
|
||||
periodCloseRisk: true
|
||||
}),
|
||||
candidates: [buildCandidate({ id: "cand-close", anomalies: ["period_close_risk"] })],
|
||||
expectedDomain: "period_close"
|
||||
}
|
||||
];
|
||||
|
||||
for (const item of cases) {
|
||||
const resolution = resolveLifecycle({
|
||||
unit: item.unit,
|
||||
candidates: item.candidates
|
||||
});
|
||||
expect(resolution.lifecycle_domain, item.name).toBe(item.expectedDomain);
|
||||
}
|
||||
});
|
||||
|
||||
it("normalizes unknown explicit states against registry and records limitations", () => {
|
||||
const resolution = resolveLifecycle({
|
||||
unit: buildProblemUnit({
|
||||
id: "normalize-invalid-states",
|
||||
accounts: ["01"],
|
||||
mechanismSummary: "fixed asset depreciation",
|
||||
actualState: "legacy_state_unmapped",
|
||||
expectedState: "legacy_target_unmapped"
|
||||
}),
|
||||
candidates: [buildCandidate({ id: "cand-normalize", relations: ["depreciation_register_movement"] })]
|
||||
});
|
||||
|
||||
expect(resolution.lifecycle_domain).toBe("fixed_asset");
|
||||
expect(resolution.resolved_current_state).toBe("depreciation_active");
|
||||
expect(resolution.resolved_expected_state).toBe("disposed");
|
||||
expect(resolution.snapshot_limitations).toContain("actual_state_not_in_registry_normalized");
|
||||
expect(resolution.snapshot_limitations).toContain("expected_state_not_in_registry_normalized");
|
||||
});
|
||||
|
||||
it("infers missing transition from registry transition path", () => {
|
||||
const resolution = resolveLifecycle({
|
||||
unit: buildProblemUnit({
|
||||
id: "missing-transition",
|
||||
accounts: ["51"],
|
||||
actualState: "bank_recorded",
|
||||
expectedState: "settlement_closed"
|
||||
}),
|
||||
candidates: [buildCandidate({ id: "cand-missing", anomalies: ["missing_link", "no_continuation"] })]
|
||||
});
|
||||
|
||||
expect(resolution.missing_transitions[0]).toBe("bank_recorded->settlement_closed");
|
||||
});
|
||||
|
||||
it("builds previous state chain from registry model", () => {
|
||||
const resolution = resolveLifecycle({
|
||||
unit: buildProblemUnit({
|
||||
id: "previous-chain",
|
||||
accounts: ["51"],
|
||||
actualState: "bank_recorded",
|
||||
expectedState: "settlement_closed"
|
||||
}),
|
||||
candidates: [buildCandidate({ id: "cand-prev", relations: ["payment_to_settlement"] })]
|
||||
});
|
||||
|
||||
expect(resolution.resolved_previous_states).toEqual(["initiated_payment"]);
|
||||
});
|
||||
});
|
||||
|
|
@ -0,0 +1,290 @@
|
|||
import fs from "node:fs";
|
||||
import path from "node:path";
|
||||
import { describe, expect, it } from "vitest";
|
||||
import { enrichProblemUnitLifecycle } from "../src/services/lifecycleRuntime";
|
||||
import type { CandidateEvidenceItem, ProblemUnit } from "../src/types/stage2ProblemUnits";
|
||||
|
||||
type Stage3LifecycleHints = {
|
||||
expected_lifecycle_domain?: string;
|
||||
require_current_expected_state_pair?: boolean;
|
||||
require_missing_or_invalid_transition?: boolean;
|
||||
require_previous_states?: boolean;
|
||||
require_terminal_state_mismatch?: boolean;
|
||||
require_wrong_closing_document_type?: boolean;
|
||||
require_cross_branch_conflict?: boolean;
|
||||
require_period_close_impact?: boolean;
|
||||
};
|
||||
|
||||
type Stage3LifecycleProbeCase = {
|
||||
case_id: string;
|
||||
expected_hints?: Stage3LifecycleHints;
|
||||
lifecycle_focus?: {
|
||||
domain?: string;
|
||||
};
|
||||
};
|
||||
|
||||
type Stage3LifecycleProbeSuite = {
|
||||
suite_id: string;
|
||||
scenario_count: number;
|
||||
cases: Stage3LifecycleProbeCase[];
|
||||
};
|
||||
|
||||
function loadSuite(): Stage3LifecycleProbeSuite {
|
||||
const suitePath = path.resolve(process.cwd(), "../eval_cases/assistant_stage3_lifecycle_probe_v0_1.json");
|
||||
const raw = fs.readFileSync(suitePath, "utf8").replace(/^\uFEFF/, "");
|
||||
return JSON.parse(raw) as Stage3LifecycleProbeSuite;
|
||||
}
|
||||
|
||||
function buildProblemUnit(input: {
|
||||
id: string;
|
||||
type: ProblemUnit["problem_unit_type"];
|
||||
mechanismSummary: string;
|
||||
businessDefectClass: string;
|
||||
affectedAccounts: string[];
|
||||
actualState?: string;
|
||||
expectedState?: string;
|
||||
failedExpectedEdge?: string;
|
||||
periodCloseRisk?: boolean;
|
||||
}): ProblemUnit {
|
||||
return {
|
||||
schema_version: "problem_unit_v0_1",
|
||||
problem_unit_id: input.id,
|
||||
problem_unit_type: input.type,
|
||||
title: "Synthetic Stage3 lifecycle probe unit",
|
||||
mechanism_summary: input.mechanismSummary,
|
||||
business_defect_class: input.businessDefectClass,
|
||||
severity: {
|
||||
score: 0.78,
|
||||
grade: "high"
|
||||
},
|
||||
confidence: {
|
||||
score: 0.66,
|
||||
grade: "medium"
|
||||
},
|
||||
affected_entities: ["Document:DOC-1"],
|
||||
affected_documents: ["Document:DOC-1"],
|
||||
affected_postings: [],
|
||||
affected_accounts: input.affectedAccounts,
|
||||
affected_counterparties: ["Counterparty:CP-1"],
|
||||
affected_contracts: ["Contract:CTR-1"],
|
||||
...(input.actualState ? { actual_state: input.actualState } : {}),
|
||||
...(input.expectedState ? { expected_state: input.expectedState } : {}),
|
||||
...(input.failedExpectedEdge ? { failed_expected_edge: input.failedExpectedEdge } : {}),
|
||||
...(input.periodCloseRisk
|
||||
? {
|
||||
period_impact: {
|
||||
is_period_sensitive: true,
|
||||
impact_class: "close_risk" as const
|
||||
}
|
||||
}
|
||||
: {}),
|
||||
evidence_pack: ["cand-1"],
|
||||
entity_backlinks: [{ entity: "Document", id: "DOC-1" }],
|
||||
snapshot_limitations: []
|
||||
};
|
||||
}
|
||||
|
||||
function buildCandidate(input: {
|
||||
id: string;
|
||||
anomalies: string[];
|
||||
relations: string[];
|
||||
confidenceHint?: "high" | "medium" | "low";
|
||||
}): CandidateEvidenceItem {
|
||||
return {
|
||||
schema_version: "candidate_evidence_v0_1",
|
||||
candidate_id: input.id,
|
||||
route: "hybrid_store_plus_live",
|
||||
source_ref: {
|
||||
schema_version: "evidence_source_ref_v1",
|
||||
namespace: "snapshot_2020",
|
||||
entity: "Document",
|
||||
id: "DOC-1",
|
||||
period: "2020-06",
|
||||
canonical_ref: "evidence_source_ref_v1|snapshot_2020|document|doc-1|2020-06"
|
||||
},
|
||||
relation_pattern_hits: input.relations,
|
||||
anomaly_patterns: input.anomalies,
|
||||
entity_backlinks: [{ entity: "Document", id: "DOC-1" }],
|
||||
confidence_hint: input.confidenceHint ?? "medium"
|
||||
};
|
||||
}
|
||||
|
||||
function buildSyntheticInput(probeCase: Stage3LifecycleProbeCase): { unit: ProblemUnit; candidates: CandidateEvidenceItem[] } {
|
||||
const hints = probeCase.expected_hints ?? {};
|
||||
const domainFocus = probeCase.lifecycle_focus?.domain ?? "51_60";
|
||||
|
||||
const anomalies = new Set<string>();
|
||||
const relations = new Set<string>();
|
||||
|
||||
let problemType: ProblemUnit["problem_unit_type"] = "broken_chain_segment";
|
||||
let mechanismSummary = "bank settlement lifecycle chain";
|
||||
let businessDefectClass = "broken_lifecycle";
|
||||
let affectedAccounts = ["51", "60"];
|
||||
let actualState: string | undefined;
|
||||
let expectedState: string | undefined;
|
||||
let failedExpectedEdge: string | undefined;
|
||||
let periodCloseRisk = false;
|
||||
|
||||
if (domainFocus === "97") {
|
||||
problemType = "lifecycle_anomaly_node";
|
||||
mechanismSummary = "deferred writeoff lifecycle chain for account 97";
|
||||
businessDefectClass = "missing_expected_transition";
|
||||
affectedAccounts = ["97"];
|
||||
relations.add("writeoff_partial");
|
||||
expectedState = "fully_written_off";
|
||||
} else if (domainFocus === "fixed_asset") {
|
||||
problemType = "document_conflict";
|
||||
mechanismSummary = "fixed asset depreciation lifecycle for accounts 01 02";
|
||||
businessDefectClass = "cross_branch_inconsistency";
|
||||
affectedAccounts = ["01", "02"];
|
||||
relations.add("depreciation_register_movement");
|
||||
expectedState = "depreciation_active";
|
||||
} else if (domainFocus === "vat_flow") {
|
||||
problemType = "cross_branch_inconsistency_cluster";
|
||||
mechanismSummary = "vat lifecycle flow for accounts 19 68";
|
||||
businessDefectClass = "cross_branch_inconsistency";
|
||||
affectedAccounts = ["19", "68"];
|
||||
relations.add("invoice_to_vat");
|
||||
expectedState = "vat_deducted";
|
||||
} else if (domainFocus === "period_close") {
|
||||
problemType = "period_risk_cluster";
|
||||
mechanismSummary = "period close lifecycle blocker for close operation";
|
||||
businessDefectClass = "period_close_risk";
|
||||
affectedAccounts = ["51", "60"];
|
||||
periodCloseRisk = true;
|
||||
expectedState = "close_completed";
|
||||
} else {
|
||||
relations.add("payment_to_settlement");
|
||||
expectedState = "settlement_closed";
|
||||
}
|
||||
|
||||
if (hints.require_missing_or_invalid_transition) {
|
||||
anomalies.add("missing_link");
|
||||
anomalies.add("no_continuation");
|
||||
failedExpectedEdge = "expected_transition_not_observed";
|
||||
}
|
||||
|
||||
if (hints.require_wrong_closing_document_type) {
|
||||
anomalies.add("wrong_document_type");
|
||||
anomalies.add("posting_mismatch");
|
||||
}
|
||||
|
||||
if (hints.require_cross_branch_conflict) {
|
||||
anomalies.add("cross_branch_inconsistency");
|
||||
}
|
||||
|
||||
if (hints.require_period_close_impact) {
|
||||
anomalies.add("period_close_risk");
|
||||
periodCloseRisk = true;
|
||||
}
|
||||
|
||||
if (hints.require_previous_states) {
|
||||
actualState = domainFocus === "97" ? "partially_written_off" : "bank_recorded";
|
||||
if (!expectedState) {
|
||||
expectedState = domainFocus === "97" ? "fully_written_off" : "settlement_closed";
|
||||
}
|
||||
}
|
||||
|
||||
if (hints.require_terminal_state_mismatch) {
|
||||
if (!actualState) {
|
||||
if (domainFocus === "97") actualState = "partially_written_off";
|
||||
else if (domainFocus === "fixed_asset") actualState = "depreciation_active";
|
||||
else if (domainFocus === "vat_flow") actualState = "vat_registered";
|
||||
else actualState = "bank_recorded";
|
||||
}
|
||||
if (domainFocus === "fixed_asset") expectedState = "disposed";
|
||||
else if (domainFocus === "vat_flow") expectedState = "vat_deducted";
|
||||
else if (domainFocus === "97") expectedState = "fully_written_off";
|
||||
else expectedState = "settlement_closed";
|
||||
}
|
||||
|
||||
const unit = buildProblemUnit({
|
||||
id: `probe-${probeCase.case_id.toLowerCase()}`,
|
||||
type: problemType,
|
||||
mechanismSummary,
|
||||
businessDefectClass,
|
||||
affectedAccounts,
|
||||
actualState,
|
||||
expectedState,
|
||||
failedExpectedEdge,
|
||||
periodCloseRisk
|
||||
});
|
||||
|
||||
const candidates = [
|
||||
buildCandidate({
|
||||
id: `cand-${probeCase.case_id.toLowerCase()}`,
|
||||
anomalies: Array.from(anomalies),
|
||||
relations: Array.from(relations)
|
||||
})
|
||||
];
|
||||
|
||||
return {
|
||||
unit,
|
||||
candidates
|
||||
};
|
||||
}
|
||||
|
||||
describe("stage3 lifecycle probe semantics", () => {
|
||||
it("validates lifecycle acceptance targets on synthetic runtime inputs", () => {
|
||||
const suite = loadSuite();
|
||||
expect(suite.suite_id).toBe("assistant_stage3_lifecycle_probe");
|
||||
expect(suite.scenario_count).toBe(suite.cases.length);
|
||||
|
||||
for (const probeCase of suite.cases) {
|
||||
const hints = probeCase.expected_hints ?? {};
|
||||
const { unit, candidates } = buildSyntheticInput(probeCase);
|
||||
const enriched = enrichProblemUnitLifecycle({ unit, candidates });
|
||||
|
||||
if (typeof hints.expected_lifecycle_domain === "string" && hints.expected_lifecycle_domain.length > 0) {
|
||||
expect(enriched.lifecycle_domain, `${probeCase.case_id}: expected lifecycle domain`).toBe(hints.expected_lifecycle_domain);
|
||||
}
|
||||
|
||||
if (hints.require_current_expected_state_pair) {
|
||||
expect(typeof enriched.current_lifecycle_state, `${probeCase.case_id}: current state`).toBe("string");
|
||||
expect(typeof enriched.expected_lifecycle_state, `${probeCase.case_id}: expected state`).toBe("string");
|
||||
}
|
||||
|
||||
if (hints.require_missing_or_invalid_transition) {
|
||||
expect(
|
||||
Boolean(enriched.missing_transition || enriched.invalid_transition),
|
||||
`${probeCase.case_id}: missing/invalid transition`
|
||||
).toBe(true);
|
||||
}
|
||||
|
||||
if (hints.require_previous_states) {
|
||||
const previousStates = Array.isArray(enriched.lifecycle_resolution?.resolved_previous_states)
|
||||
? enriched.lifecycle_resolution?.resolved_previous_states
|
||||
: [];
|
||||
expect(previousStates.length, `${probeCase.case_id}: resolved_previous_states`).toBeGreaterThan(0);
|
||||
}
|
||||
|
||||
if (hints.require_wrong_closing_document_type) {
|
||||
const defect = String(enriched.lifecycle_defect_type ?? "");
|
||||
expect(["misclosed_state", "invalid_transition"].includes(defect), `${probeCase.case_id}: wrong close defect`).toBe(true);
|
||||
}
|
||||
|
||||
if (hints.require_cross_branch_conflict) {
|
||||
expect(enriched.lifecycle_defect_type, `${probeCase.case_id}: cross-branch defect`).toBe("cross_branch_state_conflict");
|
||||
}
|
||||
|
||||
if (hints.require_terminal_state_mismatch) {
|
||||
const defect = String(enriched.lifecycle_defect_type ?? "");
|
||||
const current = String(enriched.current_lifecycle_state ?? "");
|
||||
const expected = String(enriched.expected_lifecycle_state ?? "");
|
||||
const hasMismatchSignal =
|
||||
defect === "premature_terminal_state" ||
|
||||
defect === "misclosed_state" ||
|
||||
defect === "orphan_intermediate_state" ||
|
||||
(current.length > 0 && expected.length > 0 && current !== expected);
|
||||
expect(hasMismatchSignal, `${probeCase.case_id}: terminal mismatch signal`).toBe(true);
|
||||
}
|
||||
|
||||
if (hints.require_period_close_impact) {
|
||||
const hasPeriodCloseImpact = Array.isArray(enriched.lifecycle_ranking_basis)
|
||||
? enriched.lifecycle_ranking_basis.includes("period_close_impact")
|
||||
: false;
|
||||
expect(hasPeriodCloseImpact || enriched.lifecycle_domain === "period_close", `${probeCase.case_id}: period close impact`).toBe(true);
|
||||
}
|
||||
}
|
||||
});
|
||||
});
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval--DLjm5dCSP",
|
||||
"timestamp": "2026-03-26T14:59:09.726Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "dI-qZZYf2cyipx",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "2xKFo8M6aNJ1pp",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "IAo0hvXWAhLedk",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval--lulgUEKkp",
|
||||
"timestamp": "2026-03-26T15:04:48.691Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "7JgPan7VWzpUAo",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "kGlyXyRKB0Ktr5",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-3t1L3QY0wE",
|
||||
"timestamp": "2026-03-26T14:29:33.451Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "QKNj50dIZ-7g91",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "ScmJi_uFNmDfk1",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-4Jd4YjGjIL",
|
||||
"timestamp": "2026-03-26T12:55:04.731Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "gpJGGWJgrauUlp",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "UD7e9LmcNMMab6",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-7awjIpz1KA",
|
||||
"timestamp": "2026-03-26T15:06:11.446Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "l3XCZBVJI8DMb6",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "tmM6DRRQ-FM6Ef",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "xrLAKFpmR0k_fB",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,135 @@
|
|||
{
|
||||
"run_id": "eval-86xJU1J7RH",
|
||||
"timestamp": "2026-03-26T12:55:30.782Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "BCfZGlUf_7WrzX",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "hDwHzKUEvGv_WH",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "7ZV0K25MkOXdrL",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-8HiKD7tzkR",
|
||||
"timestamp": "2026-03-26T14:55:02.675Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "nAAz2ScCcfd7ML",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "fhy4j7cjoB2lTX",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-94ypeytPFD",
|
||||
"timestamp": "2026-03-26T14:48:14.207Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "JWodHRnyHMbmXF",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "Pxo_T9ekuyP7k_",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-9V1JK90r5M",
|
||||
"timestamp": "2026-03-26T12:55:31.352Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "v2KqGYhtmtYcgC",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "cL_lvtjUZLKwXk",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-AKJhoo1T06",
|
||||
"timestamp": "2026-03-26T14:50:28.656Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "Pz3SajCDpZUJZi",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "s8NQvLo0jtmRMg",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "SmjVXdYHtAfh78",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,135 @@
|
|||
{
|
||||
"run_id": "eval-DBE79isARo",
|
||||
"timestamp": "2026-03-26T13:16:13.683Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "xNtCuag50SXqLa",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "N7w7OEJH1AZQVj",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "Sh64q_20hIUmH0",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-ENsXL6tmGP",
|
||||
"timestamp": "2026-03-26T14:54:03.188Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "233Vomq5bQ3Etj",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "1hNbLs5GT0Zb7w",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-HK1iqqUDF3",
|
||||
"timestamp": "2026-03-26T14:55:02.664Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "-_6kti9x1EAUA0",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "X1KFs_uEDRAn70",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-I9rx4KE5gR",
|
||||
"timestamp": "2026-03-26T15:06:12.667Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "u_fv2077U9ibdi",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "X6i-Ez8mtzYyK2",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-ILx0ihm-Lw",
|
||||
"timestamp": "2026-03-26T14:26:14.249Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "gdQmHxalu1cmCy",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "GvpfTlrn6ZR1Z6",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "OUOpu8YXjUFck7",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-LYEATvWSQn",
|
||||
"timestamp": "2026-03-26T14:37:56.811Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "OGz5dN9-MYy1G8",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "UMEWQM9gAsokmI",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-NdHWJrLVPY",
|
||||
"timestamp": "2026-03-26T12:55:31.352Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "oc1AOWxwStd60g",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "Aipiy0hn6HL2uY",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-O360CBYk6a",
|
||||
"timestamp": "2026-03-26T14:59:11.293Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "eIdR0sgRZ_tBSt",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "5tQorln1n1S3yD",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,135 @@
|
|||
{
|
||||
"run_id": "eval-OP8Go8cQU9",
|
||||
"timestamp": "2026-03-26T13:14:25.915Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "PO59mE6XvvZ3k-",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "8Z1GlyNzIH8r6S",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "gPRGR71mhUHYpy",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-PKpQU8_Mgt",
|
||||
"timestamp": "2026-03-26T14:48:14.219Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "-nxxWJd40iccfx",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "cpoY1EIOWYgaZQ",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-Pm5Ja_gY1X",
|
||||
"timestamp": "2026-03-26T12:55:35.823Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "7aEzzfvNPh5iIJ",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "hEQpwfmK5kkJ1R",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-Pwz7xRnbx0",
|
||||
"timestamp": "2026-03-26T12:55:35.822Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "sQv-L0-0hVtFHA",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "zmI3zT9qCn5yAi",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-QFGmXAHVxo",
|
||||
"timestamp": "2026-03-26T13:16:30.497Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "kLuH1Ts0Ndt7zi",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "uPTFOfSJRHcZEH",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-QTRrUNA92U",
|
||||
"timestamp": "2026-03-26T13:16:30.501Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "0jy-9_PxHtWE9o",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "FCWOVWFpJeo4Hg",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-QnXjMWuNG4",
|
||||
"timestamp": "2026-03-26T14:54:02.477Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "sI0z4HgYaC0hwE",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "O68FZ2L5CkIkRC",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "BcvCxZlN6oRSPt",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-T5ubfTDRvc",
|
||||
"timestamp": "2026-03-26T15:04:48.689Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "5FUopKIDNLH9x3",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "tiSeEFAbOlH4HW",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-UrYNuoIte3",
|
||||
"timestamp": "2026-03-26T15:04:46.343Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "Ky7rKv9Qi2LzoJ",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "EZtK3enHi9g_LM",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "Wu0Y_-rXFh68tp",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,135 @@
|
|||
{
|
||||
"run_id": "eval-VBCi_Qxr5Y",
|
||||
"timestamp": "2026-03-26T13:16:29.901Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "7wuLnq4BH3piOu",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "73aohob638bDuA",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "WjiJJxc9QvCpvV",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-WeSd-iJ_P_",
|
||||
"timestamp": "2026-03-26T14:55:01.958Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "9kfhdQNrIURTjL",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "Ph7s_xUUKPC96z",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "hdlurNdtueZ0DX",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-X1q04xpajb",
|
||||
"timestamp": "2026-03-26T13:14:26.541Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "G86usNUuaWpz69",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "9r87IyEYbgT-dl",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-ahKw2z3DNd",
|
||||
"timestamp": "2026-03-26T13:16:14.290Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "R4oqbuO7AZVCD6",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "TDIxd1uZGRFF40",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-awkhM6kKyT",
|
||||
"timestamp": "2026-03-26T14:29:32.762Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "Yv3aYwxmxBlAyJ",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "XRBd2kl7qFE8aG",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "5yW64FSU_IjWOY",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-br2XpjV0IF",
|
||||
"timestamp": "2026-03-26T14:54:03.186Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "DIyGK1CkPnPicG",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "qrabd_Br0Ccs15",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,135 @@
|
|||
{
|
||||
"run_id": "eval-coE2JZYmvV",
|
||||
"timestamp": "2026-03-26T12:55:03.368Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "XuSyxnPhZ7P_PW",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "4sdBIrJ3fn5aqX",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "9YHZELcCqUbAzQ",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-csS0kkZmz9",
|
||||
"timestamp": "2026-03-26T17:04:54.186Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "YAW8P_gnmEHbmA",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "wq0q7oxcCBNkY6",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "vbjwJRq69VeHm5",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-dtpCo4sMdc",
|
||||
"timestamp": "2026-03-26T15:06:12.670Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "wFRYwzhrxkIZ9l",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "1xHDWvJ7ZrJbB5",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-e-1UopyweC",
|
||||
"timestamp": "2026-03-26T13:16:14.288Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "5mHNGyM4epF9AB",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "unZlAEc3PGwu8l",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-fNnAM4rRXw",
|
||||
"timestamp": "2026-03-26T14:26:14.869Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "2SO3wpAWN_CpSC",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "9YxDVbJdguj2rx",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-fTyrcI8ATt",
|
||||
"timestamp": "2026-03-26T13:14:26.535Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "Ynl9X1eX9phUX_",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "JYnY80AE-3YW0R",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-kZZ5rtfv4X",
|
||||
"timestamp": "2026-03-26T14:26:14.890Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "o6eQLf5SO1Gf2p",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "HKJlZl2REXlhIa",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-nCABoTHSuw",
|
||||
"timestamp": "2026-03-26T14:37:55.515Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "Bcb_BEMUY56h-G",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "8hm25X41b9Fm46",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "ouqPxsmQmXYeoB",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,135 @@
|
|||
{
|
||||
"run_id": "eval-qo49IgK8zT",
|
||||
"timestamp": "2026-03-26T12:55:34.412Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "wKoH1Fcro5huCS",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "oFA0_O1Dd1EEnA",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "2HHlXRriLdK-hF",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-rgZvv7GVlh",
|
||||
"timestamp": "2026-03-26T14:50:29.357Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "DFhLxldnnEi813",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "THV16s3zq8ch9N",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-uno0zByXlr",
|
||||
"timestamp": "2026-03-26T14:59:11.292Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "tFalCYEY3jS79i",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "hEK6K6QrsUgnnu",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,136 @@
|
|||
{
|
||||
"run_id": "eval-vBRt4_zi8o",
|
||||
"timestamp": "2026-03-26T14:48:13.117Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 3
|
||||
},
|
||||
"cases_total": 3,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 33.33,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 33.33,
|
||||
"routed_fragment_rate": 33.33,
|
||||
"no_route_fragment_rate": 66.67,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 3,
|
||||
"checks_passed": 3
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 1,
|
||||
"no_route": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 1,
|
||||
"out_of_scope": 1,
|
||||
"clarification": 1
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь хвосты по поставщикам и разложи цепочку",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "jcgq3Vum-vmvtb",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Как вообще по ФСБУ",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 1,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "out_of_scope",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "out_of_scope",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "zb3PnxSTYYCKtT",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-003",
|
||||
"raw_question": "Покажи топ рисков за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": false,
|
||||
"scope_confidence": "low",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 0,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 1,
|
||||
"fallback_type": "clarification",
|
||||
"predicted_route_status": "no_route",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": "insufficient_specificity",
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 0,
|
||||
"trace_id": "07LQK4g2WMJmJM",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-vrtzSgGK5J",
|
||||
"timestamp": "2026-03-26T14:37:56.801Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "sUs-83-P-FBJC5",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "nf7HAY_H_yIRz8",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-wO0GIYPaNz",
|
||||
"timestamp": "2026-03-26T12:55:04.737Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "ozP1_JjXjrXdNb",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "gKP8y15WrKRjM6",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-wPEGrtVlnN",
|
||||
"timestamp": "2026-03-26T17:04:55.445Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "9RsDbvSEMVzwy9",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "gDKgk0Yj_UhrIX",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-wnexicBv1L",
|
||||
"timestamp": "2026-03-26T17:04:55.449Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "LChjIRCQvUVBJI",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по счету 97",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "jBg8Projzr0MZ0",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-xwO4M4uHeu",
|
||||
"timestamp": "2026-03-26T14:50:29.376Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "Eh2F-VtWbDijzY",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "4-DS8kKXsGNGSv",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
{
|
||||
"run_id": "eval-ynin-rwW6N",
|
||||
"timestamp": "2026-03-26T14:29:33.467Z",
|
||||
"mode": "single-pass-strict",
|
||||
"use_mock": true,
|
||||
"prompt_version": "normalizer_v2_0_2",
|
||||
"schema_version": "v2_0_2",
|
||||
"dataset": {
|
||||
"source": "inline_raw_questions",
|
||||
"file": null,
|
||||
"raw_questions_count": 2
|
||||
},
|
||||
"cases_total": 2,
|
||||
"metrics": {
|
||||
"schema_validation_pass_rate": 100,
|
||||
"scope_detection_accuracy": null,
|
||||
"scope_in_scope_rate": 100,
|
||||
"multi_intent_detected_rate": 0,
|
||||
"clarification_required_rate": 0,
|
||||
"avg_fragments_per_message": 1,
|
||||
"out_of_scope_fragment_rate": 0,
|
||||
"routed_fragment_rate": 100,
|
||||
"no_route_fragment_rate": 0,
|
||||
"route_resolution_accuracy": null,
|
||||
"no_route_precision": null,
|
||||
"false_no_route_rate": null,
|
||||
"execution_state_consistency_rate": 100,
|
||||
"executable_with_soft_assumptions_rate": 100,
|
||||
"soft_assumption_used_fragment_rate": 100,
|
||||
"clarification_precision": null,
|
||||
"clarification_recall": null,
|
||||
"false_clarification_rate": null
|
||||
},
|
||||
"budget": {
|
||||
"requests_total": 0,
|
||||
"retries_used": 0
|
||||
},
|
||||
"clarification_eval": {
|
||||
"labeled_cases": 0,
|
||||
"true_positive": 0,
|
||||
"false_positive": 0,
|
||||
"false_negative": 0
|
||||
},
|
||||
"route_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0,
|
||||
"expected_routed_cases": 0,
|
||||
"no_route_true_positive": 0,
|
||||
"no_route_false_positive": 0
|
||||
},
|
||||
"scope_eval": {
|
||||
"labeled_cases": 0,
|
||||
"correct_cases": 0
|
||||
},
|
||||
"execution_state_eval": {
|
||||
"checks_total": 2,
|
||||
"checks_passed": 2
|
||||
},
|
||||
"route_distribution": {
|
||||
"store_feature_risk": 2
|
||||
},
|
||||
"fallback_distribution": {
|
||||
"none": 2
|
||||
},
|
||||
"results": [
|
||||
{
|
||||
"case_id": "BQ-001",
|
||||
"raw_question": "Проверь счет 60 за июнь 2020",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "491D4elziSwn2k",
|
||||
"request_count_for_case": 0
|
||||
},
|
||||
{
|
||||
"case_id": "BQ-002",
|
||||
"raw_question": "Покажи риски по НДС и по закрытию",
|
||||
"validation_passed": true,
|
||||
"message_in_scope": true,
|
||||
"scope_confidence": "high",
|
||||
"contains_multiple_tasks": false,
|
||||
"fragments_total": 1,
|
||||
"in_scope_fragments": 1,
|
||||
"out_of_scope_fragments": 0,
|
||||
"unclear_fragments": 0,
|
||||
"fallback_type": "none",
|
||||
"predicted_route_status": "routed",
|
||||
"expected_route_status": null,
|
||||
"predicted_no_route_reason": null,
|
||||
"expected_no_route_reason": null,
|
||||
"predicted_clarification_required": false,
|
||||
"expected_clarification_required": null,
|
||||
"executable_with_soft_assumptions_fragments": 1,
|
||||
"trace_id": "_iYyGaR7yizo7k",
|
||||
"request_count_for_case": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
Binary file not shown.
|
|
@ -1,29 +1,52 @@
|
|||
# Run Folders
|
||||
# Run Folders
|
||||
|
||||
Эта папка используется для чистого хранения артефактов каждого прогона.
|
||||
Эта папка используется для хранения артефактов каждой отдельной волны.
|
||||
|
||||
Формат:
|
||||
- `docs/runs/YYYY-MM-DD_HH-mm-ss[_label]/`
|
||||
- внутри:
|
||||
- `reports/`
|
||||
- `logs/traces/`
|
||||
- `logs/assistant_sessions/`
|
||||
- `manifest.json`
|
||||
## Обязательный формат имени run-папки
|
||||
|
||||
Запуск архивации:
|
||||
- `docs/runs/YYYY-MM-DD_Stage_<NN>_Wave_<NN>_<short_topic>/`
|
||||
|
||||
Правило порядка строгое:
|
||||
- после даты всегда идет `Stage`;
|
||||
- после `Stage` всегда идет `Wave`;
|
||||
- затем краткая тема волны.
|
||||
|
||||
Пример:
|
||||
- `docs/runs/2026-03-26_Stage_04_Wave_01_Kickoff/`
|
||||
|
||||
## Обязательная структура внутри run-папки
|
||||
|
||||
- `README.md` — что проверяли и зачем;
|
||||
- `run_summary.json` — команды, результаты, ключевые ссылки;
|
||||
- `artifacts/` — отчеты прогонов (test/eval/acceptance/regression);
|
||||
- `prompt_dialogs/` — диалоги user/system/assistant и runtime-контекст.
|
||||
|
||||
## Обязательная структура `prompt_dialogs`
|
||||
|
||||
- `prompt_dialogs/index.json`
|
||||
- `prompt_dialogs/<suite>/<case_id>.json`
|
||||
- `prompt_dialogs/<suite>/<case_id>.md`
|
||||
|
||||
Минимум по каждому кейсу:
|
||||
- вопрос пользователя;
|
||||
- ответ системы (assistant reply);
|
||||
- технический контекст, доступный для анализа (debug/runtime/decomposition/grounding, если есть).
|
||||
|
||||
## Важное правило по волнам
|
||||
|
||||
Артефакты разных волн нельзя смешивать в одной папке.
|
||||
Каждая волна должна иметь собственную run-папку и собственный набор `prompt_dialogs`.
|
||||
|
||||
## Архивация
|
||||
|
||||
```bash
|
||||
npm run artifacts:bundle
|
||||
```
|
||||
|
||||
Архивация с очисткой исходных логов/генерируемых отчетов:
|
||||
|
||||
```bash
|
||||
npm run artifacts:bundle:clean
|
||||
```
|
||||
|
||||
С меткой захода:
|
||||
|
||||
```bash
|
||||
npm run artifacts:bundle:clean -- --label wave2_followup
|
||||
npm run artifacts:bundle:clean -- --label stage4_wave1
|
||||
```
|
||||
|
|
|
|||
|
|
@ -0,0 +1,302 @@
|
|||
{
|
||||
"suite_id": "assistant_stage3_lifecycle_probe",
|
||||
"suite_version": "0.1.0",
|
||||
"schema_version": "assistant_stage3_lifecycle_probe_v0_1",
|
||||
"scenario_count": 9,
|
||||
"case_ids": [
|
||||
"S3-51-WRONG-CLOSE-TYPE",
|
||||
"S3-60-PAYMENT-WITHOUT-CLOSURE",
|
||||
"S3-97-STALLED-NODES",
|
||||
"S3-97-EXPECTED-VS-ACTUAL",
|
||||
"S3-OS-BRANCH-DIVERGENCE",
|
||||
"S3-OS-TERMINAL-GAP",
|
||||
"S3-VAT-CROSS-BRANCH-CONFLICT",
|
||||
"S3-VAT-ACTUAL-VS-EXPECTED",
|
||||
"S3-PERIOD-CLOSE-LIFECYCLE-IMPACT"
|
||||
],
|
||||
"cases": [
|
||||
{
|
||||
"case_id": "S3-51-WRONG-CLOSE-TYPE",
|
||||
"scenario_tag": "51_wrong_closing_document_type",
|
||||
"question_type": "direct",
|
||||
"broadness_level": "medium",
|
||||
"turns": [
|
||||
{
|
||||
"user_message": "Проверь по счёту 51 за июнь 2020, где контур закрыт не тем типом документа и какой ожидаемый завершающий переход не подтверждён."
|
||||
}
|
||||
],
|
||||
"expected_hints": {
|
||||
"expected_reply_type": "partial_coverage",
|
||||
"expected_degraded_to": "partial",
|
||||
"expected_problem_first": true,
|
||||
"expected_problem_unit_types": [
|
||||
"lifecycle_anomaly_node",
|
||||
"document_conflict"
|
||||
],
|
||||
"expected_lifecycle_domain": "bank_settlement",
|
||||
"require_current_expected_state_pair": true,
|
||||
"require_missing_or_invalid_transition": true,
|
||||
"require_wrong_closing_document_type": true,
|
||||
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
|
||||
},
|
||||
"lifecycle_focus": {
|
||||
"domain": "51_60",
|
||||
"targets": [
|
||||
"expected_vs_actual_state",
|
||||
"missing_transition",
|
||||
"wrong_closing_document_type"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"case_id": "S3-60-PAYMENT-WITHOUT-CLOSURE",
|
||||
"scenario_tag": "60_payment_exists_but_not_closed",
|
||||
"question_type": "direct",
|
||||
"broadness_level": "medium",
|
||||
"turns": [
|
||||
{
|
||||
"user_message": "По поставщикам по счёту 60 за июнь 2020 покажи, где оплата есть, но lifecycle расчёта не дошёл до ожидаемого закрывающего документа."
|
||||
}
|
||||
],
|
||||
"expected_hints": {
|
||||
"expected_reply_type": "partial_coverage",
|
||||
"expected_degraded_to": "partial",
|
||||
"expected_problem_first": true,
|
||||
"expected_problem_unit_types": [
|
||||
"lifecycle_anomaly_node",
|
||||
"unresolved_settlement_cluster"
|
||||
],
|
||||
"expected_lifecycle_domain": "bank_settlement",
|
||||
"require_current_expected_state_pair": true,
|
||||
"require_missing_or_invalid_transition": true,
|
||||
"require_previous_states": true,
|
||||
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
|
||||
},
|
||||
"lifecycle_focus": {
|
||||
"domain": "51_60",
|
||||
"targets": [
|
||||
"expected_vs_actual_state",
|
||||
"missing_transition",
|
||||
"resolved_previous_states"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"case_id": "S3-97-STALLED-NODES",
|
||||
"scenario_tag": "97_stalled_nodes",
|
||||
"question_type": "direct",
|
||||
"broadness_level": "medium",
|
||||
"turns": [
|
||||
{
|
||||
"user_message": "Проверь по счёту 97 за июнь 2020 по документам, где зависли узлы жизненного цикла: какие стадии уже пройдены и какой ожидаемый переход до списания отсутствует."
|
||||
}
|
||||
],
|
||||
"expected_hints": {
|
||||
"expected_reply_type": "partial_coverage",
|
||||
"expected_degraded_to": "partial",
|
||||
"expected_problem_first": true,
|
||||
"expected_problem_unit_types": [
|
||||
"lifecycle_anomaly_node",
|
||||
"broken_chain_segment"
|
||||
],
|
||||
"expected_lifecycle_domain": "deferred_expense",
|
||||
"require_current_expected_state_pair": true,
|
||||
"require_missing_or_invalid_transition": true,
|
||||
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
|
||||
},
|
||||
"lifecycle_focus": {
|
||||
"domain": "97",
|
||||
"targets": [
|
||||
"expected_vs_actual_state",
|
||||
"missing_transition"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"case_id": "S3-97-EXPECTED-VS-ACTUAL",
|
||||
"scenario_tag": "97_expected_vs_actual_sequence",
|
||||
"question_type": "direct",
|
||||
"broadness_level": "medium",
|
||||
"turns": [
|
||||
{
|
||||
"user_message": "Проверь по счёту 97 за июнь 2020 по документам, где фактическое состояние расходов будущих периодов расходится с ожидаемой последовательностью списания."
|
||||
}
|
||||
],
|
||||
"expected_hints": {
|
||||
"expected_reply_type": "partial_coverage",
|
||||
"expected_degraded_to": "partial",
|
||||
"expected_problem_first": true,
|
||||
"expected_problem_unit_types": [
|
||||
"lifecycle_anomaly_node",
|
||||
"period_risk_cluster"
|
||||
],
|
||||
"expected_lifecycle_domain": "deferred_expense",
|
||||
"require_current_expected_state_pair": true,
|
||||
"require_terminal_state_mismatch": true,
|
||||
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
|
||||
},
|
||||
"lifecycle_focus": {
|
||||
"domain": "97",
|
||||
"targets": [
|
||||
"expected_vs_actual_state",
|
||||
"terminal_state_mismatch"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"case_id": "S3-OS-BRANCH-DIVERGENCE",
|
||||
"scenario_tag": "os_card_document_depreciation_divergence",
|
||||
"question_type": "direct",
|
||||
"broadness_level": "medium",
|
||||
"turns": [
|
||||
{
|
||||
"user_message": "По основным средствам по счетам 01 и 02 за июнь 2020 покажи, где lifecycle объекта расходится между карточкой, документом и начислением амортизации."
|
||||
}
|
||||
],
|
||||
"expected_hints": {
|
||||
"expected_reply_type": "partial_coverage",
|
||||
"expected_degraded_to": "partial",
|
||||
"expected_problem_first": true,
|
||||
"expected_problem_unit_types": [
|
||||
"lifecycle_anomaly_node",
|
||||
"cross_branch_inconsistency_cluster"
|
||||
],
|
||||
"expected_lifecycle_domain": "fixed_asset",
|
||||
"require_current_expected_state_pair": true,
|
||||
"require_cross_branch_conflict": true,
|
||||
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
|
||||
},
|
||||
"lifecycle_focus": {
|
||||
"domain": "fixed_asset",
|
||||
"targets": [
|
||||
"expected_vs_actual_state",
|
||||
"cross_branch_lifecycle_conflict"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"case_id": "S3-OS-TERMINAL-GAP",
|
||||
"scenario_tag": "os_previous_or_terminal_gap",
|
||||
"question_type": "direct",
|
||||
"broadness_level": "medium",
|
||||
"turns": [
|
||||
{
|
||||
"user_message": "Где по ОС по счетам 01/02 за июнь 2020 видно, что цепочка дошла до начисления, но не подтверждён ожидаемый предыдущий или завершающий этап?"
|
||||
}
|
||||
],
|
||||
"expected_hints": {
|
||||
"expected_reply_type": "partial_coverage",
|
||||
"expected_degraded_to": "partial",
|
||||
"expected_problem_first": true,
|
||||
"expected_problem_unit_types": [
|
||||
"lifecycle_anomaly_node",
|
||||
"document_conflict"
|
||||
],
|
||||
"expected_lifecycle_domain": "fixed_asset",
|
||||
"require_current_expected_state_pair": true,
|
||||
"require_terminal_state_mismatch": true,
|
||||
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
|
||||
},
|
||||
"lifecycle_focus": {
|
||||
"domain": "fixed_asset",
|
||||
"targets": [
|
||||
"resolved_previous_states",
|
||||
"terminal_state_mismatch"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"case_id": "S3-VAT-CROSS-BRANCH-CONFLICT",
|
||||
"scenario_tag": "vat_cross_branch_conflict",
|
||||
"question_type": "direct",
|
||||
"broadness_level": "medium",
|
||||
"turns": [
|
||||
{
|
||||
"user_message": "По НДС за июнь 2020 по счёту 68 покажи, где lifecycle ветвей документов, проводок и регистров расходится и какая ветка выглядит несогласованной."
|
||||
}
|
||||
],
|
||||
"expected_hints": {
|
||||
"expected_reply_type": "partial_coverage",
|
||||
"expected_degraded_to": "partial",
|
||||
"expected_problem_first": true,
|
||||
"expected_problem_unit_types": [
|
||||
"lifecycle_anomaly_node",
|
||||
"cross_branch_inconsistency_cluster"
|
||||
],
|
||||
"expected_lifecycle_domain": "vat_flow",
|
||||
"require_current_expected_state_pair": true,
|
||||
"require_cross_branch_conflict": true,
|
||||
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
|
||||
},
|
||||
"lifecycle_focus": {
|
||||
"domain": "vat_flow",
|
||||
"targets": [
|
||||
"expected_vs_actual_state",
|
||||
"cross_branch_lifecycle_conflict"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"case_id": "S3-VAT-ACTUAL-VS-EXPECTED",
|
||||
"scenario_tag": "vat_actual_vs_expected_state",
|
||||
"question_type": "direct",
|
||||
"broadness_level": "medium",
|
||||
"turns": [
|
||||
{
|
||||
"user_message": "Где по НДС за июнь 2020 по счетам 19 и 68 есть конфликт фактического и ожидаемого состояния между документом и регистрами?"
|
||||
}
|
||||
],
|
||||
"expected_hints": {
|
||||
"expected_reply_type": "partial_coverage",
|
||||
"expected_degraded_to": "partial",
|
||||
"expected_problem_first": true,
|
||||
"expected_problem_unit_types": [
|
||||
"lifecycle_anomaly_node",
|
||||
"document_conflict"
|
||||
],
|
||||
"expected_lifecycle_domain": "vat_flow",
|
||||
"require_current_expected_state_pair": true,
|
||||
"require_terminal_state_mismatch": true,
|
||||
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
|
||||
},
|
||||
"lifecycle_focus": {
|
||||
"domain": "vat_flow",
|
||||
"targets": [
|
||||
"expected_vs_actual_state",
|
||||
"terminal_state_mismatch"
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"case_id": "S3-PERIOD-CLOSE-LIFECYCLE-IMPACT",
|
||||
"scenario_tag": "period_close_lifecycle_impact",
|
||||
"question_type": "direct",
|
||||
"broadness_level": "high",
|
||||
"turns": [
|
||||
{
|
||||
"user_message": "Какие lifecycle-дефекты по счетам 51 и 60 за июнь 2020 сильнее всего влияют на риск закрытия периода (period close) и на каком переходе возникает разрыв?"
|
||||
}
|
||||
],
|
||||
"expected_hints": {
|
||||
"expected_reply_type": "clarification_required",
|
||||
"expected_degraded_to": "clarification",
|
||||
"expected_problem_first": true,
|
||||
"expected_problem_unit_types": [
|
||||
"period_risk_cluster",
|
||||
"lifecycle_anomaly_node"
|
||||
],
|
||||
"expected_lifecycle_domain": "period_close",
|
||||
"require_missing_or_invalid_transition": true,
|
||||
"require_period_close_impact": true,
|
||||
"require_lifecycle_mode": "stage3_lifecycle_aware_v1"
|
||||
},
|
||||
"lifecycle_focus": {
|
||||
"domain": "period_close",
|
||||
"targets": [
|
||||
"lifecycle_impact_period_close",
|
||||
"missing_transition"
|
||||
]
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|
@ -0,0 +1 @@
|
|||
Проверка UTF-8 кириллицы
|
||||
|
|
@ -0,0 +1 @@
|
|||
???????? UTF-8 ?????????
|
||||
Loading…
Reference in New Issue