From 05aad66dc4230cac4bcbd86d2ef90dcf603f4e87 Mon Sep 17 00:00:00 2001
From: dctouch <support@dctouch.ru>
Date: Mon, 20 Apr 2026 08:56:34 +0300
Subject: [PATCH] DOCS: add MCP semantic discovery to the architecture plan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ...alog_authority_recovery_plan_2026-04-19.md | 178 +++++++++++++++++-
 .../11 - architecture_turnaround/README.md    |   4 +-
 2 files changed, 178 insertions(+), 4 deletions(-)

diff --git a/docs/ARCH/11 - architecture_turnaround/14 - semantic_dialog_authority_recovery_plan_2026-04-19.md b/docs/ARCH/11 - architecture_turnaround/14 - semantic_dialog_authority_recovery_plan_2026-04-19.md
index 2558fb7..5dbdbf0 100644
--- a/docs/ARCH/11 - architecture_turnaround/14 - semantic_dialog_authority_recovery_plan_2026-04-19.md	
+++ b/docs/ARCH/11 - architecture_turnaround/14 - semantic_dialog_authority_recovery_plan_2026-04-19.md	
@@ -373,9 +373,156 @@ This block is complete only when:
 - truthful limited answers do not look like stale replay;
 - human answer quality becomes a structural acceptance dimension, not a soft preference.
 
+## Big Block 5. MCP Semantic Data Agent Instead Of Route Hardcoding
+
+### Goal
+
+Reduce the need to hardcode every new business question as a separate route by introducing a guarded semantic data-discovery layer over 1C/MCP.
+
+This block does not mean giving Qwen3 unrestricted authority to invent arbitrary 1C queries.
+
+It means letting the model help build and revise a data-search plan while deterministic runtime contracts still own:
+
+- allowed MCP primitives;
+- schema/catalog boundaries;
+- execution budgets;
+- evidence sufficiency;
+- final answer truthfulness.
+
+### Architectural Rule
+
+The assistant may explore 1C data over MCP only via reviewed data primitives and evidence gates.
+
+The model can propose:
+
+- which business object to look for;
+- which metric or evidence axis is needed;
+- which period, organization, counterparty, contract, account, register, or document family should constrain the search;
+- whether the first query result is sufficient or requires a follow-up probe.
+
+The runtime must decide:
+
+- whether the proposed search is allowed;
+- which concrete MCP primitive or query template can execute it;
+- whether returned evidence proves the answer, only supports an inference, or is insufficient;
+- how the answer should describe confirmed facts, inferred facts, and unknowns.
+
+### Required Shift
+
+The route layer should stop being the only way to reach live 1C data.
+
+Today, the common pattern is:
+
+- wording signal;
+- fixed intent;
+- fixed route/capability;
+- fixed query/reply branch.
+
+The target pattern is:
+
+- current-turn meaning authority;
+- semantic data need;
+- guarded MCP discovery plan;
+- evidence object;
+- answer contract.
+
+Exact routes remain valuable for hot, high-confidence contours.
+
+But new or long-tail business questions should be able to enter a controlled discovery lane instead of immediately becoming:
+
+- unsupported;
+- stale carryover;
+- or another hand-coded route request.
+
+### MCP Primitive Families
+
+The discovery lane should expose a small set of broad, reviewed primitives rather than many free-form model tools:
+
+- `search_business_entity`
+- `inspect_1c_metadata`
+- `resolve_entity_reference`
+- `query_movements`
+- `query_documents`
+- `aggregate_by_axis`
+- `drilldown_related_objects`
+- `probe_coverage`
+- `explain_evidence_basis`
+
+These are not final API names.
+
+They describe the architectural shape: the model plans at business level, while runtime adapters execute controlled 1C/MCP operations.
+
+### Required Catalog Brain
+
+The assistant needs a machine-readable 1C schema/catalog memory before this can be safe:
+
+- available catalogs, documents, registers, and accounting axes;
+- known links between counterparties, contracts, documents, accounts, payments, shipments, and balances;
+- safe query templates and field mappings;
+- known MCP limitations and fallback probes;
+- examples of proven query recipes from accepted semantic runs.
+
+Without this catalog brain, a model-led MCP agent will guess.
+
+Guessing is not acceptable for accounting answers.
+
+### Truth And Evidence Requirements
+
+Every discovery result must emit an evidence object before answer composition:
+
+- `confirmed_facts`
+- `inferred_facts`
+- `unknown_facts`
+- `source_rows_summary`
+- `coverage_status`
+- `query_plan`
+- `query_limitations`
+- `confidence_reason`
+- `recommended_next_probe`
+
+The final answer may not present an inference as a confirmed 1C fact.
+
+If the exact fact is unavailable but a useful inference is possible from 1C activity evidence, the answer must state explicitly that it is an inference.
+
+### Stack Mapping
+
+Existing seams that already point in this direction:
+
+- `AssistantDataLayer`
+- `buildLiveMcpCallPlan`
+- `buildSemanticRetrievalProfile`
+- `addressMcpClient.ts`
+- `AddressQueryService`
+- truth/coverage/evidence contracts
+
+Primary new owner candidates:
+
+- `assistantSemanticDataAgentPolicy.ts`
+- `assistantMcpDiscoveryPlanner.ts`
+- `assistantMcpEvidenceGate.ts`
+- `assistantMcpCatalogIndex.ts`
+
+The naming can change, but the ownership split should not:
+
+- planner proposes a business-level data plan;
+- catalog constrains what can be searched;
+- executor runs allowed MCP primitives;
+- evidence gate decides what can be said;
+- answer layer explains the result in human business terms.
+
+### Done Criteria
+
+This block is complete only when:
+
+- at least one long-tail 1C business question can be answered through discovery without adding a one-off route branch;
+- the discovery lane produces machine-readable query/evidence artifacts;
+- failed discovery degrades to a useful "what I checked / what is still unknown" answer, not a generic unsupported fallback;
+- exact hot routes and semantic discovery can coexist without route collisions;
+- semantic replay can prove that the model does not leak internal query mechanics or hallucinate unconfirmed facts.
+
 ## Concrete Stack Plan
 
-This problem should be addressed in the current stack through four large architecture blocks, not through many micro-passes.
+This problem should be addressed in the current stack through five large architecture blocks, not through many micro-passes.
 
 ### Stack Block A. Turn Meaning Layer
 
@@ -441,6 +588,25 @@ Required result:
 - top-block answer correctness becomes part of acceptance;
 - "route technically matched" no longer overrules semantic mismatch.
 
+### Stack Block E. MCP Semantic Data Discovery Layer
+
+Add a guarded discovery lane for business questions that are understood but not yet covered by an exact route.
+
+Primary files and owner seams:
+
+- [addressMcpClient.ts](/x:/1C/NDC_1C/llm_normalizer/backend/src/services/addressMcpClient.ts:1)
+- [addressQueryService.ts](/x:/1C/NDC_1C/llm_normalizer/backend/src/services/addressQueryService.ts:1)
+- future `assistantMcpCatalogIndex.ts`
+- future `assistantMcpDiscoveryPlanner.ts`
+- future `assistantMcpEvidenceGate.ts`
+
+Required result:
+
+- Qwen3 may help plan MCP exploration, but it cannot directly define truth;
+- runtime exposes guarded MCP primitives instead of arbitrary model-generated 1C access;
+- every discovery answer is backed by an explicit evidence object;
+- long-tail understood business questions become recoverable without route-per-question hardcoding.
+
 ## Required Acceptance Invariants
 
 The architecture should not be considered corrected until the following invariants are green:
@@ -453,6 +619,10 @@ The architecture should not be considered corrected until the following invarian
 6. `short_followup_retains_dialog_stem_without_glitch_replay`
 7. `answer_top_block_matches_current_user_intent`
 8. `meta_interrupt_does_not_corrupt_business_thread`
+9. `understood_long_tail_question_enters_guarded_mcp_discovery`
+10. `mcp_discovery_answer_separates_confirmed_inferred_and_unknown_facts`
+11. `model_planned_mcp_probe_cannot_bypass_runtime_evidence_gate`
+12. `failed_discovery_reports_checked_sources_without_hallucinated_fact`
 
 ## Progress Update - 2026-04-20
 
@@ -503,7 +673,8 @@ Implement it as:
 - one shared current-turn meaning authority;
 - one explicit arbitration rule between new meaning and continuity;
 - stronger family-level semantic robustness for supported contours;
-- answer and replay gates that prove the assistant now feels alive to a human user.
+- answer and replay gates that prove the assistant now feels alive to a human user;
+- guarded MCP semantic discovery for understood questions that do not deserve one-off route hardcoding.
 
 ## Bottom Line
 
@@ -513,7 +684,8 @@ It fails because it still lacks a stable architecture for:
 
 - recognizing the meaning of the current turn;
 - subordinating continuity to that meaning;
-- and reflecting that meaning in the final user-visible answer.
+- reflecting that meaning in the final user-visible answer;
+- and discovering relevant 1C evidence through controlled MCP primitives when no exact route exists yet.
 
 That is the next large architecture block.
 
diff --git a/docs/ARCH/11 - architecture_turnaround/README.md b/docs/ARCH/11 - architecture_turnaround/README.md
index bf1cdb9..0a884bc 100644
--- a/docs/ARCH/11 - architecture_turnaround/README.md	
+++ b/docs/ARCH/11 - architecture_turnaround/README.md	
@@ -63,6 +63,7 @@ Current honest status:
   - replay breadth still narrower than the intended multi-domain rollout surface beyond the flagship and late-switch families
   - remaining answer-semantics pressure inside `composeStage.ts` / `answerComposer.ts`
   - insufficient semantic robustness on live user wording, especially short follow-up retarget, typo tolerance, and intent-faithful human answers
+  - no guarded MCP semantic discovery lane yet for understood long-tail 1C questions that should not require one-off route hardcoding
 
 Latest live proof now includes:
 
@@ -77,7 +78,7 @@ Current architectural reading:
 - the system is already materially past the dangerous regression breakpoint;
 - it is now safe for continued architecture hardening and controlled domain-by-domain enablement under replay gates;
 - it is now materially closer to pre-multidomain stability, but still not safe to declare broad low-risk multi-domain expansion.
-- the practical next target is now `90%+ pre-multidomain readiness`, and the remaining gap should be treated as four large architecture iterations rather than as cosmetic cleanup.
+- the practical next target is now `90%+ pre-multidomain readiness`, and the remaining gap should be treated as five large architecture iterations rather than as cosmetic cleanup.
 - from this point onward, readiness must be judged not only by route truth and replay pass rate, but also by whether a new human user would feel that the assistant understands the intent and responds meaningfully in live wording.
 
 For the detailed audit, current percentages, and remaining debt, read:
@@ -151,3 +152,4 @@ The biggest remaining blockers are:
 - central intent pressure in `resolveAddressIntent()`;
 - remaining answer-semantics pressure in `composeStage.ts` and `answerComposer.ts`.
 - semantic robustness gaps where already-supported questions can still look broken to a human user because of typo sensitivity, short follow-up retarget loss, or human-answer mismatch.
+- missing MCP semantic data-discovery layer where Qwen3 can help plan controlled 1C evidence search without bypassing runtime truth gates.