Docs: synchronize Post-F Semantic Integrity Hardening

dctouch 2026-04-23 23:40:02 +03:00
parent 2f282f1479
commit da8b328d98
3 changed files with 301 additions and 30 deletions


@@ -221,3 +221,46 @@ This phase is successful only when a new human user can ask a structurally new b
5. answer honestly from proved evidence without pretending certainty it does not have.
That is the first point where the assistant will start to feel like it can actually walk 1C on its own within the reviewed MCP boundaries.
## Status Update - 2026-04-23
The D/E/F phase above is no longer only a target architecture.
It is now treated as a materially closed baseline.
What is replay-backed and considered operational:
- `D. Question -> Data Need Graph`
- `E. Dynamic Schema Traversal And Primitive Search`
- `F. Multi-Hop Evidence Loop And Clarifying Recovery`
Operationally real in code and replay:
- data-need graphs now represent ranking, comparison, value-flow, metadata, and scoped follow-up asks as machine-readable runtime objects instead of only nearest-family route hints;
- planner selection now uses data-need graph plus catalog/schema signals rather than only short reviewed family recipes;
- metadata ambiguity can survive clarification, pick `documents` or `movements`, and continue the same bounded path;
- multi-hop loops can now recover through missing organization/period gaps, resume the same proof path, and survive year-switch or `all_time` continuation on the validated contours.
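A minimal TypeScript sketch of what "data-need graph as a machine-readable runtime object" could look like. All names here (`DataNeedGraph`, `DataNeedNode`, `missingSlots`, `blockedNodes`) are illustrative assumptions, not the project's actual runtime types.

```typescript
// Hypothetical shapes for illustration; not the project's actual runtime types.
type DataNeedKind =
  | "ranking"
  | "comparison"
  | "value_flow"
  | "metadata"
  | "scoped_follow_up";

interface DataNeedNode {
  id: string;
  kind: DataNeedKind;
  // Slots that must be grounded before retrieval can run.
  requiredSlots: string[];
  // Slots already grounded from the dialog so far.
  grounded: Record<string, string>;
}

interface DataNeedGraph {
  nodes: DataNeedNode[];
  // [from, to]: node `to` consumes evidence produced by node `from`.
  edges: Array<[string, string]>;
}

// Slots a node still lacks; a clarification turn would ask for exactly these.
function missingSlots(node: DataNeedNode): string[] {
  return node.requiredSlots.filter((slot) => !(slot in node.grounded));
}

// Ids of nodes that cannot run yet because a required slot is ungrounded.
function blockedNodes(graph: DataNeedGraph): string[] {
  return graph.nodes
    .filter((node) => missingSlots(node).length > 0)
    .map((node) => node.id);
}

const exampleGraph: DataNeedGraph = {
  nodes: [
    {
      id: "turnover",
      kind: "value_flow",
      requiredSlots: ["organization", "period"],
      grounded: { period: "2021" },
    },
    {
      id: "top_counterparties",
      kind: "ranking",
      requiredSlots: ["period"],
      grounded: { period: "2021" },
    },
  ],
  edges: [["turnover", "top_counterparties"]],
};
```

In this sketch a missing organization blocks only the `turnover` node, which mirrors the idea that a multi-hop loop can ask for the gap and then resume the same proof path.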
Representative replay anchors for this closure include:
- `address_truth_harness_phase33_open_scope_value_flow_comparison_live_rerun5`
- `address_truth_harness_phase39_open_scope_ranking_org_clarification_live_rerun2`
- `address_truth_harness_phase42_catalog_metadata_drilldown_live_rerun2`
- `address_truth_harness_phase45_multi_hop_open_total_clarification_loop_live_rerun2`
- `address_truth_harness_phase52_metadata_lane_choice_with_org_period_to_retrieval_live_rerun1`
- `address_truth_harness_phase63_metadata_movements_to_documents_all_time_live_rerun2`
What this closure does **not** mean:
- the whole assistant is now semantically immune to stale scope or wrong-subject carryover;
- every already-enabled contour is automatically safe under repeated pivots and legacy session memory;
- open-world bounded autonomy is broad enough for arbitrary unfamiliar 1C asks.
That is why the next architecture mainline is now:
- [17 - post_f_semantic_integrity_hardening_2026-04-23.md](./17%20-%20post_f_semantic_integrity_hardening_2026-04-23.md)
That document formalizes the next pressure point:
- not only growing autonomy breadth,
- but protecting semantic correctness inside the autonomy surface that already exists.


@@ -0,0 +1,214 @@
# 17 - Post-F Semantic Integrity Hardening (2026-04-23)
## Purpose
This note opens the architecture phase that starts after the practical closure of `Big Block D/E/F`.
It is not a new reset and not a retreat from bounded autonomy.
It exists because the project has now crossed an important threshold:
- the bounded autonomy substrate is materially real in runtime code;
- the planner can already survive metadata, entity, documents, movements, value-flow, ranking, comparison, and multi-hop clarification loops;
- but a new class of failures still remains visible to a human user:
- stale scope contamination;
- wrong subject carryover;
- post-pivot semantic drift;
- already-supported contours that still sometimes answer the wrong business object.
So the next layer is no longer "build the first autonomy substrate".
It is:
- protect semantic correctness inside the already-enabled autonomy surface.
## Baseline Entering This Phase
The following is now treated as replay-backed baseline rather than future intent:
- `A. Metadata-First Self-Navigation`
- `B. Entity And Schema Grounding`
- `C. Planner-Selected Primitive Chains`
- `D. Question -> Data Need Graph`
- `E. Dynamic Schema Traversal And Primitive Search`
- `F. Multi-Hop Evidence Loop And Clarifying Recovery`
This means the project already has:
- metadata-first discovery through reviewed MCP primitives;
- bounded schema grounding and honest ambiguity handling;
- planner-selected chains across entity resolution, value flow, documents, movements, ranking, and comparison;
- multi-hop clarification/recovery loops that can resume the same proof path.
What this does **not** mean:
- that every already-enabled contour is semantically stable under stale memory pressure;
- that repeated pivots are always safe by default;
- that an explicit current-turn subject always beats old organization or focus state;
- that an exact route cannot still lose to a stale discovery/meta continuation.
## Why A New Phase Is Necessary
At this point the worst user-facing failures are no longer:
- "the system has no path at all";
- "metadata/chain planning does not exist";
- "clarification cannot resume".
The worst failures are now more dangerous in a different way:
- the system often has a path, but can still answer about the wrong business object;
- the route can be technically healthy while the semantic answer is still wrong;
- a human user can still see a glitch on a question that the architecture should already support.
That is exactly the class of bug that damages trust fastest.
So the phase goal is not breadth-first enablement.
It is semantic integrity over the enabled surface.
## Main Failure Classes
### 1. Stale Scope Contamination
Examples:
- old organization scope contaminates an explicit current-turn counterparty;
- old focus object survives a newly grounded entity;
- metadata scope bleeds into a later data ask.
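One way to picture the intended precedence is the sketch below. It assumes a hypothetical `Scope` record and a hypothetical `resolveEffectiveScope` helper; the rule it encodes (an explicit subject in the current turn discards carried-over organization and focus unless the turn restates them) is an illustration of the failure class, not the project's actual arbitration code.

```typescript
// Hypothetical scope record; field names are assumptions for illustration.
interface Scope {
  counterparty: string | null;
  organization: string | null;
  focusObject: string | null;
}

// Explicit current-turn subject wins; session state is only a fallback
// when the turn names no subject of its own.
function resolveEffectiveScope(turn: Scope, session: Scope): Scope {
  const turnNamesSubject =
    turn.counterparty !== null || turn.focusObject !== null;

  if (turnNamesSubject) {
    // Do not inherit stale organization or focus from the session.
    return {
      counterparty: turn.counterparty,
      organization: turn.organization,
      focusObject: turn.focusObject,
    };
  }

  // Pure follow-up: continue with the session subject, but let the turn
  // override the organization if it states one.
  return {
    counterparty: session.counterparty,
    organization: turn.organization ?? session.organization,
    focusObject: session.focusObject,
  };
}
```

The design choice worth noting is that the fallback branch is the only place where session memory is read, so stale state cannot leak into a turn that grounded its own subject.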
### 2. Referential Carryover Drift
Examples:
- `по нему`, `ему`, `по этой позиции`, `кроме этого документа` ("about it", "to it", "for this item", "except this document") drift into the wrong lane;
- referential follow-up wakes metadata/discovery when it should stay in exact follow-up continuity;
- a pronoun keeps the contour but loses the actual business object.
### 3. Post-Pivot Arbitration Failures
Examples:
- `documents -> payments`
- `payments -> contracts`
- `documents -> contracts -> documents`
- `movements -> documents -> year-switch`
The user means "stay on the same business object, but change the lane".
The system must not:
- invent a new topic;
- wake unrelated discovery;
- or keep the old lane after the pivot was explicit.
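The pivot rule above can be sketched as a pure function. The `Lane` union, `ActiveContour`, `pivotLane`, and `replayPivots` are hypothetical names introduced only for this example.

```typescript
// Hypothetical lane set, taken from the pivot examples in this section.
type Lane = "documents" | "payments" | "contracts" | "movements";

interface ActiveContour {
  businessObject: string;
  lane: Lane;
}

// An explicit pivot changes only the lane; the business object is carried
// over unchanged and no discovery step is involved.
function pivotLane(active: ActiveContour, requested: Lane): ActiveContour {
  return { businessObject: active.businessObject, lane: requested };
}

// Applies a chain of pivots in order, e.g. documents -> contracts -> documents.
function replayPivots(start: ActiveContour, lanes: Lane[]): ActiveContour {
  return lanes.reduce((contour, lane) => pivotLane(contour, lane), start);
}
```

For example, `replayPivots({ businessObject: "Invoice 7", lane: "documents" }, ["contracts", "documents"])` ends on the `documents` lane and still refers to `"Invoice 7"`.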
### 4. Temporal Continuity Loss After A Correct Pivot
Examples:
- `... -> а за 2021?` ("and for 2021?")
- `... -> а за все время?` ("and for all time?")
These must continue the active contour.
They must not:
- reset into unsupported;
- revive stale period windows;
- or lose the active subject.
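A rough sketch of a period-only follow-up that keeps the active contour. The regex-based `parsePeriodFollowUp` is a deliberately naive stand-in for whatever the real system uses to read the period, and `Contour`, `PeriodWindow`, and `continueWithPeriod` are assumed names.

```typescript
type PeriodWindow = { kind: "year"; year: number } | { kind: "all_time" };

interface Contour {
  businessObject: string;
  lane: string;
  period: PeriodWindow;
}

// Naive illustration only: recognizes "за все время" / "за всё время"
// and a four-digit year such as 2021.
function parsePeriodFollowUp(text: string): PeriodWindow | null {
  const lowered = text.toLowerCase();
  if (/вс[её]\s+время/.test(lowered)) {
    return { kind: "all_time" };
  }
  const yearMatch = lowered.match(/\b(?:19|20)\d{2}\b/);
  if (yearMatch) {
    return { kind: "year", year: Number(yearMatch[0]) };
  }
  return null;
}

// Replaces only the period window; subject and lane stay as they were.
function continueWithPeriod(active: Contour, followUp: string): Contour {
  const period = parsePeriodFollowUp(followUp);
  if (period === null) {
    // Not a period follow-up: leave the contour untouched instead of resetting it.
    return active;
  }
  return { ...active, period };
}
```

The point of the sketch is that the new period replaces the old window outright, so a stale window cannot be revived, while `businessObject` and `lane` are never touched.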
### 5. Exact-But-Invisible Result Materialization
Examples:
- VAT contour selected correctly but rows disappear in a later post-filter;
- exact lane found the correct evidence but the final answer degrades to limited/partial.
This class is especially dangerous because the route looks healthy in debug while the user still gets a bad answer.
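A small guard of the following kind would make this class visible instead of silent. It is a sketch under assumed names (`applyPostFilter`, `PostFilterResult`, `VatRow`, `inWindow`); the real materialization seam is not shown in this document.

```typescript
interface PostFilterResult<T> {
  rows: T[];
  // True when the exact lane produced evidence but the filter removed all of it.
  droppedAllEvidence: boolean;
}

// Applies a post-filter and records whether it wiped out non-empty evidence.
function applyPostFilter<T>(
  rows: T[],
  keep: (row: T) => boolean,
): PostFilterResult<T> {
  const kept = rows.filter(keep);
  return {
    rows: kept,
    droppedAllEvidence: rows.length > 0 && kept.length === 0,
  };
}

// Hypothetical row shape; ISO dates compare correctly as plain strings.
interface VatRow {
  date: string;
  amount: number;
}

// Inclusive period-window predicate.
function inWindow(from: string, to: string): (row: VatRow) => boolean {
  return (row) => row.date >= from && row.date <= to;
}
```

A caller can treat `droppedAllEvidence === true` as a seam to investigate (for example an off-by-one window boundary) rather than reporting a limited or partial answer to the user.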
## Scope Of This Phase
### In Scope
- protect grounded current-turn subject against stale organization/focus scope;
- protect exact and planner-selected follow-up pivots from stale discovery/meta override;
- protect referential document/object follow-ups from semantic drift;
- protect year-switch and all-time continuation after pivots;
- repair materialization seams where correct evidence is filtered out before the user answer;
- build replay packs that specifically target semantic integrity instead of only route availability.
### Out Of Scope
- broad new-domain enablement for arbitrary unfamiliar 1C questions;
- unrestricted primitive growth without replay-backed proof;
- cosmetic answer phrasing work when the underlying semantic seam is still wrong.
## Acceptance Criteria
This phase is only healthy when the following are true:
1. explicit current-turn subject beats stale scope unless the user explicitly changes the business object;
2. a valid clarification is preserved and resumed on the same proof path;
3. repeated pivots keep the same business object and the intended lane;
4. year-switch and all-time follow-ups preserve the active contour;
5. exact route success cannot silently degrade into a semantically wrong user answer;
6. the replay verdict is based first on human business meaning and only then on technical internals.
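Criterion 6 can be read as an ordering of checks. The sketch below assumes a hypothetical `ReplayStep` record and verdict labels; it does not describe the actual replay harness.

```typescript
// Hypothetical per-step record produced by a replay run.
interface ReplayStep {
  expectedBusinessObject: string;
  answeredBusinessObject: string;
  routeHealthy: boolean;
}

type StepVerdict = "accepted" | "wrong_business_object" | "technical_failure";

// Business meaning is judged first; route health is only consulted
// once the answer is about the right object.
function judgeStep(step: ReplayStep): StepVerdict {
  if (step.answeredBusinessObject !== step.expectedBusinessObject) {
    return "wrong_business_object";
  }
  return step.routeHealthy ? "accepted" : "technical_failure";
}

// Summary in "accepted/total" form.
function summarize(steps: ReplayStep[]): string {
  const accepted = steps.filter((s) => judgeStep(s) === "accepted").length;
  return `${accepted}/${steps.length}`;
}
```

With this ordering a step whose route is technically healthy but whose answer concerns the wrong object is still rejected, which is exactly the failure that green technical checks alone would hide.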
## Current Status - 2026-04-23
This phase is already active in runtime code and replay-backed.
Materially hardened in this pass:
- explicit current-turn counterparty now overrides stale organization-scoped carryover for value-flow/net asks;
- VAT exact materialization now survives period-window filtering instead of collapsing into `recipe_visibility_gap`;
- stale session focus no longer wins over newly grounded discovery counterparty;
- referential document follow-up no longer wakes metadata/discovery by mistake;
- `documents -> payments/contracts` and repeated pivot chains now survive year-switch and all-time continuation;
- mixed human AGENT dialogues now keep repeated pivots, open-scope organization questions, and explicit counterparty resets inside one semantically stable session.
Replay-backed anchors for the current layer include:
- `address_truth_harness_phase11_manual_followup_meta_quality_live_rerun_vatfix`
- `address_truth_harness_phase20_continuity_stabilization_live_rerun_vatfix`
- `address_truth_harness_phase67_svk_grounded_counterparty_integrity_live_rerun_vatfix`
- `address_truth_harness_phase68_referential_document_followup_integrity_live_rerun1`
- `address_truth_harness_phase69_document_to_payments_pronoun_pivot_live_rerun3`
- `address_truth_harness_phase72_document_to_contracts_year_switch_live_rerun3`
- `address_truth_harness_phase80_repeated_pivots_all_time_after_contracts_live_rerun1`
- `address_truth_harness_phase82_human_mixed_integrity_status_dialog_live_rerun5`
## Honest Remaining Risk
This phase should not be overclaimed.
What is now stronger:
- already-enabled contours are materially harder to derail through stale carryover and post-pivot drift;
- semantic replay now catches more "wrong business object" failures before they hide behind green tests.
What is still not solved globally:
- arbitrary unfamiliar 1C asks outside the grown primitive/search surface;
- every possible stale memory seam across the whole legacy assistant surface;
- the residual centrality pressure inside `resolveAddressIntent()` and nearby orchestration bridges.
So this phase improves trust inside the enabled surface.
It does **not** yet mean:
- universal immunity from semantic drift;
- or general open-world autonomy over any new 1C question.
## Hand-Off
From this point the architecture should treat two tracks as parallel truths:
1. continue growing open-world bounded autonomy breadth where the primitive/search surface is still too narrow;
2. continue semantic integrity hardening where an already-enabled contour can still answer a human user incorrectly.
In practice this means:
- breadth work without semantic replay is not enough;
- semantic polish without protecting the actual business object is not enough;
- user trust now depends on both.


@@ -34,12 +34,13 @@ This package answers the next question:
14. [14 - semantic_dialog_authority_recovery_plan_2026-04-19.md](./14%20-%20semantic_dialog_authority_recovery_plan_2026-04-19.md)
15. [15 - mcp_bounded_autonomy_reset_plan_2026-04-21.md](./15%20-%20mcp_bounded_autonomy_reset_plan_2026-04-21.md)
16. [16 - data_need_graph_and_open_world_mcp_plan_2026-04-22.md](./16%20-%20data_need_graph_and_open_world_mcp_plan_2026-04-22.md)
17. [17 - post_f_semantic_integrity_hardening_2026-04-23.md](./17%20-%20post_f_semantic_integrity_hardening_2026-04-23.md)
## Current Status Snapshot (2026-04-22)
## Current Status Snapshot (2026-04-23)
This package is no longer planning-only.
It now documents a turnaround that is already operational in code, already materially past the acute regression breakpoint, and already moved into bounded MCP autonomy work beyond the first stabilization wave:
It now documents a turnaround that is already operational in code, is already materially past the acute regression breakpoint, and has already moved through the bounded MCP autonomy build-out into the next semantic hardening layer:
- route, transition, boundary, meta, memory, and provider policy owners exist as separate modules;
- exact-lane truth and coverage/evidence contracts exist as explicit runtime artifacts;
@@ -47,50 +48,59 @@ It now documents a turnaround that is already operational in code, already mater
- AGENT semantic packs and source catalogs already exist for mixed domain/meta validation.
- the reset toward `MCP-first bounded autonomy` is now formalized;
- `Big Block A/B/C` of that reset are now closed in runtime code and replay-backed;
- the next architecture mainline is no longer continuity polishing, but `D/E/F`:
- `Big Block D/E/F` are now also materially closed in runtime code and replay-backed:
- `Question -> Data Need Graph`
- dynamic schema traversal and primitive search
- multi-hop evidence loop with bounded clarification recovery
- the current architecture mainline is now `Post-F Semantic Integrity Hardening`:
- protect grounded subject integrity against stale scope contamination
- protect exact and planner-selected pivots from metadata/discovery drift
- keep temporal continuity and repeated lane switches semantically stable
- recover already-supported questions that still look broken to a human user
Current honest status:
- turnaround implementation progress: `~96%`
- exit-from-danger-zone readiness: `~91%`
- pre-multidomain readiness: `~78%`
- bounded-autonomy foundation readiness: `~60%`
- graph snapshot after latest rebuild: `5741 nodes`, `12385 edges`, `137 communities`
- turnaround implementation progress: `~98%`
- exit-from-danger-zone readiness: `~95%`
- pre-multidomain readiness: `~88%`
- bounded-autonomy foundation readiness: `~86%`
- open-world bounded-autonomy readiness: `~71%`
- graph snapshot after latest rebuild: `5878 nodes`, `12734 edges`, `135 communities`
- current breakpoint:
- the validated hot paths are no longer structurally broken;
- flagship continuity collapse is no longer the primary risk;
- the main remaining risk is no longer clarification-resume collapse, but the unfinished shift from bounded reviewed chains toward open-world data-need-driven MCP planning;
- pure wording polish is now secondary debt, but semantic robustness plus open-world evidence navigation is now a first-class blocker;
- the practical product risk is no longer only "the route collapsed", but "the assistant still cannot yet understand and explore many non-preworked 1C questions on its own".
- the main remaining risk is no longer "A/B/C or D/E/F do not exist", but "already-supported semantic chains can still be contaminated by stale scope, legacy focus state, or wrong post-pivot arbitration";
- pure wording polish remains secondary debt, but semantic integrity and explicit-subject protection are now first-class blockers;
- the practical product risk is no longer only "the route collapsed", but "the user can still occasionally see a semantically wrong answer on a question that the architecture should already support".
- main remaining architectural pressure:
- no general `Question -> Data Need Graph` authority yet
- planner chain selection is still reviewed-family bounded rather than open-world over the primitive catalog
- schema traversal is still narrower than the intended arbitrary 1C blast radius
- multi-hop evidence recovery is still too shallow for unfamiliar asks
- open-world breadth is still narrower than the intended arbitrary 1C blast radius
- planner-selected chains are now real, but still not broad enough to cover unfamiliar 1C asks without additional primitive/search growth
- semantic integrity can still fail on stale carryover, repeated pivots, and mixed scope contamination if those seams are not replay-hardened
- central domain-intent pressure inside `resolveAddressIntent()`
- replay breadth is still below the future open-world autonomy surface
Latest live proof now includes:
- `address_truth_harness_phase12_wider_saved_session_pool_live_20260419_rerun16` accepted `20/20`
- `address_truth_harness_phase14_counterparty_tail_resume_live_20260418_rerun2` accepted `10/10`
- `address_truth_harness_phase15_answer_inspection_followup_live_20260419_rerun11` accepted `9/9`
- `address_truth_harness_phase16_multicompany_late_pivot_live_20260419_rerun10` accepted
- `address_truth_harness_phase17_clarification_resume_and_counterparty_tail_live_20260419_rerun5` accepted `10/10`
- `address_truth_harness_phase24_metadata_lane_choice_loop_live_rerun5` accepted
- `address_truth_harness_phase25_entity_resolution_chain_live_rerun_full_chain` accepted
- `address_truth_harness_phase24_metadata_lane_choice_loop_live_rerun14` accepted
- `address_truth_harness_phase32_planner_selected_chain_end_to_end_live_rerun2` accepted `6/6`
- `address_truth_harness_phase42_catalog_metadata_drilldown_live_rerun2` accepted
- `address_truth_harness_phase45_multi_hop_open_total_clarification_loop_live_rerun2` accepted
- `address_truth_harness_phase67_svk_grounded_counterparty_integrity_live_rerun_vatfix` accepted
- `address_truth_harness_phase68_referential_document_followup_integrity_live_rerun1` accepted
- `address_truth_harness_phase69_document_to_payments_pronoun_pivot_live_rerun3` accepted
- `address_truth_harness_phase72_document_to_contracts_year_switch_live_rerun3` accepted
- `address_truth_harness_phase80_repeated_pivots_all_time_after_contracts_live_rerun1` accepted
- `address_truth_harness_phase82_human_mixed_integrity_status_dialog_live_rerun5` accepted `19/19`
- `address_truth_harness_phase11_manual_followup_meta_quality_live_rerun_vatfix` accepted `10/10`
- `address_truth_harness_phase20_continuity_stabilization_live_rerun_vatfix` accepted `6/6`
Current architectural reading:
- the system is already materially past the dangerous regression breakpoint;
- it is now safe for continued architecture hardening and controlled domain-by-domain enablement under replay gates;
- it is now materially closer to pre-multidomain stability, but still not safe to declare broad low-risk multi-domain expansion.
- the practical next target is no longer only `90%+ pre-multidomain readiness`, but the first believable `open-world bounded autonomy` over 1C evidence.
- from this point onward, readiness must be judged not only by route truth and replay pass rate, but also by whether a new human user can ask a structurally new 1C data question and still get a bounded, evidence-honest answer path.
- it is materially closer to pre-multidomain stability, but still not safe to declare broad low-risk expansion over arbitrary unfamiliar 1C questions.
- the practical next target is no longer only `90%+ pre-multidomain readiness`, but trustworthy semantic integrity inside already-enabled contours plus broader open-world bounded autonomy over 1C evidence.
- from this point onward, readiness must be judged not only by route truth and replay pass rate, but also by whether already-supported questions stay semantically correct through stale memory, pivots, clarifications, and mixed scope resets.
For the detailed audit, current percentages, and remaining debt, read:
@@ -103,6 +113,7 @@ For the detailed audit, current percentages, and remaining debt, read:
- [14 - semantic_dialog_authority_recovery_plan_2026-04-19.md](./14%20-%20semantic_dialog_authority_recovery_plan_2026-04-19.md)
- [15 - mcp_bounded_autonomy_reset_plan_2026-04-21.md](./15%20-%20mcp_bounded_autonomy_reset_plan_2026-04-21.md)
- [16 - data_need_graph_and_open_world_mcp_plan_2026-04-22.md](./16%20-%20data_need_graph_and_open_world_mcp_plan_2026-04-22.md)
- [17 - post_f_semantic_integrity_hardening_2026-04-23.md](./17%20-%20post_f_semantic_integrity_hardening_2026-04-23.md)
## Architectural Objects Of Planning
@@ -137,6 +148,7 @@ Read in this order:
15. `14 - semantic_dialog_authority_recovery_plan_2026-04-19.md`
16. `15 - mcp_bounded_autonomy_reset_plan_2026-04-21.md`
17. `16 - data_need_graph_and_open_world_mcp_plan_2026-04-22.md`
18. `17 - post_f_semantic_integrity_hardening_2026-04-23.md`
## Planning Rules
@@ -156,14 +168,16 @@ and start being described as:
- "a stateful exact-data assistant with explicit transition contracts and isolated truth gating."
As of `2026-04-22`, the project is already materially closer to the target description and is no longer in the same acute collapse state. The remaining blocker is no longer the original continuity failure itself, but the unfinished convergence from reviewed bounded MCP chains toward open-world data-need-driven autonomy with replay breadth still below the future blast radius.
As of `2026-04-23`, the project is already materially closer to the target description and is no longer in the same acute collapse state. The remaining blocker is no longer the original continuity failure itself, and no longer only the A/B/C or D/E/F build-out. The active blocker is now the combination of:
- unfinished convergence from reviewed bounded MCP chains toward broader open-world autonomy;
- semantic integrity hardening on already-enabled contours, especially where stale scope, repeated pivots, or post-pivot arbitration can still produce a business-wrong answer.
The biggest remaining blockers are:
- no general `Question -> Data Need Graph` runtime authority yet;
- planner-selected primitive chains are real, but still narrower than open-world primitive search;
- dynamic schema traversal is not yet broad enough for unfamiliar 1C asks outside the repaired families;
- multi-hop evidence recovery still depends on bounded reviewed seams and not yet on a general exploration loop;
- broader open-world primitive search is still narrower than the future arbitrary 1C blast radius;
- dynamic schema traversal is still not broad enough for many unfamiliar 1C asks outside the repaired families;
- semantic integrity hardening is still needed on stale scope contamination, repeated pivots, and already-supported but semantically fragile follow-up chains;
- residual `assistantService` overload;
- central intent pressure in `resolveAddressIntent()`;
- semantic robustness gaps where already-supported questions can still look broken to a human user because of typo sensitivity, short follow-up retarget loss, or human-answer mismatch.