# 20 - Planner Autonomy Consolidation (2026-05-01)

## Purpose

This note starts the consolidation layer after the first accepted inventory-stock breadth proof.

The goal is to move from:

- domain pack proves one more slice;
- planner still carries too many local recipe branches;

to:

- reusable MCP primitive and chain descriptors;
- planner-selected route fabric;
- domain packs as semantic gates, not as the main design mechanism.

This is the continuation of the original "MCP as bounded brain" goal.

## Architectural Reading

The target is not an unrestricted model agent.

The target remains:

`user question -> data_need_graph -> catalog chain template -> reviewed primitives -> bounded evidence loop -> truth gate -> answer`

The LLM may help choose the path, but only inside reviewed MCP boundaries.

## Code Steps

The first consolidation step adds reusable chain templates to `assistantMcpCatalogIndex`.

The catalog now describes not only primitive contracts, but also planner route-fabric templates:

- `metadata_inspection`
- `catalog_drilldown`
- `entity_resolution`
- `document_evidence`
- `movement_evidence`
- `value_flow`
- `value_flow_comparison`
- `value_flow_ranking`
- `lifecycle`

Each template declares:

- semantic data need;
- human-readable chain summary;
- fallback primitive sequence;
- base required axes;
- supported fact/action families;
- planning tags;
- evidence-gate requirement.

The planner now instantiates selected evidence chains from this catalog for the first base lanes instead of keeping all route meaning only in local planner branches.

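The descriptor fields listed above can be pictured with a small sketch. This is an illustration only, written in Python for brevity: the real catalog lives in `assistantMcpCatalogIndex`, and the class name, field names, axis names, family labels, and `instantiate_chain` helper below are all assumptions. Only the template ids and primitive names are taken from this note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainTemplate:
    """Hypothetical stand-in for one catalog chain-template descriptor."""
    template_id: str
    data_need: str                      # semantic data need
    summary: str                        # human-readable chain summary
    fallback_primitives: tuple[str, ...]
    required_axes: tuple[str, ...]      # base required axes
    fact_families: tuple[str, ...]
    action_families: tuple[str, ...]
    planning_tags: tuple[str, ...]
    requires_evidence_gate: bool = True


# Field values are invented for illustration; axis and family names are assumptions.
CATALOG = {
    t.template_id: t
    for t in (
        ChainTemplate(
            template_id="document_evidence",
            data_need="documents supporting a business fact",
            summary="Query documents, then explain the evidence basis.",
            fallback_primitives=("query_documents", "explain_evidence_basis"),
            required_axes=("period",),
            fact_families=("documents",),
            action_families=("list",),
            planning_tags=("evidence",),
        ),
        ChainTemplate(
            template_id="movement_evidence",
            data_need="register movements supporting a business fact",
            summary="Query movements, probe coverage, explain the evidence basis.",
            fallback_primitives=("query_movements", "probe_coverage", "explain_evidence_basis"),
            required_axes=("period",),
            fact_families=("movements",),
            action_families=("list",),
            planning_tags=("evidence",),
        ),
    )
}


def instantiate_chain(template_id: str, provided_axes: set[str]) -> dict:
    """Turn a catalog template into a concrete plan, reporting axes still missing."""
    template = CATALOG[template_id]
    missing = [axis for axis in template.required_axes if axis not in provided_axes]
    return {
        "template_id": template.template_id,
        "primitives": list(template.fallback_primitives),
        "missing_axes": missing,
        "requires_evidence_gate": template.requires_evidence_gate,
    }
```

The point of the shape is that route meaning (axes, primitives, gate requirement) is read from the descriptor, so a planner branch only needs to name the template.
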
The follow-up consolidation step moved the value-flow planner seams onto the same catalog fabric:

- bidirectional incoming-vs-outgoing comparison now instantiates `value_flow_comparison`, including explicit-counterparty comparison graphs rather than only subjectless organization-scope graphs;
- ranked revenue/payment questions now instantiate `value_flow_ranking`;
- organization-scoped open totals now instantiate `value_flow` with subjectless primitives but catalog-owned axes and evidence-gate semantics;
- heuristic fallback routes for value-flow, lifecycle, metadata, movement, document, entity, and unclassified metadata inspection now also use catalog chain templates.

This keeps behavior stable while making the planner's route meaning inspectable through catalog descriptors instead of only through local `recipeFor()` branches.

The next consolidation step strengthened lifecycle as a bounded inference chain instead of a loose age-like shortcut:

- the lifecycle template now declares `activity_window` and `legal_fact_boundary` axes;
- the template summary explicitly frames the result as a first/latest confirmed 1C activity window, not legal registration age;
- planner graph and fallback recipes now emit lifecycle bounded-inference reason codes;
- lifecycle evidence facts include the matched row count, first/latest confirmed activity dates, and an explicit legal-fact boundary.

Two arbitration seams were also hardened because they are part of the same planner-autonomy surface:

- current-turn value-flow aggregate questions can override supported exact legacy routes when the user asks for amount/net/payment totals and the exact route would only produce a narrower lookup/list answer;
- broad business evaluation (`broad_business_evaluation`) is intentionally kept in the deterministic living-chat bridge instead of being displaced by generic metadata discovery.

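The lifecycle evidence facts described in the lifecycle step above can be sketched as a small record built from confirmed activity dates. This is a minimal Python illustration, not the project's implementation: the class, the function, and the wording of the boundary string are assumptions. Only the field meanings (matched row count, first/latest confirmed activity, explicit legal-fact boundary) come from this note.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

# Hypothetical wording; the note only says the boundary must be explicit.
LEGAL_FACT_BOUNDARY = "confirmed 1C activity window only; not legal registration age"


@dataclass(frozen=True)
class LifecycleEvidence:
    matched_rows: int
    first_activity: Optional[date]
    latest_activity: Optional[date]
    legal_fact_boundary: str


def build_lifecycle_evidence(activity_dates: Iterable[date]) -> LifecycleEvidence:
    """Summarize confirmed activity rows as a bounded inference, never as a legal fact."""
    dates = sorted(activity_dates)
    return LifecycleEvidence(
        matched_rows=len(dates),
        first_activity=dates[0] if dates else None,
        latest_activity=dates[-1] if dates else None,
        legal_fact_boundary=LEGAL_FACT_BOUNDARY,
    )
```

Carrying the boundary on the evidence record itself means the answer layer cannot drop it by accident when it formats the activity window.
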
These changes keep the route fabric broader without letting the planner pretend that inferred evidence is a formally proven legal fact.

The following consolidation step promoted the accepted inventory-stock breadth behavior into reviewed catalog route fabric:

- `inventory_stock_snapshot`
- `inventory_supplier_overlap`
- `inventory_purchase_provenance`
- `inventory_sale_trace`

These templates are now first-class catalog chain descriptors and can be selected by the data-need graph/planner. They reuse reviewed generic primitives (`query_movements`, `query_documents`, `aggregate_by_axis`, `drilldown_related_objects`, `probe_coverage`, `explain_evidence_basis`) and add inventory-specific axes such as `as_of_date`, `warehouse`, `supplier`, `buyer`, `quantity`, and `evidence_basis`.

The first runtime bridge for these inventory templates now delegates through existing exact inventory recipes instead of inventing a new generic inventory executor:

- `inventory_stock_snapshot` -> `inventory_on_hand_as_of_date`
- `inventory_supplier_overlap` -> `inventory_supplier_stock_overlap_as_of_date`
- `inventory_purchase_provenance` -> `inventory_purchase_provenance_for_item`
- `inventory_sale_trace` -> `inventory_sale_trace_for_item`

The bridge keeps the reviewed MCP route fabric as the planner surface, but uses `addressRecipeCatalog` exact queries and account scope `41.01` as the evidence source. Root inventory templates execute through `query_movements`; selected-item provenance/sale templates execute through `query_documents`. Missing selected-item anchors remain clarification, not a guessed item.

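The bridge rules above amount to a lookup table plus one guard. The following Python sketch shows that shape under stated assumptions: the template ids, exact recipe names, primitives, and account scope `41.01` come from this note, while `BridgeTarget`, `bridge_inventory_template`, and the returned status labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ACCOUNT_SCOPE = "41.01"


@dataclass(frozen=True)
class BridgeTarget:
    exact_recipe: str
    primitive: str
    needs_selected_item: bool


INVENTORY_BRIDGE = {
    "inventory_stock_snapshot": BridgeTarget(
        "inventory_on_hand_as_of_date", "query_movements", False),
    "inventory_supplier_overlap": BridgeTarget(
        "inventory_supplier_stock_overlap_as_of_date", "query_movements", False),
    "inventory_purchase_provenance": BridgeTarget(
        "inventory_purchase_provenance_for_item", "query_documents", True),
    "inventory_sale_trace": BridgeTarget(
        "inventory_sale_trace_for_item", "query_documents", True),
}


def bridge_inventory_template(template_id: str, selected_item: Optional[str] = None) -> dict:
    """Map a catalog template to an exact recipe, or ask for clarification."""
    target = INVENTORY_BRIDGE.get(template_id)
    if target is None:
        # Status label is an assumption: template selected, live execution not bridged.
        return {"status": "unsupported", "template_id": template_id}
    if target.needs_selected_item and not selected_item:
        # A missing item anchor is a clarification, never a guessed item.
        return {"status": "needs_clarification", "missing": "selected_item"}
    return {
        "status": "execute",
        "exact_recipe": target.exact_recipe,
        "primitive": target.primitive,
        "account_scope": ACCOUNT_SCOPE,
        "selected_item": selected_item,
    }
```

For example, `bridge_inventory_template("inventory_sale_trace")` with no selected item returns the clarification status rather than picking an item.
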
The runtime answer boundary still makes unsupported or unconfirmed inventory states explicit:

- unsupported inventory route templates get a user-facing "template selected, live execution not yet bridged" answer instead of a generic checked-sources fallback;
- `must_not_claim` forbids presenting inventory planning as executed stock, supplier, purchase, or sale evidence;
- technical unsupported-pilot limitation text is filtered out of user-facing lines, while existing bounded unknowns for lifecycle/value-flow remain intact.

The next local scoring step broadened metadata-surface autonomy without adding a new hard domain route:

- if a confirmed metadata surface is unambiguous and only exposes `Document.*`, `Register.*`, or `Catalog.*` objects, the planner can infer the next reviewed lane even when upstream has not yet filled `downstream_route_family`;
- inferred document surfaces instantiate `document_evidence`;
- inferred register/movement surfaces instantiate `movement_evidence`;
- inferred catalog surfaces instantiate `catalog_drilldown`;
- mixed or ambiguous surfaces still do not guess and continue through clarification / explicit data-need scoring.

The following consolidation step added catalog-level chain-template scoring:

- `assistantMcpCatalogIndex` can now score reviewed `chain_templates` directly from fact family, action family, required axes, comparison, ranking, and aggregation needs;
- comparison-shaped value-flow ranks `value_flow_comparison` above the generic value-flow template;
- ranking-shaped value-flow ranks `value_flow_ranking` above the generic value-flow template;
- document/movement/inventory/lifecycle templates can now be inspected as catalog search results, not only as local planner branch constants;
- `assistantMcpDiscoveryPlanner` records the top catalog chain-template match in reason codes and exposes the ranked matches as `catalog_chain_template_matches` in the planner contract while preserving existing guarded execution behavior;
- the ranked chain-template matches are now propagated into runtime loop state and debug attachment fields, so replay analysis can inspect catalog-fabric intent without parsing reason-code strings;
- `catalog_chain_template_alignment` now records whether the selected chain is the top catalog match, its rank, and whether it appeared in the catalog search results; runtime loop state and debug summary expose the same verdict;
- planner reason codes now emit stable catalog-alignment telemetry for evaluated top-match, selected-equals-top, selected-lower-rank, selected-outside-match-set, and unscored selected-chain states;
- `catalog_chain_template_alignment.alignment_status` now carries the same verdict as one enum-like field, and debug summary exposes it as `mcp_discovery_catalog_chain_alignment_status`;
- `domain_truth_harness` and `scenario_acceptance_policy` now carry the alignment status, top catalog match, and selected-matches-top flag into replay artifacts instead of leaving them buried in raw debug JSON;
- truth-harness now raises a warning finding for `selected_lower_rank` and `selected_outside_match_set` alignment states unless the replay spec explicitly marks `allow_catalog_alignment_divergence`;
- scenario acceptance now groups that warning under `catalog_alignment_ok`, and `final_status.md` prints the invariant alongside direct-answer, temporal, truth-gate, human-answer, meta-context, and selected-object gates;
- truth-harness specs can now assert `expected_catalog_alignment_status`, `expected_catalog_chain_top_match`, and `expected_catalog_selected_matches_top` on each step;
- `address_truth_harness_phase66_human_org_open_scope_dialog.json` now uses those fields to assert `value_flow`, `value_flow_comparison`, and `value_flow_ranking` top matches across the open-organization money dialog;
- `address_truth_harness_phase32_planner_selected_chain_end_to_end.json` now uses the same assertions across selected-counterparty entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups;
- `agent_semantic_pack_builder` now preserves these expected catalog-alignment fields in the reusable source catalog and adds the `planner_catalog_alignment` tag, so future mixed AGENT packs can deliberately select planner-brain regression probes instead of relying on hand-picked replay filenames;
- the new `turnaround_11_planner_brain_alignment_mix` builder recipe generates `address_truth_harness_phase83_planner_brain_alignment_mix.json`, a 20-step mixed canary that crosses selected-counterparty value-flow, open-organization totals/comparison/ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety;
- the phase83 live replay now confirms that selected chains match the reviewed catalog top match across the mixed planner-brain pack and that the business-answer path remains usable after cross-stage pivots;
- checked-source failure replies now sanitize raw MCP transport/internal continuation strings from the user-facing answer while keeping the raw diagnostics in technical debug payloads;
- confirmed metadata-surface follow-ups now promote the surface-grounded chain template (`document_evidence`, `movement_evidence`, or `catalog_drilldown`) to the top catalog match when the selected chain came from the same checked surface. This keeps the planner's executed route and catalog-alignment diagnostics consistent without allowing ambiguous or stale surfaces to override explicit current-turn data needs.

## Why This Matters

This reduces the pressure to add one hard route per user wording.

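To make that concrete, the catalog scoring and alignment verdict described under Code Steps can be sketched as follows. This Python sketch is an assumption-heavy illustration: the scoring weights, the `shape` field, the `unscored` label, and all function names are invented here. The status names `selected_matches_top`, `selected_lower_rank`, and `selected_outside_match_set` and the `allow_catalog_alignment_divergence` flag appear in this note.

```python
from typing import Optional

# Toy catalog: only the template ids are taken from this note.
TEMPLATES = {
    "value_flow": {"fact_families": {"value_flow"}, "shape": None},
    "value_flow_comparison": {"fact_families": {"value_flow"}, "shape": "comparison"},
    "value_flow_ranking": {"fact_families": {"value_flow"}, "shape": "ranking"},
    "document_evidence": {"fact_families": {"documents"}, "shape": None},
}


def rank_templates(templates: dict[str, dict], need: dict) -> list[str]:
    """Return template ids with a positive score, best first (weights are made up)."""
    def score(template: dict) -> int:
        if need["fact_family"] not in template["fact_families"]:
            return 0
        return 2 + (3 if need.get("shape") == template["shape"] else 0)

    scored = [(score(t), template_id) for template_id, t in templates.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [template_id for _, template_id in scored]


def alignment_status(selected: Optional[str], ranked: list[str]) -> str:
    """Compare the planner's selected chain with the ranked catalog matches."""
    if selected is None or not ranked:
        return "unscored"  # hypothetical label for the unscored state
    if selected == ranked[0]:
        return "selected_matches_top"
    if selected in ranked:
        return "selected_lower_rank"
    return "selected_outside_match_set"


def harness_finding(status: str, allow_divergence: bool = False) -> Optional[str]:
    """Raise a warning for divergent states unless the spec explicitly allows them."""
    diverged = status in {"selected_lower_rank", "selected_outside_match_set"}
    return "warning" if diverged and not allow_divergence else None
```

A comparison-shaped value-flow need ranks `value_flow_comparison` first in this toy catalog, so a planner that still selects plain `value_flow` would surface as `selected_lower_rank` and trigger the warning.
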
Future domain enablement should prefer:

- add or strengthen primitive descriptors;
- add or strengthen chain templates;
- let data-need graph and catalog search assemble the path;
- use domain packs to verify the scenario tree and catch semantic drift.

Domain-specific exact recipes can still exist as fast paths, but they should not be the only way the assistant understands a new business question.

## Validation

Local validation after the catalog-template, value-flow, metadata-lane scoring, lifecycle bounded-inference, current-turn value-flow arbitration, and broad-evaluation bridge steps:

- `npm.cmd test -- assistantMcpCatalogIndex.test.ts assistantMcpDiscoveryPlanner.test.ts`: passed, `47 passed`
- MCP-discovery suite: passed, `227 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5911 nodes`, `12830 edges`, `138 communities`
- live value-flow canary: `address_truth_harness_phase66_human_org_open_scope_dialog_planner_template_rerun2`, accepted `7/7`
- live metadata movement canary: `address_truth_harness_phase52_metadata_movement_full_recovery_planner_metadata_scoring_rerun2`, accepted `4/4`
- live metadata document canary: `address_truth_harness_phase54_metadata_document_full_recovery_planner_metadata_scoring_rerun2`, accepted `4/4`

Additional code-level consolidation:

- ambiguous metadata surfaces no longer carry both document and movement primitives when the current data-need graph explicitly selects `document_evidence` or `movement_evidence`;
- thin neutral metadata follow-ups still do not force a lane and keep the clarification boundary intact;
- planner reason codes now expose when an explicit lane family is scored against carried metadata ambiguity: `planner_metadata_surface_scored_with_explicit_lane_family`.

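The lane rules in the list above, together with the unambiguous-surface inference described under Code Steps, can be sketched as one small decision function. This is a Python illustration under assumptions: the prefix table mirrors the `Document.*`, `Register.*`, and `Catalog.*` wording of this note, while `infer_lane`, its parameters, and the example object names are hypothetical.

```python
from typing import Iterable, Optional

PREFIX_TO_LANE = {
    "Document.": "document_evidence",
    "Register.": "movement_evidence",
    "Catalog.": "catalog_drilldown",
}

# Lanes that an explicit current-turn data need may select over a carried ambiguous surface.
EXPLICIT_LANES = {"document_evidence", "movement_evidence"}


def infer_lane(surface_objects: Iterable[str], explicit_lane: Optional[str] = None) -> Optional[str]:
    """Pick the next reviewed lane, or return None to keep the clarification boundary."""
    if explicit_lane in EXPLICIT_LANES:
        # The explicit data-need graph wins over carried metadata ambiguity.
        return explicit_lane

    lanes = set()
    for name in surface_objects:
        lane = next((l for prefix, l in PREFIX_TO_LANE.items() if name.startswith(prefix)), None)
        if lane is None:
            return None  # unknown object kind: do not guess
        lanes.add(lane)

    # Exactly one lane means the surface is unambiguous; anything else stays unresolved.
    return next(iter(lanes)) if len(lanes) == 1 else None
```

A mixed surface with no explicit lane returns `None`, which corresponds to the "do not force a lane" behavior for thin neutral follow-ups.
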
Latest validation after the lifecycle and arbitration hardening:

- targeted lifecycle/catalog/planner/answer tests: passed, `75 passed`, `1 skipped`
- full MCP-discovery suite: passed, `268 passed`, `9 skipped`
- broad MCP/living-chat/route/meaning slice: passed, `305 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5912 nodes`, `12833 edges`, `138 communities`
- live lifecycle/value-flow response gate: `address_truth_harness_phase19_mcp_discovery_response_gate_planner_lifecycle_rerun4`, accepted `8/8`
- live broad-eval to net-flow follow-up: `address_truth_harness_phase21_net_followup_after_broad_eval_planner_lifecycle_rerun2`, accepted `3/3`
- live broad-evaluation bridge: `address_truth_harness_phase22_broad_business_evaluation_bridge_planner_lifecycle_rerun2`, accepted `3/3`

Latest validation after the inventory catalog-template lift:

- targeted catalog/data-need/planner/turn-input tests: passed, `139 passed`, `6 skipped`
- full MCP-discovery suite: passed, `276 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5912 nodes`, `12833 edges`, `138 communities`

Latest validation after the inventory runtime-boundary hardening:

- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, `68 passed`, `1 skipped`
- full MCP-discovery suite: passed, `277 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5913 nodes`, `12837 edges`, `138 communities`

Latest validation after the inventory exact-runtime bridge:

- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, `70 passed`, `1 skipped`
- full MCP-discovery suite: passed, `279 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5930 nodes`, `12884 edges`, `135 communities`

Latest validation after unambiguous metadata-surface lane inference:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `281 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5937 nodes`, `12899 edges`, `138 communities`
- live inventory full-pack attempt: `inventory_stock_exact_bridge_live_20260501_after_runtime_bridge`, status `partial`
- live attempt interpretation: route/intent/recipe/capability selection matched, but MCP execution failed with `MCP fetch failed: This operation was aborted`; direct proxy `get_metadata` also timed out while `/health` reported `active_sessions_count=0` and pending commands, so this is an infrastructure/polling-session blocker rather than accepted semantic evidence.

Latest validation after catalog chain-template scoring:

- targeted catalog/planner tests: passed, `54 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5938 nodes`, `12903 edges`, `139 communities`

Latest validation after structured catalog chain-template contract exposure:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5939 nodes`, `12906 edges`, `138 communities`

Latest validation after runtime/debug propagation of structured chain matches:

- targeted runtime/debug tests: passed, `18 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5940 nodes`, `12909 edges`, `137 communities`

Latest validation after subject-aware bidirectional comparison arbitration:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5940 nodes`, `12909 edges`, `137 communities`

Latest validation after structured catalog chain-template alignment verdict:

- targeted planner/runtime/debug tests: passed, `54 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5941 nodes`, `12911 edges`, `136 communities`

Latest validation after representative catalog-alignment regression guard:

- targeted planner tests: passed, `37 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5942 nodes`, `12912 edges`, `140 communities`

Latest validation after catalog-alignment reason-code telemetry:

- targeted planner/runtime tests: passed, `53 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5943 nodes`, `12915 edges`, `136 communities`

Latest validation after explicit catalog-alignment status propagation:

- targeted planner/runtime/debug tests: passed, `55 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5943 nodes`, `12915 edges`, `136 communities`

Latest validation after truth-harness catalog-alignment artifact surfacing:

- Python replay-tooling tests: passed, `4 passed`
- graphify rebuild: `5946 nodes`, `12918 edges`, `136 communities`

Latest validation after catalog-alignment divergence warning gate:

- Python replay-tooling tests: passed, `5 passed`
- graphify rebuild: `5947 nodes`, `12920 edges`, `138 communities`

Latest validation after catalog-alignment acceptance invariant:

- Python replay-tooling tests: passed, `6 passed`
- graphify rebuild: `5949 nodes`, `12923 edges`, `136 communities`

Latest validation after catalog-alignment spec assertions:

- Python replay-tooling tests: passed, `7 passed`
- graphify rebuild: `5951 nodes`, `12926 edges`, `139 communities`

Latest validation after phase66 catalog-alignment spec hardening:

- Python replay-tooling tests: passed, `7 passed`
- `load_truth_harness_spec` confirmed the phase66 expected top-match chain sequence: `value_flow`, `value_flow`, `value_flow`, `value_flow_comparison`, `value_flow_comparison`, `value_flow_ranking`, `value_flow_ranking`

Latest validation after phase32 catalog-alignment spec hardening and AGENT source-catalog surfacing:

- Python replay-tooling tests: passed, `9 passed`
- `load_truth_harness_spec` confirmed the phase32 expected top-match chain sequence: `entity_resolution`, `value_flow`, `value_flow`, `value_flow_comparison`, `document_evidence`, `movement_evidence`
- `agent_semantic_pack_builder.py inventory` regenerated `agent_semantic_source_catalog.*` with reusable `planner_catalog_alignment` coverage

Latest validation after phase83 mixed planner-brain spec generation:

- `scripts.test_agent_semantic_pack_builder`: passed, `3 passed`
- generated `address_truth_harness_phase83_planner_brain_alignment_mix.json`: `20` steps, `15` expected catalog top-match checks after the phase19/21/22 alignment hardening
- regenerated `agent_semantic_source_catalog.*`: `planner_catalog_alignment` is visible with `26` reusable entries, including phase32, phase66, and phase83 probes
- graphify rebuild: `5952 nodes`, `12927 edges`, `138 communities`

Prior live-readiness diagnosis after phase83 live replay and checked-source error sanitation:

- backend health is green on `http://127.0.0.1:8787/api/health`;
- proxy health is green on `http://127.0.0.1:6003/health`, with `pending_commands=0`, `active_channels_count=1`, and `active_sessions_count=0`;
- targeted checked-source sanitation tests still pass `61/61` with `1` skipped;
- `npm.cmd run build` still passes;
- full phase83 rerun `phase83_planner_brain_alignment_live_20260501_rerun4` again ended `partial`, with `8/20` pass, `2` warning, `10` fail, and `catalog_alignment_ok=true`;
- direct proxy `get_metadata` with a 180-second client timeout also timed out, so the remaining live blocker is below the assistant planner/backend layer: the proxy accepts requests, but the 1C side does not return read-only evidence in time;
- `scripts/check_mcp_live_readiness.py` now provides a repo-native preflight that separates backend/proxy health from confirmed live 1C evidence readiness before spending time on a full semantic replay;
- graphify rebuild after the readiness preflight/docs sync: `5970 nodes`, `12958 edges`, `140 communities`.

Prior follow-up diagnosis of the proxy/1C seam:

- `1cv8c` is running locally with the `MCP Toolkit - Бухгалтерия предприятия, редакция 2.0` window title, so the failure is not simply "1C process absent";
- observing a read-only `get_metadata` command on the `default` channel showed `pending_commands=1` for 15 seconds and no pickup by the 1C client;
- the diagnostic command was explicitly drained from `/1c/poll` and completed through `/1c/result` with a synthetic cancel result so the proxy queue stayed clean;
- the proxy health endpoint now exposes polling telemetry: `polling_channels_count`, `last_poll_at`, `last_delivered_command_at`, and optional `poll_activity_by_channel` when `HEALTH_INCLUDE_CHANNEL_DETAILS=true`;
- after proxy restart with this telemetry enabled, `polling_channels_count=0` stayed stable for 20 seconds, proving no `/1c/poll` activity reached the proxy;
- `scripts/check_mcp_live_readiness.py --confirm-live` now refuses to create a direct live probe when proxy health already proves no 1C polling activity, preventing abandoned pending commands during readiness checks;
- `domain_truth_harness.py run-live --require-mcp-live-readiness` now applies the same readiness gate before the first assistant step, writes `mcp_live_readiness.json`, and exits early when live 1C evidence is unavailable;
- smoke of that harness gate against phase83 stopped before step execution with `ready_for_live_replay=false`, so future blocked runs should no longer waste a full semantic replay just to rediscover the missing `/1c/poll`;
- readiness can now wait for polling before probing: `--wait-for-polling-seconds` in `check_mcp_live_readiness.py` and `--mcp-wait-for-polling-seconds` in `domain_truth_harness.py run-live`; a 2-second smoke waited twice, observed no polling, and skipped the live probe without leaving proxy queue garbage.

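The readiness gate described in the list above reduces to "wait for polling, and only then probe". The following Python sketch shows that decision logic in isolation. It is not the code of `scripts/check_mcp_live_readiness.py`: the function names, parameters, and the `live_probe` result field are assumptions, and the health payload is reduced to the `polling_channels_count` field mentioned in this note.

```python
import time
from typing import Callable


def polling_seen(health: dict) -> bool:
    """True when the proxy health payload shows at least one polling channel."""
    return int(health.get("polling_channels_count", 0)) > 0


def readiness_decision(
    fetch_health: Callable[[], dict],
    run_probe: Callable[[], bool],
    wait_seconds: float = 0.0,
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Wait for 1C polling, then probe; never probe when no polling is observed."""
    health = fetch_health()
    waited = 0.0
    while not polling_seen(health) and waited < wait_seconds:
        sleep(interval_seconds)
        waited += interval_seconds
        health = fetch_health()

    if not polling_seen(health):
        # Skipping the probe avoids leaving an abandoned pending command in the proxy queue.
        return {"ready_for_live_replay": False, "live_probe": "skipped"}
    return {"ready_for_live_replay": bool(run_probe()), "live_probe": "executed"}
```

Passing `fetch_health`, `run_probe`, and `sleep` as callables keeps the decision testable without a running proxy or 1C client.
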
Latest validation after guarded phase83 acceptance and surface-grounded catalog promotion:

- targeted planner/response-policy/pilot/continuity slice: `npm.cmd test -- assistantMcpDiscoveryPlanner.test.ts assistantMcpDiscoveryResponsePolicy.test.ts assistantMcpDiscoveryPilotExecutor.test.ts assistantContinuityPolicy.test.ts` passed `109/109`;
- `npm.cmd run build`: passed;
- graphify rebuild: `5973 nodes`, `12971 edges`, `138 communities`;
- live-readiness preflight after backend restart: `mcp_live_readiness_phase83_rerun3_after_backend_restart.json` reported `ready`;
- full guarded phase83 replay: `phase83_planner_brain_alignment_live_20260501_readygate_rerun3` accepted `20/20`, `0` warnings, `0` failures;
- final invariant result: `catalog_alignment_ok=true`, `direct_answer_ok=true`, `temporal_honesty_ok=true`, `selected_object_continuity_ok=true`, `truth_gate_ok=true`, `human_answer_quality_ok=true`, and `meta_context_integrity_ok=true`;
- the previously warning step `step_02_neutral_followup_catalog_drilldown` now reports `catalog_alignment_status=selected_matches_top`, `catalog_top_match=catalog_drilldown`, and `catalog_selected_matches_top=True`;
- saved autorun canary: `AGENT | Planner Autonomy phase83: мозг маршрутов, pivots и legacy continuity` (`gen-ag05011759-6f85fc`), sourced from the accepted phase83 spec after the live replay was reviewed.

## Next Step

The declared Planner Autonomy Consolidation slice is now closed for the phase83 acceptance target.

Keep using the live preflight before future full replays:

`python scripts/check_mcp_live_readiness.py --confirm-live --wait-for-polling-seconds 60 --poll-interval-seconds 2 --output-json artifacts/runtime/mcp_live_readiness_phase83.json`

Run future full candidates with the built-in gate:

`python scripts/domain_truth_harness.py run-live --spec docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json --output-dir artifacts/domain_runs/phase83_planner_brain_alignment_live_ --require-mcp-live-readiness --mcp-wait-for-polling-seconds 60 --mcp-poll-interval-seconds 2`

Only when readiness reports `ready_for_live_replay=true` should a full replay be treated as meaningful business-evidence proof. If it reports no `/1c/poll` activity, fix the 1C toolkit client/session/channel first; another full replay will only reproduce checked-source partial answers.

Recommended order:

1. save the accepted phase83 pack into autoruns only if the product flow needs it as a legacy AGENT canary;
2. continue broader open-world bounded autonomy with phase83 as a regression gate, not as an open blocker;
3. broaden catalog scoring into unfamiliar 1C asks where metadata surface and data-need graph can pick reviewed lanes;
4. grow primitive descriptors only where live replay shows a real evidence gap;
5. keep phase19, phase21, phase22, value-flow, metadata ambiguity, inventory-stock, and phase83 as regression gates.

The key rule remains:

- do not hide a domain workaround inside the planner;
- promote repeated successful domain behavior into a reviewed primitive or chain template.

## Status Canon Addendum - 2026-05-01

The declared Planner Autonomy Consolidation slice is closed at `100%` for the phase83 acceptance target.

Future work should not lower this module's percentage.

If new defects appear while broadening unfamiliar 1C asks, track them under the next active module:

- `Open-World Bounded Autonomy Breadth`

Keep phase83 as a regression gate for that next module:

- catalog alignment must remain visible in replay artifacts;
- direct-answer and human-answer quality remain acceptance invariants;
- live-readiness preflight must run before expensive live semantic replays;
- checked-source failures must not leak raw MCP/internal continuation errors into the user-facing answer.

The short status source of truth is:

- [21 - current_status_canon_2026-05-01.md](./21%20-%20current_status_canon_2026-05-01.md)