# 20 - Planner Autonomy Consolidation (2026-05-01)

## Purpose

This note starts the consolidation layer after the first accepted inventory-stock breadth proof.

The goal is to move from:

- domain pack proves one more slice;
- planner still carries too many local recipe branches;

to:

- reusable MCP primitive and chain descriptors;
- planner-selected route fabric;
- domain packs as semantic gates, not as the main design mechanism.

This is the continuation of the original "MCP as bounded brain" goal.

## Architectural Reading

The target is not an unrestricted model agent. The target remains:

`user question -> data_need_graph -> catalog chain template -> reviewed primitives -> bounded evidence loop -> truth gate -> answer`

The LLM may help choose the path, but only inside reviewed MCP boundaries.

## Code Steps

The first consolidation step adds reusable chain templates to `assistantMcpCatalogIndex`.

The catalog now describes not only primitive contracts, but also planner route-fabric templates:

- `metadata_inspection`
- `catalog_drilldown`
- `entity_resolution`
- `document_evidence`
- `movement_evidence`
- `value_flow`
- `value_flow_comparison`
- `value_flow_ranking`
- `lifecycle`

Each template declares:

- semantic data need;
- human-readable chain summary;
- fallback primitive sequence;
- base required axes;
- supported fact/action families;
- planning tags;
- evidence-gate requirement.

The planner now instantiates selected evidence chains from this catalog for the first base lanes instead of keeping all route meaning only in local planner branches.

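
As a rough illustration of the template fields listed above, the sketch below models a chain template as a TypeScript interface and instantiates one chain from a small in-memory catalog. The `ChainTemplate` and `InstantiatedChain` types, the `instantiateChain` helper, and the example axes, families, and tags are all assumptions made for illustration; they are not taken from `assistantMcpCatalogIndex`.

```typescript
// Hypothetical descriptor shape; field names are illustrative, not the real catalog contract.
interface ChainTemplate {
  id: string;
  dataNeed: string;             // semantic data need
  summary: string;              // human-readable chain summary
  fallbackPrimitives: string[]; // fallback primitive sequence
  requiredAxes: string[];       // base required axes
  factFamilies: string[];       // supported fact families
  actionFamilies: string[];     // supported action families
  planningTags: string[];
  requiresEvidenceGate: boolean;
}

interface InstantiatedChain {
  templateId: string;
  steps: string[];
  missingAxes: string[];
  requiresEvidenceGate: boolean;
}

// A one-entry catalog. The axes, families, and tags below are assumed values.
const chainTemplates: Record<string, ChainTemplate> = {
  document_evidence: {
    id: "document_evidence",
    dataNeed: "documents supporting a business fact",
    summary: "query documents, then explain the evidence basis",
    fallbackPrimitives: ["query_documents", "explain_evidence_basis"],
    requiredAxes: ["organization", "period"],
    factFamilies: ["document"],
    actionFamilies: ["list"],
    planningTags: ["evidence"],
    requiresEvidenceGate: true,
  },
};

// Build a concrete chain from a template and report which required axes are still missing.
function instantiateChain(templateId: string, providedAxes: string[]): InstantiatedChain {
  if (!Object.prototype.hasOwnProperty.call(chainTemplates, templateId)) {
    throw new Error(`Unknown chain template: ${templateId}`);
  }
  const template = chainTemplates[templateId];
  return {
    templateId: template.id,
    steps: template.fallbackPrimitives.slice(),
    missingAxes: template.requiredAxes.filter((axis) => providedAxes.indexOf(axis) === -1),
    requiresEvidenceGate: template.requiresEvidenceGate,
  };
}
```

Calling `instantiateChain("document_evidence", ["organization"])` in this sketch returns the two primitive steps and reports `period` as a missing axis, which is the kind of gap a planner could turn into a clarification instead of a guess.
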
The follow-up consolidation step moved the value-flow planner seams onto the same catalog fabric:

- bidirectional incoming-vs-outgoing comparison now instantiates `value_flow_comparison`, including explicit-counterparty comparison graphs rather than only subjectless organization-scope graphs;
- ranked revenue/payment questions now instantiate `value_flow_ranking`;
- organization-scoped open totals now instantiate `value_flow` with subjectless primitives but catalog-owned axes and evidence-gate semantics;
- heuristic fallback routes for value-flow, lifecycle, metadata, movement, document, entity, and unclassified metadata inspection now also use catalog chain templates.

This keeps behavior stable while making the planner's route meaning inspectable through catalog descriptors instead of only through local `recipeFor()` branches.

The next consolidation step strengthened lifecycle as a bounded inference chain instead of a loose age-like shortcut:

- the lifecycle template now declares `activity_window` and `legal_fact_boundary` axes;
- the template summary explicitly frames the result as a first/latest confirmed 1C activity window, not legal registration age;
- planner graph and fallback recipes now emit lifecycle bounded-inference reason codes;
- lifecycle evidence facts include the matched row count, first/latest confirmed activity dates, and an explicit legal-fact boundary.

Two arbitration seams were also hardened because they are part of the same planner-autonomy surface:

- current-turn value-flow aggregate questions can override supported exact legacy routes when the user asks for amount/net/payment totals and the exact route would only produce a narrower lookup/list answer;
- broad business evaluation (`broad_business_evaluation`) is intentionally kept in the deterministic living-chat bridge instead of being displaced by generic metadata discovery.

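
To make the lifecycle boundary concrete, here is a minimal sketch of how a bounded activity-window fact could be derived from matched rows. The `ActivityRow` and `LifecycleFact` types, the `buildLifecycleFact` helper, the boundary wording, and the reason-code string are hypothetical; the real lifecycle evidence contract may look quite different.

```typescript
// One confirmed 1C row; the date is assumed to be an ISO string (YYYY-MM-DD).
interface ActivityRow {
  date: string;
}

// Hypothetical lifecycle evidence fact mirroring the fields described above.
interface LifecycleFact {
  matchedRowCount: number;
  firstConfirmedActivity: string | null;
  latestConfirmedActivity: string | null;
  legalFactBoundary: string;
  reasonCodes: string[];
}

function buildLifecycleFact(rows: ActivityRow[]): LifecycleFact {
  // ISO dates compare correctly as plain strings, so a default sort is enough here.
  const dates = rows.map((row) => row.date).sort();
  return {
    matchedRowCount: rows.length,
    firstConfirmedActivity: dates.length > 0 ? dates[0] : null,
    latestConfirmedActivity: dates.length > 0 ? dates[dates.length - 1] : null,
    // The boundary is stated on every fact so the answer layer cannot present
    // the window as a registration date.
    legalFactBoundary: "confirmed 1C activity window only; not a legal registration age",
    reasonCodes: ["lifecycle_bounded_inference"], // assumed reason-code name
  };
}
```

With no matched rows the sketch returns a zero count and `null` dates rather than inventing a window, which keeps an empty result distinguishable from a confirmed one.
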
These changes keep the route fabric broader without letting the planner pretend that inferred evidence is a formally proven legal fact.

The following consolidation step promoted the accepted inventory-stock breadth behavior into reviewed catalog route fabric:

- `inventory_stock_snapshot`
- `inventory_supplier_overlap`
- `inventory_purchase_provenance`
- `inventory_sale_trace`

These templates are now first-class catalog chain descriptors and can be selected by the data-need graph/planner. They reuse reviewed generic primitives (`query_movements`, `query_documents`, `aggregate_by_axis`, `drilldown_related_objects`, `probe_coverage`, `explain_evidence_basis`) and add inventory-specific axes such as `as_of_date`, `warehouse`, `supplier`, `buyer`, `quantity`, and `evidence_basis`.

The first runtime bridge for these inventory templates now delegates through existing exact inventory recipes instead of inventing a new generic inventory executor:

- `inventory_stock_snapshot` -> `inventory_on_hand_as_of_date`
- `inventory_supplier_overlap` -> `inventory_supplier_stock_overlap_as_of_date`
- `inventory_purchase_provenance` -> `inventory_purchase_provenance_for_item`
- `inventory_sale_trace` -> `inventory_sale_trace_for_item`

The bridge keeps the reviewed MCP route fabric as the planner surface, but uses `addressRecipeCatalog` exact queries and account scope `41.01` as the evidence source. Root inventory templates execute through `query_movements`; selected-item provenance/sale templates execute through `query_documents`. Missing selected-item anchors remain clarification, not a guessed item.

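
The bridge rules above can be sketched as a small lookup. The template-to-recipe pairs, the primitives, and the `41.01` scope come from this note; the `BridgeResult` type, the `bridgeInventoryTemplate` helper, the `needsItem` flag, and the assumption that `inventory_supplier_overlap` counts as a root template are illustrative only.

```typescript
// Hypothetical outcome of bridging a selected inventory template to an exact recipe.
type BridgeResult =
  | { kind: "execute"; recipe: string; primitive: string; accountScope: string }
  | { kind: "clarify"; reason: string }
  | { kind: "unsupported"; template: string };

interface BridgeEntry {
  recipe: string;
  primitive: string;
  needsItem: boolean; // assumed flag: selected-item templates need an item anchor
}

const INVENTORY_BRIDGE: Record<string, BridgeEntry> = {
  inventory_stock_snapshot: { recipe: "inventory_on_hand_as_of_date", primitive: "query_movements", needsItem: false },
  inventory_supplier_overlap: { recipe: "inventory_supplier_stock_overlap_as_of_date", primitive: "query_movements", needsItem: false },
  inventory_purchase_provenance: { recipe: "inventory_purchase_provenance_for_item", primitive: "query_documents", needsItem: true },
  inventory_sale_trace: { recipe: "inventory_sale_trace_for_item", primitive: "query_documents", needsItem: true },
};

function bridgeInventoryTemplate(template: string, selectedItem?: string): BridgeResult {
  if (!Object.prototype.hasOwnProperty.call(INVENTORY_BRIDGE, template)) {
    // No exact recipe is bridged yet: say so instead of falling back silently.
    return { kind: "unsupported", template };
  }
  const entry = INVENTORY_BRIDGE[template];
  if (entry.needsItem && !selectedItem) {
    // A missing anchor becomes a clarification, never a guessed item.
    return { kind: "clarify", reason: "selected item anchor is missing" };
  }
  return { kind: "execute", recipe: entry.recipe, primitive: entry.primitive, accountScope: "41.01" };
}
```

In this sketch `bridgeInventoryTemplate("inventory_sale_trace")` yields a clarification, while the same call with an item name resolves to `inventory_sale_trace_for_item` over `query_documents`.
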
The runtime answer boundary still makes unsupported or unconfirmed inventory states explicit:

- unsupported inventory route templates get a user-facing "template selected, live execution not yet bridged" answer instead of a generic checked-sources fallback;
- `must_not_claim` forbids presenting inventory planning as executed stock, supplier, purchase, or sale evidence;
- technical unsupported-pilot limitation text is filtered out of user-facing lines, while existing bounded unknowns for lifecycle/value-flow remain intact.

The next local scoring step broadened metadata-surface autonomy without adding a new hard domain route:

- if a confirmed metadata surface is unambiguous and only exposes `Document.*`, `Register.*`, or `Catalog.*` objects, the planner can infer the next reviewed lane even when upstream has not yet filled `downstream_route_family`;
- inferred document surfaces instantiate `document_evidence`;
- inferred register/movement surfaces instantiate `movement_evidence`;
- inferred catalog surfaces instantiate `catalog_drilldown`;
- mixed or ambiguous surfaces still do not guess and continue through clarification / explicit data-need scoring.

The following consolidation step added catalog-level chain-template scoring:

- `assistantMcpCatalogIndex` can now score reviewed `chain_templates` directly from fact family, action family, required axes, comparison, ranking, and aggregation needs;
- comparison-shaped value-flow ranks `value_flow_comparison` above the generic value-flow template;
- ranking-shaped value-flow ranks `value_flow_ranking` above the generic value-flow template;
- document/movement/inventory/lifecycle templates can now be inspected as catalog search results, not only as local planner branch constants;
- `assistantMcpDiscoveryPlanner` records the top catalog chain-template match in reason codes and exposes the ranked matches as `catalog_chain_template_matches` in the planner contract while preserving existing guarded execution behavior;
- the ranked chain-template matches are now propagated into runtime loop state and debug attachment fields, so replay analysis can inspect catalog-fabric intent without parsing reason-code strings;
- `catalog_chain_template_alignment` now records whether the selected chain is the top catalog match, its rank, and whether it appeared in the catalog search results; runtime loop state and debug summary expose the same verdict;
- planner reason codes now emit stable catalog-alignment telemetry for evaluated top-match, selected-equals-top, selected-lower-rank, selected-outside-match-set, and unscored selected-chain states.

## Why This Matters

This reduces the pressure to add one hard route per user wording.

Future domain enablement should prefer:

- add or strengthen primitive descriptors;
- add or strengthen chain templates;
- let data-need graph and catalog search assemble the path;
- use domain packs to verify the scenario tree and catch semantic drift.

Domain-specific exact recipes can still exist as fast paths, but they should not be the only way the assistant understands a new business question.

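
To show how catalog search can assemble the path, the sketch below scores a handful of templates against a data need and then derives an alignment verdict for the chain the planner selected. The scoring weights, the `DataNeed` and `TemplateProfile` shapes, the `scoreTemplates` and `alignSelection` helpers, and the status strings are assumptions; only the template names and the set of alignment states come from this note.

```typescript
interface DataNeed { factFamily: string; comparison: boolean; ranking: boolean; }
interface TemplateProfile { id: string; factFamily: string; comparison: boolean; ranking: boolean; }
interface ScoredTemplate { id: string; score: number; }

const TEMPLATE_PROFILES: TemplateProfile[] = [
  { id: "value_flow", factFamily: "value_flow", comparison: false, ranking: false },
  { id: "value_flow_comparison", factFamily: "value_flow", comparison: true, ranking: false },
  { id: "value_flow_ranking", factFamily: "value_flow", comparison: false, ranking: true },
  { id: "document_evidence", factFamily: "document", comparison: false, ranking: false },
];

// Assumed weights: family match = 2, matching shape = +1, unrequested shape = -1.
function scoreTemplates(need: DataNeed): ScoredTemplate[] {
  const scored: ScoredTemplate[] = [];
  for (const t of TEMPLATE_PROFILES) {
    if (t.factFamily !== need.factFamily) continue; // other families are not matches
    let score = 2;
    if (t.comparison) score += need.comparison ? 1 : -1;
    if (t.ranking) score += need.ranking ? 1 : -1;
    scored.push({ id: t.id, score });
  }
  // Highest score first; ties are broken by id so the order is deterministic.
  return scored.sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : 1));
}

type AlignmentStatus =
  | "selected_equals_top"
  | "selected_lower_rank"
  | "selected_outside_match_set"
  | "unscored";

interface Alignment { status: AlignmentStatus; rank: number | null; topMatch: string | null; }

function alignSelection(selected: string, matches: ScoredTemplate[]): Alignment {
  if (matches.length === 0) return { status: "unscored", rank: null, topMatch: null };
  const topMatch = matches[0].id;
  let index = -1;
  for (let i = 0; i < matches.length; i++) {
    if (matches[i].id === selected) { index = i; break; }
  }
  if (index === -1) return { status: "selected_outside_match_set", rank: null, topMatch };
  // Rank is 1-based so that rank 1 means "selected the top catalog match".
  const status: AlignmentStatus = index === 0 ? "selected_equals_top" : "selected_lower_rank";
  return { status, rank: index + 1, topMatch };
}
```

For a comparison-shaped value-flow need, this sketch ranks `value_flow_comparison` first, so a planner that selected the generic `value_flow` chain would be reported as `selected_lower_rank` with rank 2. That kind of divergence is what the alignment telemetry is meant to surface.
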
## Validation

Local validation after the catalog-template, value-flow, metadata-lane scoring, lifecycle bounded-inference, current-turn value-flow arbitration, and broad-evaluation bridge steps:

- `npm.cmd test -- assistantMcpCatalogIndex.test.ts assistantMcpDiscoveryPlanner.test.ts`: passed, `47 passed`
- MCP-discovery suite: passed, `227 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5911 nodes`, `12830 edges`, `138 communities`
- live value-flow canary: `address_truth_harness_phase66_human_org_open_scope_dialog_planner_template_rerun2`, accepted `7/7`
- live metadata movement canary: `address_truth_harness_phase52_metadata_movement_full_recovery_planner_metadata_scoring_rerun2`, accepted `4/4`
- live metadata document canary: `address_truth_harness_phase54_metadata_document_full_recovery_planner_metadata_scoring_rerun2`, accepted `4/4`

Additional code-level consolidation:

- ambiguous metadata surfaces no longer carry both document and movement primitives when the current data-need graph explicitly selects `document_evidence` or `movement_evidence`;
- thin neutral metadata follow-ups still do not force a lane and keep the clarification boundary intact;
- planner reason codes now expose when an explicit lane family is scored against carried metadata ambiguity: `planner_metadata_surface_scored_with_explicit_lane_family`.

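
The metadata-surface behavior exercised by these checks can be sketched roughly as follows. The `inferLane` helper, the `LaneDecision` type, and the example object names are hypothetical, and matching on the literal prefixes `Document`, `Register`, and `Catalog` is a simplification of whatever the planner actually inspects; the lane names and the reason code are the ones quoted in this note.

```typescript
type Lane = "document_evidence" | "movement_evidence" | "catalog_drilldown";
type ExplicitLane = "document_evidence" | "movement_evidence";

interface LaneDecision {
  lane: Lane | null;        // null means: do not guess, continue through clarification
  reasonCode: string | null;
}

const LANE_BY_KIND: Record<string, Lane> = {
  Document: "document_evidence",
  Register: "movement_evidence",
  Catalog: "catalog_drilldown",
};

function inferLane(surfaceObjects: string[], explicitLane?: ExplicitLane): LaneDecision {
  const lanes: Lane[] = [];
  for (const name of surfaceObjects) {
    const dot = name.indexOf(".");
    const kind = dot === -1 ? name : name.slice(0, dot);
    if (!Object.prototype.hasOwnProperty.call(LANE_BY_KIND, kind)) {
      return { lane: null, reasonCode: null }; // unknown object kind: never guess
    }
    const lane = LANE_BY_KIND[kind];
    if (lanes.indexOf(lane) === -1) lanes.push(lane);
  }
  // Unambiguous surface: exactly one lane family is exposed.
  if (lanes.length === 1) return { lane: lanes[0], reasonCode: null };
  // Ambiguous surface: only an explicitly selected lane may resolve it.
  if (lanes.length > 1 && explicitLane !== undefined && lanes.indexOf(explicitLane) !== -1) {
    return {
      lane: explicitLane,
      reasonCode: "planner_metadata_surface_scored_with_explicit_lane_family",
    };
  }
  return { lane: null, reasonCode: null };
}
```

A mixed surface such as `["Document.Invoice", "Register.Stock"]` (made-up names) returns no lane in this sketch unless the data-need graph passes an explicit lane, which matches the rule that thin or neutral follow-ups keep the clarification boundary intact.
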
Latest validation after the lifecycle and arbitration hardening:

- targeted lifecycle/catalog/planner/answer tests: passed, `75 passed`, `1 skipped`
- full MCP-discovery suite: passed, `268 passed`, `9 skipped`
- broad MCP/living-chat/route/meaning slice: passed, `305 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5912 nodes`, `12833 edges`, `138 communities`
- live lifecycle/value-flow response gate: `address_truth_harness_phase19_mcp_discovery_response_gate_planner_lifecycle_rerun4`, accepted `8/8`
- live broad-eval to net-flow follow-up: `address_truth_harness_phase21_net_followup_after_broad_eval_planner_lifecycle_rerun2`, accepted `3/3`
- live broad-evaluation bridge: `address_truth_harness_phase22_broad_business_evaluation_bridge_planner_lifecycle_rerun2`, accepted `3/3`

Latest validation after the inventory catalog-template lift:

- targeted catalog/data-need/planner/turn-input tests: passed, `139 passed`, `6 skipped`
- full MCP-discovery suite: passed, `276 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5912 nodes`, `12833 edges`, `138 communities`

Latest validation after the inventory runtime-boundary hardening:

- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, `68 passed`, `1 skipped`
- full MCP-discovery suite: passed, `277 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5913 nodes`, `12837 edges`, `138 communities`

Latest validation after the inventory exact-runtime bridge:

- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, `70 passed`, `1 skipped`
- full MCP-discovery suite: passed, `279 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5930 nodes`, `12884 edges`, `135 communities`

Latest validation after unambiguous metadata-surface lane inference:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `281 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5937 nodes`, `12899 edges`, `138 communities`
- live inventory full-pack attempt: `inventory_stock_exact_bridge_live_20260501_after_runtime_bridge`, status `partial`
- live attempt interpretation: route/intent/recipe/capability selection matched, but MCP execution failed with `MCP fetch failed: This operation was aborted`; direct proxy `get_metadata` also timed out while `/health` reported `active_sessions_count=0` and pending commands, so this is an infrastructure/polling-session blocker rather than accepted semantic evidence.

Latest validation after catalog chain-template scoring:

- targeted catalog/planner tests: passed, `54 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5938 nodes`, `12903 edges`, `139 communities`

Latest validation after structured catalog chain-template contract exposure:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5939 nodes`, `12906 edges`, `138 communities`

Latest validation after runtime/debug propagation of structured chain matches:

- targeted runtime/debug tests: passed, `18 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5940 nodes`, `12909 edges`, `137 communities`

Latest validation after subject-aware bidirectional comparison arbitration:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5940 nodes`, `12909 edges`, `137 communities`

Latest validation after structured catalog chain-template alignment verdict:

- targeted planner/runtime/debug tests: passed, `54 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5941 nodes`, `12911 edges`, `136 communities`

Latest validation after representative catalog-alignment regression guard:

- targeted planner tests: passed, `37 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5942 nodes`, `12912 edges`, `140 communities`

Latest validation after catalog-alignment reason-code telemetry:

- targeted planner/runtime tests: passed, `53 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5943 nodes`, `12915 edges`, `136 communities`

## Next Step

The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by using the alignment verdict and reason-code telemetry to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.

Recommended order:

1. reconnect or restart the 1C toolkit polling side, then rerun the inventory canary against live 1C/MCP;
2. rerun a mixed cross-stage canary after the inventory canary is semantically clean;
3. continue broadening catalog scoring into unfamiliar 1C asks where metadata surface and data-need graph can pick reviewed lanes;
4. grow primitive descriptors only where live replay shows a real evidence gap;
5. keep phase19, phase21, phase22, value-flow, metadata ambiguity, and inventory-stock canaries as regression gates.

The key rule remains:

- do not hide a domain workaround inside the planner;
- promote repeated successful domain behavior into a reviewed primitive or chain template.