# 20 - Planner Autonomy Consolidation (2026-05-01)

## Purpose

This note starts the consolidation layer after the first accepted inventory-stock breadth proof.

The goal is to move from:

- domain pack proves one more slice;
- planner still carries too many local recipe branches;

to:

- reusable MCP primitive and chain descriptors;
- planner-selected route fabric;
- domain packs as semantic gates, not as the main design mechanism.

This is the continuation of the original "MCP as bounded brain" goal.

## Architectural Reading

The target is not an unrestricted model agent.

The target remains:

`user question -> data_need_graph -> catalog chain template -> reviewed primitives -> bounded evidence loop -> truth gate -> answer`

The LLM may help choose the path, but only inside reviewed MCP boundaries.

## Code Steps

The first consolidation step adds reusable chain templates to `assistantMcpCatalogIndex`.

The catalog now describes not only primitive contracts, but also planner route-fabric templates:

- `metadata_inspection`
- `catalog_drilldown`
- `entity_resolution`
- `document_evidence`
- `movement_evidence`
- `value_flow`
- `value_flow_comparison`
- `value_flow_ranking`
- `lifecycle`

Each template declares:

- semantic data need;
- human-readable chain summary;
- fallback primitive sequence;
- base required axes;
- supported fact/action families;
- planning tags;
- evidence-gate requirement.
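The declared fields above can be sketched as a TypeScript shape. This is a minimal illustration only: the field names, the `descriptorProblems` helper, and the example primitive sequence, families, and tags are assumptions, not the actual `assistantMcpCatalogIndex` schema. Only the template id and the two lifecycle axes are taken from this note.

```typescript
// Hypothetical descriptor shape. Field names are illustrative assumptions,
// not the actual schema used by assistantMcpCatalogIndex.
interface ChainTemplateDescriptor {
  templateId: string;            // e.g. "lifecycle"
  dataNeed: string;              // semantic data need
  summary: string;               // human-readable chain summary
  fallbackPrimitives: string[];  // fallback primitive sequence
  requiredAxes: string[];        // base required axes
  factFamilies: string[];        // supported fact families
  actionFamilies: string[];      // supported action families
  planningTags: string[];        // planning tags
  requiresEvidenceGate: boolean; // evidence-gate requirement
}

// Example instance. The axes come from this note; the primitive sequence,
// families, and tags are assumed for illustration.
const lifecycleTemplate: ChainTemplateDescriptor = {
  templateId: "lifecycle",
  dataNeed: "first and latest confirmed activity for a subject",
  summary: "Confirmed 1C activity window, not legal registration age",
  fallbackPrimitives: ["query_documents", "probe_coverage", "explain_evidence_basis"],
  requiredAxes: ["activity_window", "legal_fact_boundary"],
  factFamilies: ["lifecycle"],
  actionFamilies: ["inspect"],
  planningTags: ["bounded_inference"],
  requiresEvidenceGate: true,
};

// Lists the gaps that would make a descriptor unusable as a planner route.
// An empty list means the descriptor is structurally complete.
function descriptorProblems(t: ChainTemplateDescriptor): string[] {
  const problems: string[] = [];
  if (t.templateId.length === 0) problems.push("missing templateId");
  if (t.fallbackPrimitives.length === 0) problems.push("empty fallback primitive sequence");
  if (t.requiredAxes.length === 0) problems.push("no required axes");
  return problems;
}
```

Keeping the descriptor as plain data is what lets the catalog be searched and scored without executing planner code.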

The planner now instantiates selected evidence chains from this catalog for the first base lanes instead of keeping route meaning only in local planner branches.

The follow-up consolidation step moved the value-flow planner seams onto the same catalog fabric:

- bidirectional incoming-vs-outgoing comparison now instantiates `value_flow_comparison`, including explicit-counterparty comparison graphs rather than only subjectless organization-scope graphs;
- ranked revenue/payment questions now instantiate `value_flow_ranking`;
- organization-scoped open totals now instantiate `value_flow` with subjectless primitives but catalog-owned axes and evidence-gate semantics;
- heuristic fallback routes for value-flow, lifecycle, metadata, movement, document, entity, and unclassified metadata inspection now also use catalog chain templates.

This keeps behavior stable while making the planner's route meaning inspectable through catalog descriptors instead of only through local `recipeFor()` branches.
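One way to picture the shift away from local `recipeFor()` branches is a lookup that builds the planned chain from catalog data. The sketch below is an assumption-laden illustration: `chainCatalog`, `instantiateChain`, the axis names, the primitive sequences, and the reason-code format are all hypothetical, and only the template ids and primitive names appear in this note.

```typescript
// Hypothetical shapes; the real planner contract is not shown in this note.
interface ChainTemplate {
  fallbackPrimitives: string[];
  requiredAxes: string[];
  requiresEvidenceGate: boolean;
}

interface PlannedChain {
  templateId: string;
  steps: string[];
  axes: string[];
  evidenceGate: boolean;
  reasonCodes: string[];
}

// Template ids come from this note; primitive sequences and axes are assumed.
const chainCatalog: Record<string, ChainTemplate> = {
  value_flow: {
    fallbackPrimitives: ["query_documents", "aggregate_by_axis", "explain_evidence_basis"],
    requiredAxes: ["period", "organization"],
    requiresEvidenceGate: true,
  },
  value_flow_ranking: {
    fallbackPrimitives: ["query_documents", "aggregate_by_axis", "explain_evidence_basis"],
    requiredAxes: ["period", "counterparty"],
    requiresEvidenceGate: true,
  },
};

// Builds a planned chain from the catalog entry instead of a local branch.
// An unknown template id yields null: the planner does not improvise a route.
function instantiateChain(templateId: string, extraAxes: string[]): PlannedChain | null {
  if (!Object.prototype.hasOwnProperty.call(chainCatalog, templateId)) {
    return null;
  }
  const template = chainCatalog[templateId];
  const axes = template.requiredAxes.slice();
  for (const axis of extraAxes) {
    if (axes.indexOf(axis) === -1) axes.push(axis);
  }
  return {
    templateId: templateId,
    steps: template.fallbackPrimitives.slice(),
    axes: axes,
    evidenceGate: template.requiresEvidenceGate,
    // Hypothetical reason-code format, shown only to make the origin inspectable.
    reasonCodes: ["planner_chain_from_catalog_template:" + templateId],
  };
}
```

Because the axes and evidence-gate flag are copied from the catalog entry, a subjectless organization-scope question and an explicit-counterparty question can share one template and differ only in the extra axes passed in.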

The next consolidation step strengthened lifecycle as a bounded inference chain instead of a loose age-like shortcut:

- the lifecycle template now declares `activity_window` and `legal_fact_boundary` axes;
- the template summary explicitly frames the result as a first/latest confirmed 1C activity window, not legal registration age;
- planner graph and fallback recipes now emit lifecycle bounded-inference reason codes;
- lifecycle evidence facts include the matched row count, first/latest confirmed activity dates, and an explicit legal-fact boundary.
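The lifecycle evidence facts listed above could be assembled roughly as follows. This is a sketch under stated assumptions: `buildLifecycleEvidence`, the field names, the boundary wording, and both reason-code strings are hypothetical, and the input is assumed to be ISO `YYYY-MM-DD` date strings taken from matched rows.

```typescript
// Hypothetical evidence shape; field names and reason codes are assumptions.
interface LifecycleEvidence {
  matchedRowCount: number;
  firstConfirmedActivity: string | null;
  latestConfirmedActivity: string | null;
  legalFactBoundary: string;
  reasonCodes: string[];
}

// activityDates are ISO "YYYY-MM-DD" strings, which sort chronologically
// when compared as plain text.
function buildLifecycleEvidence(activityDates: string[]): LifecycleEvidence {
  const sorted = activityDates.slice().sort();
  const hasRows = sorted.length > 0;
  return {
    matchedRowCount: sorted.length,
    firstConfirmedActivity: hasRows ? sorted[0] : null,
    latestConfirmedActivity: hasRows ? sorted[sorted.length - 1] : null,
    // The boundary is always stated, even when rows were found.
    legalFactBoundary:
      "Activity window confirmed in 1C data only; not a legal registration date.",
    reasonCodes: [
      hasRows
        ? "lifecycle_bounded_inference_window_confirmed"
        : "lifecycle_bounded_inference_no_rows",
    ],
  };
}
```

Emitting the boundary text unconditionally keeps the answer layer from presenting the earliest activity date as a registration fact.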

Two arbitration seams were also hardened because they are part of the same planner-autonomy surface:

- current-turn value-flow aggregate questions can override supported exact legacy routes when the user asks for amount/net/payment totals and the exact route would only produce a narrower lookup/list answer;
- broad business evaluation (`broad_business_evaluation`) is intentionally kept in the deterministic living-chat bridge instead of being displaced by generic metadata discovery.

These changes keep the route fabric broader without letting the planner pretend that inferred evidence is a formally proven legal fact.
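The two arbitration seams can be read as an ordered decision. The sketch below is only one plausible encoding: the `TurnSignals` fields, the `RouteDecision` labels, and the final `catalog_planner` fallback are assumptions, and `broad_business_evaluation` is the only identifier taken from this note.

```typescript
// Hypothetical signal and decision names; only `broad_business_evaluation`
// is an identifier taken from this note.
type RouteDecision =
  | "living_chat_bridge"
  | "value_flow_aggregate"
  | "exact_legacy_route"
  | "catalog_planner";

interface TurnSignals {
  intent: string;
  asksForAggregateTotals: boolean; // amount / net / payment totals requested
  exactRouteSupported: boolean;
  exactRouteAnswerShape: "lookup" | "list" | "aggregate";
}

function arbitrateRoute(signals: TurnSignals): RouteDecision {
  // Broad evaluation stays on the deterministic bridge and is never
  // displaced by generic discovery.
  if (signals.intent === "broad_business_evaluation") {
    return "living_chat_bridge";
  }
  if (signals.exactRouteSupported) {
    const exactIsNarrower = signals.exactRouteAnswerShape !== "aggregate";
    // Aggregate question plus a narrower exact answer: value-flow wins.
    if (signals.asksForAggregateTotals && exactIsNarrower) {
      return "value_flow_aggregate";
    }
    return "exact_legacy_route";
  }
  // Assumed default when no exact route applies.
  return "catalog_planner";
}
```

The override is deliberately narrow: an exact route that already returns an aggregate is left alone.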

The following consolidation step promoted the accepted inventory-stock breadth behavior into reviewed catalog route fabric:

- `inventory_stock_snapshot`
- `inventory_supplier_overlap`
- `inventory_purchase_provenance`
- `inventory_sale_trace`

These templates are now first-class catalog chain descriptors and can be selected by the data-need graph/planner. They reuse reviewed generic primitives (`query_movements`, `query_documents`, `aggregate_by_axis`, `drilldown_related_objects`, `probe_coverage`, `explain_evidence_basis`) and add inventory-specific axes such as `as_of_date`, `warehouse`, `supplier`, `buyer`, `quantity`, and `evidence_basis`.

The first runtime bridge for these inventory templates now delegates through existing exact inventory recipes instead of inventing a new generic inventory executor:

- `inventory_stock_snapshot` -> `inventory_on_hand_as_of_date`
- `inventory_supplier_overlap` -> `inventory_supplier_stock_overlap_as_of_date`
- `inventory_purchase_provenance` -> `inventory_purchase_provenance_for_item`
- `inventory_sale_trace` -> `inventory_sale_trace_for_item`

The bridge keeps the reviewed MCP route fabric as the planner surface, but uses `addressRecipeCatalog` exact queries and account scope `41.01` as the evidence source. Root inventory templates execute through `query_movements`; selected-item provenance/sale templates execute through `query_documents`. When a selected-item anchor is missing, the bridge asks for clarification rather than guessing an item.
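The delegation described above can be sketched as a small lookup table. The template ids, recipe names, primitives, and account scope are taken from this note; the `inventoryBridge` table, the `BridgeOutcome` union, and the `bridgeInventoryTemplate` function are hypothetical names used only for illustration.

```typescript
type EvidencePrimitive = "query_movements" | "query_documents";

interface InventoryBridgeEntry {
  exactRecipe: string;
  primitive: EvidencePrimitive;
  needsSelectedItem: boolean;
}

const INVENTORY_ACCOUNT_SCOPE = "41.01";

// Template -> exact recipe pairs as listed in this note.
const inventoryBridge: Record<string, InventoryBridgeEntry> = {
  inventory_stock_snapshot: {
    exactRecipe: "inventory_on_hand_as_of_date",
    primitive: "query_movements",
    needsSelectedItem: false,
  },
  inventory_supplier_overlap: {
    exactRecipe: "inventory_supplier_stock_overlap_as_of_date",
    primitive: "query_movements",
    needsSelectedItem: false,
  },
  inventory_purchase_provenance: {
    exactRecipe: "inventory_purchase_provenance_for_item",
    primitive: "query_documents",
    needsSelectedItem: true,
  },
  inventory_sale_trace: {
    exactRecipe: "inventory_sale_trace_for_item",
    primitive: "query_documents",
    needsSelectedItem: true,
  },
};

// Hypothetical outcome union: execute, ask for clarification, or report
// that the template has no runtime bridge.
type BridgeOutcome =
  | { kind: "execute"; exactRecipe: string; primitive: EvidencePrimitive; accountScope: string; item: string | null }
  | { kind: "clarify"; missing: "selected_item" }
  | { kind: "unsupported"; templateId: string };

function bridgeInventoryTemplate(templateId: string, selectedItem: string | null): BridgeOutcome {
  if (!Object.prototype.hasOwnProperty.call(inventoryBridge, templateId)) {
    return { kind: "unsupported", templateId: templateId };
  }
  const entry = inventoryBridge[templateId];
  // A missing item anchor is never filled with a guess.
  if (entry.needsSelectedItem && selectedItem === null) {
    return { kind: "clarify", missing: "selected_item" };
  }
  return {
    kind: "execute",
    exactRecipe: entry.exactRecipe,
    primitive: entry.primitive,
    accountScope: INVENTORY_ACCOUNT_SCOPE,
    item: entry.needsSelectedItem ? selectedItem : null,
  };
}
```

Modeling the outcome as a discriminated union forces callers to handle the clarification and unsupported cases explicitly instead of treating every template as executable.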

The runtime answer boundary still makes unsupported or unconfirmed inventory states explicit:

- unsupported inventory route templates get a user-facing "template selected, live execution not yet bridged" answer instead of a generic checked-sources fallback;
- `must_not_claim` forbids presenting inventory planning as executed stock, supplier, purchase, or sale evidence;
- technical unsupported-pilot limitation text is filtered out of user-facing lines, while existing bounded unknowns for lifecycle/value-flow remain intact.
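A rough sketch of the answer boundary for an unbridged inventory template follows. The `AnswerLine.technical` flag, the `mustNotClaim` entries, and the function name are all assumptions about how such a boundary might be represented; only the user-facing wording and the four forbidden evidence kinds come from the list above.

```typescript
// Hypothetical shapes for an answer-boundary sketch.
interface AnswerLine {
  text: string;
  technical: boolean; // assumed marker for pilot-limitation text
}

interface BoundedAnswer {
  headline: string;
  lines: string[];
  mustNotClaim: string[];
}

function buildUnbridgedInventoryAnswer(templateId: string, candidateLines: AnswerLine[]): BoundedAnswer {
  return {
    // Explicit state instead of a generic checked-sources fallback.
    headline: "Template " + templateId + " selected, live execution not yet bridged.",
    // Technical limitation text is dropped; other bounded unknowns pass through.
    lines: candidateLines
      .filter(function (line) { return !line.technical; })
      .map(function (line) { return line.text; }),
    // Planning must not be presented as executed evidence of these kinds.
    mustNotClaim: [
      "executed_stock_evidence",
      "executed_supplier_evidence",
      "executed_purchase_evidence",
      "executed_sale_evidence",
    ],
  };
}
```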

The next local scoring step broadened metadata-surface autonomy without adding a new hard domain route:

- if a confirmed metadata surface is unambiguous and only exposes `Document.*`, `Register.*`, or `Catalog.*` objects, the planner can infer the next reviewed lane even when upstream has not yet filled `downstream_route_family`;
- inferred document surfaces instantiate `document_evidence`;
- inferred register/movement surfaces instantiate `movement_evidence`;
- inferred catalog surfaces instantiate `catalog_drilldown`;
- mixed or ambiguous surfaces still do not guess and continue through clarification / explicit data-need scoring.
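The inference rule above can be sketched as a prefix check over the confirmed surface. The function names are hypothetical, and matching on the literal prefixes `Document.`, `Register.`, and `Catalog.` is an assumption that mirrors the notation used in this note rather than the planner's actual matching logic.

```typescript
type InferredLane = "document_evidence" | "movement_evidence" | "catalog_drilldown";

// Prefix -> lane pairs as described in the list above.
const lanePrefixes: { prefix: string; lane: InferredLane }[] = [
  { prefix: "Document.", lane: "document_evidence" },
  { prefix: "Register.", lane: "movement_evidence" },
  { prefix: "Catalog.", lane: "catalog_drilldown" },
];

function laneForObject(objectName: string): InferredLane | null {
  for (const entry of lanePrefixes) {
    if (objectName.indexOf(entry.prefix) === 0) return entry.lane;
  }
  return null;
}

// Returns a lane only when every object on the surface points to the same
// lane. Empty, mixed, or unrecognized surfaces return null so the caller
// continues through clarification or explicit data-need scoring.
function inferLaneFromSurface(objectNames: string[]): InferredLane | null {
  if (objectNames.length === 0) return null;
  const firstLane = laneForObject(objectNames[0]);
  if (firstLane === null) return null;
  for (const objectName of objectNames) {
    if (laneForObject(objectName) !== firstLane) return null;
  }
  return firstLane;
}
```

For example, a surface of two `Document.*` objects would infer `document_evidence`, while a surface mixing a `Document.*` and a `Register.*` object would return `null`.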

The following consolidation step added catalog-level chain-template scoring:

- `assistantMcpCatalogIndex` can now score reviewed `chain_templates` directly from fact family, action family, required axes, comparison, ranking, and aggregation needs;
- comparison-shaped value-flow ranks `value_flow_comparison` above the generic value-flow template;
- ranking-shaped value-flow ranks `value_flow_ranking` above the generic value-flow template;
- document/movement/inventory/lifecycle templates can now be inspected as catalog search results, not only as local planner branch constants;
- `assistantMcpDiscoveryPlanner` records the top catalog chain-template match in reason codes and exposes the ranked matches as `catalog_chain_template_matches` in the planner contract while preserving existing guarded execution behavior;
- the ranked chain-template matches are now propagated into runtime loop state and debug attachment fields, so replay analysis can inspect catalog-fabric intent without parsing reason-code strings;
- `catalog_chain_template_alignment` now records whether the selected chain is the top catalog match, its rank, and whether it appeared in the catalog search results; runtime loop state and debug summary expose the same verdict;
- planner reason codes now emit stable catalog-alignment telemetry for evaluated top-match, selected-equals-top, selected-lower-rank, selected-outside-match-set, and unscored selected-chain states.
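The scoring and alignment behavior listed above can be illustrated with a toy scorer. Everything numeric here is an assumption: the weights, the profile fields, the sample profiles, and the reason-code strings are hypothetical and are not the values used by `assistantMcpCatalogIndex` or `assistantMcpDiscoveryPlanner`.

```typescript
interface TemplateProfile {
  templateId: string;
  factFamilies: string[];
  actionFamilies: string[];
  axes: string[];
  comparison: boolean;
  ranking: boolean;
}

interface DataNeed {
  factFamily: string;
  actionFamily: string;
  axes: string[];
  comparison: boolean;
  ranking: boolean;
}

interface TemplateMatch { templateId: string; score: number; }

// Weights are arbitrary illustrative assumptions.
function scoreTemplate(t: TemplateProfile, need: DataNeed): number {
  if (t.factFamilies.indexOf(need.factFamily) === -1) return 0;
  let score = 3;
  if (t.actionFamilies.indexOf(need.actionFamily) !== -1) score += 2;
  score += need.axes.filter(function (a) { return t.axes.indexOf(a) !== -1; }).length;
  if (need.comparison && t.comparison) score += 4;
  if (need.ranking && t.ranking) score += 4;
  return score;
}

// Highest score first; ties broken by template id for a stable order.
function rankTemplates(templates: TemplateProfile[], need: DataNeed): TemplateMatch[] {
  return templates
    .map(function (t) { return { templateId: t.templateId, score: scoreTemplate(t, need) }; })
    .filter(function (m) { return m.score > 0; })
    .sort(function (a, b) {
      if (b.score !== a.score) return b.score - a.score;
      return a.templateId < b.templateId ? -1 : a.templateId > b.templateId ? 1 : 0;
    });
}

interface AlignmentVerdict {
  selectedIsTopMatch: boolean;
  selectedRank: number | null; // 1-based rank within the match set
  selectedInMatchSet: boolean;
  reasonCode: string;          // hypothetical code strings
}

function alignmentVerdict(selectedId: string, matches: TemplateMatch[]): AlignmentVerdict {
  if (matches.length === 0) {
    return { selectedIsTopMatch: false, selectedRank: null, selectedInMatchSet: false, reasonCode: "catalog_alignment_selected_chain_unscored" };
  }
  const index = matches.map(function (m) { return m.templateId; }).indexOf(selectedId);
  if (index === -1) {
    return { selectedIsTopMatch: false, selectedRank: null, selectedInMatchSet: false, reasonCode: "catalog_alignment_selected_outside_match_set" };
  }
  const isTopMatch = index === 0;
  return {
    selectedIsTopMatch: isTopMatch,
    selectedRank: index + 1,
    selectedInMatchSet: true,
    reasonCode: isTopMatch ? "catalog_alignment_selected_equals_top" : "catalog_alignment_selected_lower_rank",
  };
}

// Sample profiles: ids come from this note, all other values are assumed.
const sampleProfiles: TemplateProfile[] = [
  { templateId: "value_flow", factFamilies: ["value_flow"], actionFamilies: ["total", "compare"], axes: ["period", "counterparty"], comparison: false, ranking: false },
  { templateId: "value_flow_comparison", factFamilies: ["value_flow"], actionFamilies: ["compare"], axes: ["period", "counterparty"], comparison: true, ranking: false },
  { templateId: "value_flow_ranking", factFamilies: ["value_flow"], actionFamilies: ["rank"], axes: ["period", "counterparty"], comparison: false, ranking: true },
  { templateId: "document_evidence", factFamilies: ["document"], actionFamilies: ["list"], axes: ["period"], comparison: false, ranking: false },
];

const comparisonNeed: DataNeed = { factFamily: "value_flow", actionFamily: "compare", axes: ["period", "counterparty"], comparison: true, ranking: false };
const rankedMatches = rankTemplates(sampleProfiles, comparisonNeed);
```

With these assumed weights, a comparison-shaped need ranks `value_flow_comparison` above the generic `value_flow` template, and a planner that still selected `value_flow` would get a lower-rank verdict rather than a silent mismatch.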

## Why This Matters

This reduces the pressure to add one hard route per user wording.

Future domain enablement should prefer:

- add or strengthen primitive descriptors;
- add or strengthen chain templates;
- let data-need graph and catalog search assemble the path;
- use domain packs to verify the scenario tree and catch semantic drift.

Domain-specific exact recipes can still exist as fast paths, but they should not be the only way the assistant understands a new business question.

## Validation

Local validation after the catalog-template, value-flow, metadata-lane scoring, lifecycle bounded-inference, current-turn value-flow arbitration, and broad-evaluation bridge steps:

- `npm.cmd test -- assistantMcpCatalogIndex.test.ts assistantMcpDiscoveryPlanner.test.ts`: passed, `47 passed`
- MCP-discovery suite: passed, `227 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5911 nodes`, `12830 edges`, `138 communities`
- live value-flow canary: `address_truth_harness_phase66_human_org_open_scope_dialog_planner_template_rerun2`, accepted `7/7`
- live metadata movement canary: `address_truth_harness_phase52_metadata_movement_full_recovery_planner_metadata_scoring_rerun2`, accepted `4/4`
- live metadata document canary: `address_truth_harness_phase54_metadata_document_full_recovery_planner_metadata_scoring_rerun2`, accepted `4/4`

Additional code-level consolidation:

- ambiguous metadata surfaces no longer carry both document and movement primitives when the current data-need graph explicitly selects `document_evidence` or `movement_evidence`;
- thin neutral metadata follow-ups still do not force a lane and keep the clarification boundary intact;
- planner reason codes now expose when an explicit lane family is scored against carried metadata ambiguity:
  `planner_metadata_surface_scored_with_explicit_lane_family`.

Latest validation after the lifecycle and arbitration hardening:

- targeted lifecycle/catalog/planner/answer tests: passed, `75 passed`, `1 skipped`
- full MCP-discovery suite: passed, `268 passed`, `9 skipped`
- broad MCP/living-chat/route/meaning slice: passed, `305 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5912 nodes`, `12833 edges`, `138 communities`
- live lifecycle/value-flow response gate: `address_truth_harness_phase19_mcp_discovery_response_gate_planner_lifecycle_rerun4`, accepted `8/8`
- live broad-eval to net-flow follow-up: `address_truth_harness_phase21_net_followup_after_broad_eval_planner_lifecycle_rerun2`, accepted `3/3`
- live broad-evaluation bridge: `address_truth_harness_phase22_broad_business_evaluation_bridge_planner_lifecycle_rerun2`, accepted `3/3`

Latest validation after the inventory catalog-template lift:

- targeted catalog/data-need/planner/turn-input tests: passed, `139 passed`, `6 skipped`
- full MCP-discovery suite: passed, `276 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5912 nodes`, `12833 edges`, `138 communities`

Latest validation after the inventory runtime-boundary hardening:

- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, `68 passed`, `1 skipped`
- full MCP-discovery suite: passed, `277 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5913 nodes`, `12837 edges`, `138 communities`

Latest validation after the inventory exact-runtime bridge:

- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, `70 passed`, `1 skipped`
- full MCP-discovery suite: passed, `279 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5930 nodes`, `12884 edges`, `135 communities`

Latest validation after unambiguous metadata-surface lane inference:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `281 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5937 nodes`, `12899 edges`, `138 communities`
- live inventory full-pack attempt: `inventory_stock_exact_bridge_live_20260501_after_runtime_bridge`, status `partial`
- live attempt interpretation: route, intent, recipe, and capability selection matched, but MCP execution failed with `MCP fetch failed: This operation was aborted`. Direct proxy `get_metadata` also timed out while `/health` reported `active_sessions_count=0` and pending commands. This is an infrastructure/polling-session blocker, not accepted semantic evidence.

Latest validation after catalog chain-template scoring:

- targeted catalog/planner tests: passed, `54 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5938 nodes`, `12903 edges`, `139 communities`

Latest validation after structured catalog chain-template contract exposure:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5939 nodes`, `12906 edges`, `138 communities`

Latest validation after runtime/debug propagation of structured chain matches:

- targeted runtime/debug tests: passed, `18 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5940 nodes`, `12909 edges`, `137 communities`

Latest validation after subject-aware bidirectional comparison arbitration:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5940 nodes`, `12909 edges`, `137 communities`

Latest validation after structured catalog chain-template alignment verdict:

- targeted planner/runtime/debug tests: passed, `54 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5941 nodes`, `12911 edges`, `136 communities`

Latest validation after representative catalog-alignment regression guard:

- targeted planner tests: passed, `37 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5942 nodes`, `12912 edges`, `140 communities`

Latest validation after catalog-alignment reason-code telemetry:

- targeted planner/runtime tests: passed, `53 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5943 nodes`, `12915 edges`, `136 communities`

## Next Step

The next safe step is still to rerun the live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue by using the alignment verdict and reason-code telemetry to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.

Recommended order:

1. reconnect or restart the 1C toolkit polling side, then rerun the inventory canary against live 1C/MCP;
2. rerun a mixed cross-stage canary after the inventory canary is semantically clean;
3. continue broadening catalog scoring into unfamiliar 1C asks where metadata surface and data-need graph can pick reviewed lanes;
4. grow primitive descriptors only where live replay shows a real evidence gap;
5. keep phase19, phase21, phase22, value-flow, metadata ambiguity, and inventory-stock canaries as regression gates.

The key rule remains:

- do not hide a domain workaround inside the planner;
- promote repeated successful domain behavior into a reviewed primitive or chain template.