# 20 - Planner Autonomy Consolidation (2026-05-01)

## Purpose

This note starts the consolidation layer after the first accepted inventory-stock breadth proof.

The goal is to move from:

- domain pack proves one more slice;
- planner still carries too many local recipe branches;

to:

- reusable MCP primitive and chain descriptors;
- planner-selected route fabric;
- domain packs as semantic gates, not as the main design mechanism.

This is the continuation of the original "MCP as bounded brain" goal.

## Architectural Reading

The target is not an unrestricted model agent.

The target remains:

`user question -> data_need_graph -> catalog chain template -> reviewed primitives -> bounded evidence loop -> truth gate -> answer`

The LLM may help choose the path, but only inside reviewed MCP boundaries.

## Code Steps

The first consolidation step adds reusable chain templates to `assistantMcpCatalogIndex`.

The catalog now describes not only primitive contracts, but also planner route-fabric templates:

- `metadata_inspection`
- `catalog_drilldown`
- `entity_resolution`
- `document_evidence`
- `movement_evidence`
- `value_flow`
- `value_flow_comparison`
- `value_flow_ranking`
- `lifecycle`

Each template declares:

- semantic data need;
- human-readable chain summary;
- fallback primitive sequence;
- base required axes;
- supported fact/action families;
- planning tags;
- evidence-gate requirement.
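
As an illustration only, the declared fields can be pictured as one record per template. The sketch below is Python for brevity and is not the shape used by `assistantMcpCatalogIndex`: every field name, the summary text, and the fallback primitive sequence are assumptions. Only the template id and the two lifecycle axes come from this note.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChainTemplate:
    """Hypothetical record for one catalog chain template (field names are illustrative)."""

    template_id: str
    data_need: str                        # semantic data need
    summary: str                          # human-readable chain summary
    fallback_primitives: tuple[str, ...]  # fallback primitive sequence
    required_axes: tuple[str, ...]        # base required axes
    fact_families: tuple[str, ...]        # supported fact families
    action_families: tuple[str, ...]      # supported action families
    planning_tags: tuple[str, ...] = ()
    requires_evidence_gate: bool = True   # evidence-gate requirement


# Example instance. The axes are the ones this note names for lifecycle;
# all other values are placeholders chosen for the sketch.
LIFECYCLE = ChainTemplate(
    template_id="lifecycle",
    data_need="confirmed_activity_window",
    summary="First/latest confirmed 1C activity window, not legal registration age",
    fallback_primitives=("query_documents", "probe_coverage", "explain_evidence_basis"),
    required_axes=("activity_window", "legal_fact_boundary"),
    fact_families=("lifecycle",),
    action_families=("inspect",),
    planning_tags=("bounded_inference",),
)
```

Keeping the record immutable mirrors the intent that templates are reviewed descriptors the planner selects from, not structures it mutates at runtime.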

The planner now instantiates selected evidence chains from this catalog for the first base lanes, so route meaning no longer lives only in local planner branches.

The follow-up consolidation step moved the value-flow planner seams onto the same catalog fabric:

- bidirectional incoming-vs-outgoing comparison now instantiates `value_flow_comparison`, including explicit-counterparty comparison graphs rather than only subjectless organization-scope graphs;
- ranked revenue/payment questions now instantiate `value_flow_ranking`;
- organization-scoped open totals now instantiate `value_flow` with subjectless primitives but catalog-owned axes and evidence-gate semantics;
- heuristic fallback routes for value-flow, lifecycle, metadata, movement, document, entity, and unclassified metadata inspection now also use catalog chain templates.

This keeps behavior stable while making the planner's route meaning inspectable through catalog descriptors instead of only through local `recipeFor()` branches.

The next consolidation step turned lifecycle into a bounded inference chain instead of a loose shortcut that reads like an entity-age answer:

- the lifecycle template now declares `activity_window` and `legal_fact_boundary` axes;
- the template summary explicitly frames the result as a first/latest confirmed 1C activity window, not legal registration age;
- planner graph and fallback recipes now emit lifecycle bounded-inference reason codes;
- lifecycle evidence facts include the matched row count, first/latest confirmed activity dates, and an explicit legal-fact boundary.
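
A minimal sketch of how such lifecycle evidence facts could be assembled from confirmed activity dates. The function name, the record shape, and the boundary wording are assumptions made for illustration; the real planner output is not shown in this note.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class LifecycleFacts:
    """Hypothetical evidence-fact record for the lifecycle chain."""

    matched_rows: int
    first_confirmed_activity: Optional[date]
    latest_confirmed_activity: Optional[date]
    legal_fact_boundary: str


def build_lifecycle_facts(activity_dates: Sequence[date]) -> LifecycleFacts:
    # The boundary text is illustrative: it states what the window is NOT.
    boundary = "confirmed 1C activity window only; not a legal registration date or age"
    if not activity_dates:
        # No matched rows: report an empty window instead of guessing dates.
        return LifecycleFacts(0, None, None, boundary)
    return LifecycleFacts(
        matched_rows=len(activity_dates),
        first_confirmed_activity=min(activity_dates),
        latest_confirmed_activity=max(activity_dates),
        legal_fact_boundary=boundary,
    )


facts = build_lifecycle_facts([date(2021, 3, 1), date(2019, 5, 20), date(2024, 1, 15)])
```

The boundary travels with the facts, so any answer built from them can state the limitation without re-deriving it.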

Two arbitration seams were also hardened because they are part of the same planner-autonomy surface:

- current-turn value-flow aggregate questions can override supported exact legacy routes when the user asks for amount/net/payment totals and the exact route would only produce a narrower lookup/list answer;
- broad business evaluation (`broad_business_evaluation`) is intentionally kept in the deterministic living-chat bridge instead of being displaced by generic metadata discovery.

These changes broaden the route fabric without letting the planner present inferred evidence as a formally proven legal fact.

The following consolidation step promoted the accepted inventory-stock breadth behavior into reviewed catalog route fabric:

- `inventory_stock_snapshot`
- `inventory_supplier_overlap`
- `inventory_purchase_provenance`
- `inventory_sale_trace`

These templates are now first-class catalog chain descriptors and can be selected by the data-need graph/planner. They reuse reviewed generic primitives (`query_movements`, `query_documents`, `aggregate_by_axis`, `drilldown_related_objects`, `probe_coverage`, `explain_evidence_basis`) and add inventory-specific axes such as `as_of_date`, `warehouse`, `supplier`, `buyer`, `quantity`, and `evidence_basis`.

The first runtime bridge for these inventory templates now delegates through existing exact inventory recipes instead of inventing a new generic inventory executor:

- `inventory_stock_snapshot` -> `inventory_on_hand_as_of_date`
- `inventory_supplier_overlap` -> `inventory_supplier_stock_overlap_as_of_date`
- `inventory_purchase_provenance` -> `inventory_purchase_provenance_for_item`
- `inventory_sale_trace` -> `inventory_sale_trace_for_item`

The bridge keeps the reviewed MCP route fabric as the planner surface, but uses `addressRecipeCatalog` exact queries and account scope `41.01` as the evidence source. Root inventory templates execute through `query_movements`; selected-item provenance/sale templates execute through `query_documents`. A missing selected-item anchor still results in a clarification request, never a guessed item.
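
The delegation rules above can be sketched as a small lookup. This is an assumption-level illustration in Python, not the runtime bridge itself: the function name, the returned dictionary, and the status strings are hypothetical, while the template-to-recipe pairs, the primitives, and the `41.01` scope are taken from this note.

```python
from __future__ import annotations

from typing import Optional

# Template -> exact recipe pairs as listed in this note.
INVENTORY_BRIDGE = {
    "inventory_stock_snapshot": "inventory_on_hand_as_of_date",
    "inventory_supplier_overlap": "inventory_supplier_stock_overlap_as_of_date",
    "inventory_purchase_provenance": "inventory_purchase_provenance_for_item",
    "inventory_sale_trace": "inventory_sale_trace_for_item",
}
# Root templates do not need a selected item.
ROOT_TEMPLATES = {"inventory_stock_snapshot", "inventory_supplier_overlap"}
ACCOUNT_SCOPE = "41.01"


def bridge_inventory_template(template_id: str, selected_item: Optional[str] = None) -> dict:
    """Hypothetical bridge decision; status names are illustrative."""
    if template_id not in INVENTORY_BRIDGE:
        return {"status": "unsupported", "template": template_id}
    needs_item = template_id not in ROOT_TEMPLATES
    if needs_item and not selected_item:
        # Missing anchor: ask the user instead of guessing an item.
        return {"status": "clarification_required", "template": template_id}
    return {
        "status": "delegated",
        "template": template_id,
        "recipe": INVENTORY_BRIDGE[template_id],
        "primitive": "query_documents" if needs_item else "query_movements",
        "account_scope": ACCOUNT_SCOPE,
        "selected_item": selected_item,
    }
```

For example, `bridge_inventory_template("inventory_sale_trace")` yields a clarification result, while the same call with a selected item delegates to `inventory_sale_trace_for_item` through `query_documents`.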

The runtime answer boundary still makes unsupported or unconfirmed inventory states explicit:

- unsupported inventory route templates get a user-facing "template selected, live execution not yet bridged" answer instead of a generic checked-sources fallback;
- `must_not_claim` forbids presenting inventory planning as executed stock, supplier, purchase, or sale evidence;
- technical unsupported-pilot limitation text is filtered out of user-facing lines, while existing bounded unknowns for lifecycle/value-flow remain intact.

The next local scoring step broadened metadata-surface autonomy without adding a new hard domain route:

- if a confirmed metadata surface is unambiguous and only exposes `Document.*`, `Register.*`, or `Catalog.*` objects, the planner can infer the next reviewed lane even when upstream has not yet filled `downstream_route_family`;
- inferred document surfaces instantiate `document_evidence`;
- inferred register/movement surfaces instantiate `movement_evidence`;
- inferred catalog surfaces instantiate `catalog_drilldown`;
- mixed or ambiguous surfaces still do not guess and continue through clarification / explicit data-need scoring.
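
The inference rule above can be sketched as follows. The function name and the example object names are hypothetical; the prefix-to-template mapping follows the bullets in this note.

```python
from __future__ import annotations

from typing import Iterable, Optional

PREFIX_TO_TEMPLATE = {
    "Document.": "document_evidence",
    "Register.": "movement_evidence",
    "Catalog.": "catalog_drilldown",
}


def infer_lane(object_names: Iterable[str]) -> Optional[str]:
    """Return a chain template only when the surface exposes exactly one object kind."""
    lanes = set()
    for name in object_names:
        for prefix, template in PREFIX_TO_TEMPLATE.items():
            if name.startswith(prefix):
                lanes.add(template)
                break
        else:
            # Unknown object kind: treat the surface as ambiguous, do not guess.
            return None
    # Empty or mixed surfaces fall through to clarification / explicit scoring.
    return next(iter(lanes)) if len(lanes) == 1 else None
```

Returning `None` for empty, mixed, or unrecognized surfaces keeps the "do not guess" boundary in one place: the caller decides whether to clarify or fall back to explicit data-need scoring.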

The following consolidation step added catalog-level chain-template scoring:

- `assistantMcpCatalogIndex` can now score reviewed `chain_templates` directly from fact family, action family, required axes, comparison, ranking, and aggregation needs;
- comparison-shaped value-flow ranks `value_flow_comparison` above the generic value-flow template;
- ranking-shaped value-flow ranks `value_flow_ranking` above the generic value-flow template;
- document/movement/inventory/lifecycle templates can now be inspected as catalog search results, not only as local planner branch constants;
- `assistantMcpDiscoveryPlanner` records the top catalog chain-template match in reason codes and exposes the ranked matches as `catalog_chain_template_matches` in the planner contract while preserving existing guarded execution behavior.
- the ranked chain-template matches are now propagated into runtime loop state and debug attachment fields, so replay analysis can inspect catalog-fabric intent without parsing reason-code strings.
- `catalog_chain_template_alignment` now records whether the selected chain is the top catalog match, its rank, and whether it appeared in the catalog search results; runtime loop state and debug summary expose the same verdict.
- planner reason codes now emit stable catalog-alignment telemetry for evaluated top-match, selected-equals-top, selected-lower-rank, selected-outside-match-set, and unscored selected-chain states.
- `catalog_chain_template_alignment.alignment_status` now carries the same verdict as one enum-like field, and debug summary exposes it as `mcp_discovery_catalog_chain_alignment_status`.
- `domain_truth_harness` and `scenario_acceptance_policy` now carry the alignment status, top catalog match, and selected-matches-top flag into replay artifacts instead of leaving them buried in raw debug JSON.
- truth-harness now raises a warning finding for `selected_lower_rank` and `selected_outside_match_set` alignment states unless the replay spec explicitly marks `allow_catalog_alignment_divergence`.
- scenario acceptance now groups that warning under `catalog_alignment_ok`, and `final_status.md` prints the invariant alongside direct-answer, temporal, truth-gate, human-answer, meta-context, and selected-object gates.
- truth-harness specs can now assert `expected_catalog_alignment_status`, `expected_catalog_chain_top_match`, and `expected_catalog_selected_matches_top` on each step.
- `address_truth_harness_phase66_human_org_open_scope_dialog.json` now uses those fields to assert `value_flow`, `value_flow_comparison`, and `value_flow_ranking` top matches across the open-organization money dialog.
- `address_truth_harness_phase32_planner_selected_chain_end_to_end.json` now uses the same assertions across selected-counterparty entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups.
- `agent_semantic_pack_builder` now preserves these expected catalog-alignment fields in the reusable source catalog and adds the `planner_catalog_alignment` tag, so future mixed AGENT packs can deliberately select planner-brain regression probes instead of relying on hand-picked replay filenames.
- The new `turnaround_11_planner_brain_alignment_mix` builder recipe generates `address_truth_harness_phase83_planner_brain_alignment_mix.json`, a 20-step mixed canary that crosses selected-counterparty value-flow, open-organization totals/comparison/ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety.
- The phase83 live replay now confirms that selected chains match the reviewed catalog top match across the mixed planner-brain pack and that the business-answer path remains usable after cross-stage pivots.
- Checked-source failure replies now sanitize raw MCP transport/internal continuation strings from the user-facing answer while keeping the raw diagnostics in technical debug payloads.
- Confirmed metadata-surface follow-ups now promote the surface-grounded chain template (`document_evidence`, `movement_evidence`, or `catalog_drilldown`) to the top catalog match when the selected chain came from the same checked surface. This keeps the planner's executed route and catalog-alignment diagnostics consistent without allowing ambiguous or stale surfaces to override explicit current-turn data needs.
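
To make the scoring and alignment verdict concrete, here is a minimal sketch under stated assumptions. The weights, the template dictionaries, and the function names are invented for illustration and do not reflect the scoring inside `assistantMcpCatalogIndex`. The status strings `selected_matches_top`, `selected_lower_rank`, and `selected_outside_match_set` appear in this note; `unscored` is a placeholder name for the unscored selected-chain state.

```python
from __future__ import annotations

# Illustrative templates; the families, axes, and tags are assumptions.
TEMPLATES = [
    {"id": "value_flow", "fact_families": {"value_flow"}, "action_families": {"aggregate"},
     "axes": {"period", "organization"}, "tags": set()},
    {"id": "value_flow_comparison", "fact_families": {"value_flow"},
     "action_families": {"aggregate", "compare"},
     "axes": {"period", "organization", "direction"}, "tags": {"comparison"}},
    {"id": "value_flow_ranking", "fact_families": {"value_flow"},
     "action_families": {"aggregate", "rank"},
     "axes": {"period", "organization"}, "tags": {"ranking"}},
    {"id": "document_evidence", "fact_families": {"document"}, "action_families": {"list"},
     "axes": {"period"}, "tags": set()},
]


def score_template(template: dict, need: dict) -> int:
    """Score one template against a data need (weights are arbitrary for the sketch)."""
    score = 0
    if need["fact_family"] in template["fact_families"]:
        score += 3
    if need["action_family"] in template["action_families"]:
        score += 2
    score += len(set(need.get("axes", ())) & template["axes"])
    for shape in ("comparison", "ranking"):
        if need.get(shape) and shape in template["tags"]:
            score += 4  # shape-specific templates outrank the generic one
    return score


def rank_templates(templates: list[dict], need: dict) -> list[str]:
    scored = [(score_template(t, need), t["id"]) for t in templates]
    # Highest score first; ties broken by id so the order is deterministic.
    return [tid for s, tid in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


def alignment_status(selected: str, matches: list[str]) -> str:
    if not matches:
        return "unscored"  # placeholder name, see lead-in
    if selected == matches[0]:
        return "selected_matches_top"
    if selected in matches:
        return "selected_lower_rank"
    return "selected_outside_match_set"


need = {"fact_family": "value_flow", "action_family": "compare",
        "axes": ["period", "direction"], "comparison": True}
matches = rank_templates(TEMPLATES, need)
```

With this need, the comparison template ranks first, so a planner that selected `value_flow_comparison` would be reported as aligned, while selecting the generic `value_flow` would surface as a lower-rank divergence.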

## Why This Matters

Catalog-owned chain templates reduce the pressure to add one hard-coded route per user phrasing.

Future domain enablement should prefer:

- add or strengthen primitive descriptors;
- add or strengthen chain templates;
- let data-need graph and catalog search assemble the path;
- use domain packs to verify the scenario tree and catch semantic drift.

Domain-specific exact recipes can still exist as fast paths, but they should not be the only way the assistant understands a new business question.

## Validation

Local validation after the catalog-template, value-flow, metadata-lane scoring, lifecycle bounded-inference, current-turn value-flow arbitration, and broad-evaluation bridge steps:

- `npm.cmd test -- assistantMcpCatalogIndex.test.ts assistantMcpDiscoveryPlanner.test.ts`: passed, `47 passed`
- MCP-discovery suite: passed, `227 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5911 nodes`, `12830 edges`, `138 communities`
- live value-flow canary: `address_truth_harness_phase66_human_org_open_scope_dialog_planner_template_rerun2`, accepted `7/7`
- live metadata movement canary: `address_truth_harness_phase52_metadata_movement_full_recovery_planner_metadata_scoring_rerun2`, accepted `4/4`
- live metadata document canary: `address_truth_harness_phase54_metadata_document_full_recovery_planner_metadata_scoring_rerun2`, accepted `4/4`

Additional code-level consolidation:

- ambiguous metadata surfaces no longer carry both document and movement primitives when the current data-need graph explicitly selects `document_evidence` or `movement_evidence`;
- thin neutral metadata follow-ups still do not force a lane and keep the clarification boundary intact;
- planner reason codes now expose when an explicit lane family is scored against carried metadata ambiguity:
  `planner_metadata_surface_scored_with_explicit_lane_family`.

Latest validation after the lifecycle and arbitration hardening:

- targeted lifecycle/catalog/planner/answer tests: passed, `75 passed`, `1 skipped`
- full MCP-discovery suite: passed, `268 passed`, `9 skipped`
- broad MCP/living-chat/route/meaning slice: passed, `305 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5912 nodes`, `12833 edges`, `138 communities`
- live lifecycle/value-flow response gate: `address_truth_harness_phase19_mcp_discovery_response_gate_planner_lifecycle_rerun4`, accepted `8/8`
- live broad-eval to net-flow follow-up: `address_truth_harness_phase21_net_followup_after_broad_eval_planner_lifecycle_rerun2`, accepted `3/3`
- live broad-evaluation bridge: `address_truth_harness_phase22_broad_business_evaluation_bridge_planner_lifecycle_rerun2`, accepted `3/3`

Latest validation after the inventory catalog-template lift:

- targeted catalog/data-need/planner/turn-input tests: passed, `139 passed`, `6 skipped`
- full MCP-discovery suite: passed, `276 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5912 nodes`, `12833 edges`, `138 communities`

Latest validation after the inventory runtime-boundary hardening:

- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, `68 passed`, `1 skipped`
- full MCP-discovery suite: passed, `277 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5913 nodes`, `12837 edges`, `138 communities`

Latest validation after the inventory exact-runtime bridge:

- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, `70 passed`, `1 skipped`
- full MCP-discovery suite: passed, `279 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5930 nodes`, `12884 edges`, `135 communities`

Latest validation after unambiguous metadata-surface lane inference:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `281 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5937 nodes`, `12899 edges`, `138 communities`
- live inventory full-pack attempt: `inventory_stock_exact_bridge_live_20260501_after_runtime_bridge`, status `partial`
- live attempt interpretation: route/intent/recipe/capability selection matched, but MCP execution failed with `MCP fetch failed: This operation was aborted`. Direct proxy `get_metadata` also timed out while `/health` reported `active_sessions_count=0` and pending commands, so the run reflects an infrastructure/polling-session blocker and does not count as accepted semantic evidence.

Latest validation after catalog chain-template scoring:

- targeted catalog/planner tests: passed, `54 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5938 nodes`, `12903 edges`, `139 communities`

Latest validation after structured catalog chain-template contract exposure:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5939 nodes`, `12906 edges`, `138 communities`

Latest validation after runtime/debug propagation of structured chain matches:

- targeted runtime/debug tests: passed, `18 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5940 nodes`, `12909 edges`, `137 communities`

Latest validation after subject-aware bidirectional comparison arbitration:

- targeted planner tests: passed, `36 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5940 nodes`, `12909 edges`, `137 communities`

Latest validation after structured catalog chain-template alignment verdict:

- targeted planner/runtime/debug tests: passed, `54 passed`
- full MCP-discovery suite: passed, `282 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5941 nodes`, `12911 edges`, `136 communities`

Latest validation after representative catalog-alignment regression guard:

- targeted planner tests: passed, `37 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5942 nodes`, `12912 edges`, `140 communities`

Latest validation after catalog-alignment reason-code telemetry:

- targeted planner/runtime tests: passed, `53 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5943 nodes`, `12915 edges`, `136 communities`

Latest validation after explicit catalog-alignment status propagation:

- targeted planner/runtime/debug tests: passed, `55 passed`
- full MCP-discovery suite: passed, `283 passed`, `9 skipped`
- `npm.cmd run build`: passed
- graphify rebuild: `5943 nodes`, `12915 edges`, `136 communities`

Latest validation after truth-harness catalog-alignment artifact surfacing:

- Python replay-tooling tests: passed, `4 passed`
- graphify rebuild: `5946 nodes`, `12918 edges`, `136 communities`

Latest validation after catalog-alignment divergence warning gate:

- Python replay-tooling tests: passed, `5 passed`
- graphify rebuild: `5947 nodes`, `12920 edges`, `138 communities`

Latest validation after catalog-alignment acceptance invariant:

- Python replay-tooling tests: passed, `6 passed`
- graphify rebuild: `5949 nodes`, `12923 edges`, `136 communities`

Latest validation after catalog-alignment spec assertions:

- Python replay-tooling tests: passed, `7 passed`
- graphify rebuild: `5951 nodes`, `12926 edges`, `139 communities`
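
A minimal sketch of how a per-step spec assertion and the divergence warning could be evaluated together. It is not the `domain_truth_harness` implementation: the function name, the finding shape, and the `fail`/`warning` severities are assumptions. The `expected_*` fields, the observed field names, and the `allow_catalog_alignment_divergence` flag are the ones named in this note.

```python
from __future__ import annotations

DIVERGENT_STATUSES = {"selected_lower_rank", "selected_outside_match_set"}

# Spec field -> observed replay field.
EXPECTED_TO_OBSERVED = {
    "expected_catalog_alignment_status": "catalog_alignment_status",
    "expected_catalog_chain_top_match": "catalog_top_match",
    "expected_catalog_selected_matches_top": "catalog_selected_matches_top",
}


def check_catalog_alignment(
    step_spec: dict,
    observed: dict,
    allow_catalog_alignment_divergence: bool = False,
) -> list[dict]:
    """Return findings for one replay step (finding shape is hypothetical)."""
    findings = []
    for spec_key, observed_key in EXPECTED_TO_OBSERVED.items():
        # Only assert fields the spec actually declares.
        if spec_key in step_spec and step_spec[spec_key] != observed.get(observed_key):
            findings.append({
                "severity": "fail",
                "field": spec_key,
                "expected": step_spec[spec_key],
                "actual": observed.get(observed_key),
            })
    status = observed.get("catalog_alignment_status")
    if status in DIVERGENT_STATUSES and not allow_catalog_alignment_divergence:
        findings.append({"severity": "warning", "field": "catalog_alignment_status",
                         "actual": status})
    return findings
```

Separating explicit assertions from the divergence warning lets a spec stay silent on alignment and still get the warning, or opt out deliberately through the divergence flag.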

Latest validation after phase66 catalog-alignment spec hardening:

- Python replay-tooling tests: passed, `7 passed`
- `load_truth_harness_spec` confirmed the phase66 expected top-match chain sequence: `value_flow`, `value_flow`, `value_flow`, `value_flow_comparison`, `value_flow_comparison`, `value_flow_ranking`, `value_flow_ranking`

Latest validation after phase32 catalog-alignment spec hardening and AGENT source-catalog surfacing:

- Python replay-tooling tests: passed, `9 passed`
- `load_truth_harness_spec` confirmed the phase32 expected top-match chain sequence: `entity_resolution`, `value_flow`, `value_flow`, `value_flow_comparison`, `document_evidence`, `movement_evidence`
- `agent_semantic_pack_builder.py inventory` regenerated `agent_semantic_source_catalog.*` with reusable `planner_catalog_alignment` coverage

Latest validation after phase83 mixed planner-brain spec generation:

- `scripts.test_agent_semantic_pack_builder`: passed, `3 passed`
- generated `address_truth_harness_phase83_planner_brain_alignment_mix.json`: `20` steps, `15` expected catalog top-match checks after the phase19/21/22 alignment hardening
- regenerated `agent_semantic_source_catalog.*`: `planner_catalog_alignment` is visible with `26` reusable entries, including phase32, phase66, and phase83 probes
- graphify rebuild: `5952 nodes`, `12927 edges`, `138 communities`

Prior live-readiness diagnosis after phase83 live replay and checked-source error sanitation:

- backend health is green on `http://127.0.0.1:8787/api/health`;
- proxy health is green on `http://127.0.0.1:6003/health`, with `pending_commands=0`, `active_channels_count=1`, and `active_sessions_count=0`;
- targeted checked-source sanitation tests still pass `61/61` with `1` skipped;
- `npm.cmd run build` still passes;
- full phase83 rerun `phase83_planner_brain_alignment_live_20260501_rerun4` again ended `partial`, with `8/20` pass, `2` warning, `10` fail, and `catalog_alignment_ok=true`;
- direct proxy `get_metadata` with a 180-second client timeout also timed out, so the remaining live blocker is below the assistant planner/backend layer: the proxy accepts requests, but the 1C side does not return read-only evidence in time;
- `scripts/check_mcp_live_readiness.py` now provides a repo-native preflight that separates backend/proxy health from confirmed live 1C evidence readiness before spending time on a full semantic replay.
- graphify rebuild after the readiness preflight/docs sync: `5970 nodes`, `12958 edges`, `140 communities`.

Prior follow-up diagnosis of the proxy/1C seam:

- `1cv8c` is running locally with the `MCP Toolkit - Бухгалтерия предприятия, редакция 2.0` window title, so the failure is not simply "1C process absent";
- observing a read-only `get_metadata` command on the `default` channel showed `pending_commands=1` for 15 seconds and no pickup by the 1C client;
- the diagnostic command was explicitly drained from `/1c/poll` and completed through `/1c/result` with a synthetic cancel result so the proxy queue stayed clean;
- the proxy health endpoint now exposes polling telemetry: `polling_channels_count`, `last_poll_at`, `last_delivered_command_at`, and optional `poll_activity_by_channel` when `HEALTH_INCLUDE_CHANNEL_DETAILS=true`;
- after proxy restart with this telemetry enabled, `polling_channels_count=0` stayed stable for 20 seconds, proving no `/1c/poll` activity reached the proxy;
- `scripts/check_mcp_live_readiness.py --confirm-live` now refuses to create a direct live probe when proxy health already proves no 1C polling activity, preventing abandoned pending commands during readiness checks.
- `domain_truth_harness.py run-live --require-mcp-live-readiness` now applies the same readiness gate before the first assistant step, writes `mcp_live_readiness.json`, and exits early when live 1C evidence is unavailable;
- smoke of that harness gate against phase83 stopped before step execution with `ready_for_live_replay=false`, so future blocked runs should no longer waste a full semantic replay just to rediscover the missing `/1c/poll`.
- readiness can now wait for polling before probing: `--wait-for-polling-seconds` in `check_mcp_live_readiness.py` and `--mcp-wait-for-polling-seconds` in `domain_truth_harness.py run-live`; a 2-second smoke waited twice, observed no polling, and skipped the live probe without leaving proxy queue garbage.
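
The wait-then-decide behavior can be sketched as below. This is an illustration under assumptions, not the code of `check_mcp_live_readiness.py`: the function name, its parameters, and the returned keys are hypothetical. Only `polling_channels_count` is a health field named in this note.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_polling(
    fetch_health: Callable[[], dict],
    wait_seconds: float,
    poll_interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll proxy health until 1C polling is observed or the wait budget is spent."""
    if poll_interval_seconds <= 0:
        raise ValueError("poll_interval_seconds must be positive")
    waited = 0.0
    checks = 0
    while True:
        health = fetch_health()
        checks += 1
        if health.get("polling_channels_count", 0) > 0:
            # A 1C client is polling: a live probe can be picked up.
            return {"polling_observed": True, "send_live_probe": True, "checks": checks}
        if waited >= wait_seconds:
            # No polling within the budget: skip the probe so no command is left pending.
            return {"polling_observed": False, "send_live_probe": False, "checks": checks}
        sleep(poll_interval_seconds)
        waited += poll_interval_seconds
```

Injecting `fetch_health` and `sleep` keeps the decision logic testable without a running proxy. The important property is that the probe is never enqueued when health already shows no polling activity.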

Latest validation after guarded phase83 acceptance and surface-grounded catalog promotion:

- targeted planner/response-policy/pilot/continuity slice: `npm.cmd test -- assistantMcpDiscoveryPlanner.test.ts assistantMcpDiscoveryResponsePolicy.test.ts assistantMcpDiscoveryPilotExecutor.test.ts assistantContinuityPolicy.test.ts` passed `109/109`;
- `npm.cmd run build`: passed;
- graphify rebuild: `5973 nodes`, `12971 edges`, `138 communities`;
- live-readiness preflight after backend restart: `mcp_live_readiness_phase83_rerun3_after_backend_restart.json` reported `ready`;
- full guarded phase83 replay: `phase83_planner_brain_alignment_live_20260501_readygate_rerun3` accepted `20/20`, `0` warnings, `0` failures;
- final invariant result: `catalog_alignment_ok=true`, `direct_answer_ok=true`, `temporal_honesty_ok=true`, `selected_object_continuity_ok=true`, `truth_gate_ok=true`, `human_answer_quality_ok=true`, and `meta_context_integrity_ok=true`;
- the previously warning step `step_02_neutral_followup_catalog_drilldown` now reports `catalog_alignment_status=selected_matches_top`, `catalog_top_match=catalog_drilldown`, and `catalog_selected_matches_top=True`.
- saved autorun canary: `AGENT | Planner Autonomy phase83: мозг маршрутов, pivots и legacy continuity` (`gen-ag05011759-6f85fc`), sourced from the accepted phase83 spec after the live replay was reviewed.

## Next Step

The declared Planner Autonomy Consolidation slice is now closed for the phase83 acceptance target.

Keep using the live preflight before future full replays:

`python scripts/check_mcp_live_readiness.py --confirm-live --wait-for-polling-seconds 60 --poll-interval-seconds 2 --output-json artifacts/runtime/mcp_live_readiness_phase83.json`

Run future full candidates with the built-in gate:

`python scripts/domain_truth_harness.py run-live --spec docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json --output-dir artifacts/domain_runs/phase83_planner_brain_alignment_live_<stamp> --require-mcp-live-readiness --mcp-wait-for-polling-seconds 60 --mcp-poll-interval-seconds 2`

Only when readiness reports `ready_for_live_replay=true` should a full replay be treated as meaningful business-evidence proof. If it reports no `/1c/poll` activity, fix the 1C toolkit client/session/channel first; another full replay will only reproduce checked-source partial answers.

Recommended order:

1. save the accepted phase83 pack into autoruns only if the product flow needs it as a legacy AGENT canary;
2. continue broader open-world bounded autonomy with phase83 as a regression gate, not as an open blocker;
3. broaden catalog scoring into unfamiliar 1C asks where metadata surface and data-need graph can pick reviewed lanes;
4. grow primitive descriptors only where live replay shows a real evidence gap;
5. keep phase19, phase21, phase22, value-flow, metadata ambiguity, inventory-stock, and phase83 as regression gates.

The key rule remains:

- do not hide a domain workaround inside the planner;
- promote repeated successful domain behavior into a reviewed primitive or chain template.

## Status Canon Addendum - 2026-05-01

The declared Planner Autonomy Consolidation slice is closed at `100%` for the phase83 acceptance target.

Future work should not lower this module's percentage.

If new defects appear while broadening unfamiliar 1C asks, track them under the next active module:

- `Open-World Bounded Autonomy Breadth`

Keep phase83 as a regression gate for that next module:

- catalog alignment must remain visible in replay artifacts;
- direct-answer and human-answer quality remain acceptance invariants;
- live-readiness preflight must run before expensive live semantic replays;
- checked-source failures must not leak raw MCP/internal continuation errors into the user-facing answer.

The short status source of truth is:

- [21 - current_status_canon_2026-05-01.md](./21%20-%20current_status_canon_2026-05-01.md)