

20 - Planner Autonomy Consolidation (2026-05-01)

Purpose

This note starts the consolidation layer after the first accepted inventory-stock breadth proof.

The goal is to move from:

domain pack proves one more slice;
planner still carries too many local recipe branches;

to:

reusable MCP primitive and chain descriptors;
planner-selected route fabric;
domain packs as semantic gates, not as the main design mechanism.

This is the continuation of the original "MCP as bounded brain" goal.

Architectural Reading

The target is not an unrestricted model agent.

The target remains:

user question -> data_need_graph -> catalog chain template -> reviewed primitives -> bounded evidence loop -> truth gate -> answer

The LLM may help choose the path, but only inside reviewed MCP boundaries.
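A minimal TypeScript sketch of this bounded shape, assuming hypothetical types and callbacks: DataNeedGraph, ChainTemplate, Primitive, and runBoundedPipeline are illustrative names, not the real assistant code. It shows the three boundaries in one place. The planner may choose the template, but only primitives present in a reviewed map can run, the evidence loop has a fixed step budget, and an evidence-gated template cannot produce an answer without confirmed rows.

```typescript
// Illustrative sketch only: every name here is hypothetical, not the real MCP code.
interface DataNeedGraph {
  fact_family: string;
  action_family: string;
  required_axes: string[];
}

interface ChainTemplate {
  chain_id: string;
  primitives: string[];
  requires_evidence_gate: boolean;
}

interface Evidence {
  primitive: string;
  confirmed_rows: number;
}

type Primitive = (need: DataNeedGraph) => Evidence;

type Answer =
  | { kind: "clarify"; text: string }
  | { kind: "withheld"; text: string }
  | { kind: "answered"; text: string };

function runBoundedPipeline(
  question: string,
  buildGraph: (q: string) => DataNeedGraph,
  pickTemplate: (g: DataNeedGraph) => ChainTemplate | undefined,
  reviewed: ReadonlyMap<string, Primitive>,
  maxSteps = 4,
): Answer {
  const graph = buildGraph(question);
  const template = pickTemplate(graph);
  // No catalog template fits: ask instead of improvising a route.
  if (!template) return { kind: "clarify", text: "Which data should be checked?" };

  const evidence: Evidence[] = [];
  // Bounded evidence loop: fixed step budget, reviewed primitives only.
  for (const name of template.primitives.slice(0, maxSteps)) {
    const primitive = reviewed.get(name);
    if (!primitive) continue; // not reviewed: never executed
    evidence.push(primitive(graph));
  }

  // Truth gate: an evidence-gated chain needs at least one confirmed result.
  const confirmed = evidence.filter((e) => e.confirmed_rows > 0);
  if (template.requires_evidence_gate && confirmed.length === 0) {
    return { kind: "withheld", text: "No confirmed 1C evidence was found." };
  }
  return { kind: "answered", text: `${template.chain_id}: ${confirmed.length} confirmed step(s)` };
}
```

The LLM-facing choice is confined to pickTemplate; everything after it is deterministic and bounded.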

Code Steps

The first consolidation step adds reusable chain templates to assistantMcpCatalogIndex.

The catalog now describes not only primitive contracts, but also planner route-fabric templates:

metadata_inspection
catalog_drilldown
entity_resolution
document_evidence
movement_evidence
value_flow
value_flow_comparison
value_flow_ranking
lifecycle

Each template declares:

semantic data need;
human-readable chain summary;
fallback primitive sequence;
base required axes;
supported fact/action families;
planning tags;
evidence-gate requirement.

The planner now instantiates selected evidence chains from this catalog for the first base lanes instead of keeping all route meaning only in local planner branches.
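A sketch of what a declared template and its instantiation might look like, assuming a hypothetical descriptor shape: the field names paraphrase the list above and are not copied from assistantMcpCatalogIndex, and the "period" axis is illustrative. The primitive names come from the reviewed set mentioned later in this note. The point is that the route meaning (primitives, axes, gate) lives in the descriptor, and the planner only adds turn-specific axes.

```typescript
// Hypothetical descriptor shape; field names paraphrase the note, not the real source.
interface CatalogChainTemplate {
  chain_id: string;
  semantic_data_need: string;
  chain_summary: string;
  fallback_primitives: readonly string[];
  base_required_axes: readonly string[];
  fact_families: readonly string[];
  action_families: readonly string[];
  planning_tags: readonly string[];
  requires_evidence_gate: boolean;
}

interface SelectedChain {
  chain_id: string;
  primitives: string[];
  required_axes: string[];
  requires_evidence_gate: boolean;
}

const CHAIN_TEMPLATES: readonly CatalogChainTemplate[] = [
  {
    chain_id: "document_evidence",
    semantic_data_need: "documents that confirm a business fact",
    chain_summary: "query documents, then explain the evidence basis",
    fallback_primitives: ["query_documents", "explain_evidence_basis"],
    base_required_axes: ["period"], // illustrative axis
    fact_families: ["document"],
    action_families: ["list"],
    planning_tags: ["evidence"],
    requires_evidence_gate: true,
  },
];

function instantiateChain(chainId: string, turnAxes: readonly string[] = []): SelectedChain | undefined {
  const template = CHAIN_TEMPLATES.find((t) => t.chain_id === chainId);
  if (!template) return undefined;
  // Base axes come from the catalog; duplicates from the current turn are dropped.
  const axes = Array.from(new Set([...template.base_required_axes, ...turnAxes]));
  return {
    chain_id: template.chain_id,
    primitives: [...template.fallback_primitives],
    required_axes: axes,
    requires_evidence_gate: template.requires_evidence_gate,
  };
}
```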

The follow-up consolidation step moved the value-flow planner seams onto the same catalog fabric:

bidirectional incoming-vs-outgoing comparison now instantiates value_flow_comparison, including explicit-counterparty comparison graphs rather than only subjectless organization-scope graphs;
ranked revenue/payment questions now instantiate value_flow_ranking;
organization-scoped open totals now instantiate value_flow with subjectless primitives but catalog-owned axes and evidence-gate semantics;
heuristic fallback routes for value-flow, lifecycle, metadata, movement, document, entity, and unclassified metadata inspection now also use catalog chain templates.

This keeps behavior stable while making the planner's route meaning inspectable through catalog descriptors instead of only through local recipeFor() branches.
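The value-flow seams above can be sketched as one selection function. This assumes a hypothetical need shape (ValueFlowNeed) and also assumes that comparison takes precedence over ranking when both signals appear; the note does not state that precedence. A missing counterparty marks the organization-scope, subjectless case, which still resolves to a catalog template rather than a local recipe.

```typescript
// Hypothetical need shape; the real data_need_graph carries more fields.
interface ValueFlowNeed {
  directions: ReadonlyArray<"incoming" | "outgoing">;
  ranking: boolean;
  counterparty?: string; // undefined => organization-scope (subjectless) graph
}

type ValueFlowChain = "value_flow" | "value_flow_comparison" | "value_flow_ranking";

function selectValueFlowChain(need: ValueFlowNeed): { chain: ValueFlowChain; subjectless: boolean } {
  const subjectless = need.counterparty === undefined;
  const bidirectional = need.directions.includes("incoming") && need.directions.includes("outgoing");
  // Comparison applies with or without an explicit counterparty (assumed to win over ranking).
  if (bidirectional) return { chain: "value_flow_comparison", subjectless };
  if (need.ranking) return { chain: "value_flow_ranking", subjectless };
  // Open totals: generic template, possibly with subjectless primitives.
  return { chain: "value_flow", subjectless };
}
```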

The next consolidation step strengthened lifecycle as a bounded inference chain instead of a loose age-like shortcut:

the lifecycle template now declares activity_window and legal_fact_boundary axes;
the template summary explicitly frames the result as a first/latest confirmed 1C activity window, not legal registration age;
planner graph and fallback recipes now emit lifecycle bounded-inference reason codes;
lifecycle evidence facts include the matched row count, first/latest confirmed activity dates, and an explicit legal-fact boundary.
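The lifecycle evidence facts can be sketched as a small builder, assuming hypothetical row and result types and ISO dates (YYYY-MM-DD), which sort correctly as strings. The explicit boundary text is what keeps the result an activity window rather than a claim about legal registration age.

```typescript
// Hypothetical types; each row is one confirmed 1C activity with an ISO date.
interface ActivityRow {
  date: string; // "YYYY-MM-DD"
}

interface LifecycleEvidence {
  matched_rows: number;
  first_confirmed_activity: string | null;
  latest_confirmed_activity: string | null;
  legal_fact_boundary: string;
}

function buildLifecycleEvidence(rows: readonly ActivityRow[]): LifecycleEvidence {
  // ISO dates compare correctly as strings, so a lexical sort is enough.
  const dates = rows.map((r) => r.date).sort();
  return {
    matched_rows: rows.length,
    first_confirmed_activity: dates.length > 0 ? dates[0] : null,
    latest_confirmed_activity: dates.length > 0 ? dates[dates.length - 1] : null,
    legal_fact_boundary: "1C activity window only; not the legal registration date",
  };
}
```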

Two arbitration seams were also hardened because they are part of the same planner-autonomy surface:

current-turn value-flow aggregate questions can override supported exact legacy routes when the user asks for amount/net/payment totals and the exact route would only produce a narrower lookup/list answer;
broad business evaluation (broad_business_evaluation) is intentionally kept in the deterministic living-chat bridge instead of being displaced by generic metadata discovery.
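The two arbitration seams can be sketched as one ordered decision, assuming hypothetical signal and decision types. The ordering carries the rules: broad evaluation is checked first so generic discovery cannot displace it, and an aggregate ask only overrides an exact route whose answer shape is narrower (lookup or list). Routing an aggregate ask with no exact route to value_flow is an assumption of this sketch.

```typescript
// Hypothetical types; route ids and shapes are illustrative.
type ExactRouteShape = "lookup" | "list" | "aggregate";

interface TurnSignals {
  asksAmountTotal: boolean; // amount / net / payment totals in the current turn
  isBroadBusinessEvaluation: boolean;
  exactRoute?: { id: string; shape: ExactRouteShape };
}

type RouteDecision =
  | { kind: "living_chat_bridge" }
  | { kind: "catalog_chain"; chain: "value_flow" }
  | { kind: "exact_route"; id: string }
  | { kind: "metadata_discovery" };

function arbitrate(signals: TurnSignals): RouteDecision {
  // Broad evaluation stays in the deterministic living-chat bridge.
  if (signals.isBroadBusinessEvaluation) return { kind: "living_chat_bridge" };
  const exact = signals.exactRoute;
  // Aggregate ask beats an exact route that could only give a narrower lookup/list answer.
  if (signals.asksAmountTotal && (!exact || exact.shape !== "aggregate")) {
    return { kind: "catalog_chain", chain: "value_flow" };
  }
  if (exact) return { kind: "exact_route", id: exact.id };
  return { kind: "metadata_discovery" };
}
```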

These changes keep the route fabric broader without letting the planner pretend that inferred evidence is a formally proven legal fact.

The following consolidation step promoted the accepted inventory-stock breadth behavior into reviewed catalog route fabric:

inventory_stock_snapshot
inventory_supplier_overlap
inventory_purchase_provenance
inventory_sale_trace

These templates are now first-class catalog chain descriptors and can be selected by the data-need graph/planner. They reuse reviewed generic primitives (query_movements, query_documents, aggregate_by_axis, drilldown_related_objects, probe_coverage, explain_evidence_basis) and add inventory-specific axes such as as_of_date, warehouse, supplier, buyer, quantity, and evidence_basis.

The first runtime bridge for these inventory templates now delegates through existing exact inventory recipes instead of inventing a new generic inventory executor:

inventory_stock_snapshot -> inventory_on_hand_as_of_date
inventory_supplier_overlap -> inventory_supplier_stock_overlap_as_of_date
inventory_purchase_provenance -> inventory_purchase_provenance_for_item
inventory_sale_trace -> inventory_sale_trace_for_item

The bridge keeps the reviewed MCP route fabric as the planner surface, but uses addressRecipeCatalog exact queries and account scope 41.01 as the evidence source. Root inventory templates execute through query_movements; selected-item provenance/sale templates execute through query_documents. A missing selected-item anchor still leads to clarification rather than a guessed item.
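A sketch of the bridge rules above, assuming a hypothetical BridgeResult type and bridgeInventoryTemplate function. The template-to-recipe mapping, the primitive split, and account 41.01 are taken from this note; everything around them is illustrative. The selected-item templates return a clarification when no item anchor exists instead of falling back to any item.

```typescript
type InventoryTemplate =
  | "inventory_stock_snapshot"
  | "inventory_supplier_overlap"
  | "inventory_purchase_provenance"
  | "inventory_sale_trace";

// Mapping from the note; the surrounding object shape is hypothetical.
const EXACT_RECIPE: Record<InventoryTemplate, string> = {
  inventory_stock_snapshot: "inventory_on_hand_as_of_date",
  inventory_supplier_overlap: "inventory_supplier_stock_overlap_as_of_date",
  inventory_purchase_provenance: "inventory_purchase_provenance_for_item",
  inventory_sale_trace: "inventory_sale_trace_for_item",
};

const SELECTED_ITEM_TEMPLATES: ReadonlySet<InventoryTemplate> = new Set<InventoryTemplate>([
  "inventory_purchase_provenance",
  "inventory_sale_trace",
]);

type BridgeResult =
  | { kind: "execute"; recipe: string; primitive: "query_movements" | "query_documents"; account: "41.01"; item?: string }
  | { kind: "clarify"; reason: string };

function bridgeInventoryTemplate(template: InventoryTemplate, selectedItem?: string): BridgeResult {
  const recipe = EXACT_RECIPE[template];
  if (!SELECTED_ITEM_TEMPLATES.has(template)) {
    // Root templates (stock snapshot, supplier overlap) read movements.
    return { kind: "execute", recipe, primitive: "query_movements", account: "41.01" };
  }
  if (!selectedItem) {
    // Never guess the item: ask the user instead.
    return { kind: "clarify", reason: "selected item anchor is missing" };
  }
  return { kind: "execute", recipe, primitive: "query_documents", account: "41.01", item: selectedItem };
}
```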

The runtime answer boundary still makes unsupported or unconfirmed inventory states explicit:

unsupported inventory route templates get a user-facing "template selected, live execution not yet bridged" answer instead of a generic checked-sources fallback;
must_not_claim forbids presenting inventory planning as executed stock, supplier, purchase, or sale evidence;
technical unsupported-pilot limitation text is filtered out of user-facing lines, while existing bounded unknowns for lifecycle/value-flow remain intact.
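The answer boundary can be sketched as a filter plus an explicit notice, assuming a hypothetical AnswerDraft shape and a hypothetical marker string for technical limitation lines; the real filter may match different text. Ordinary bounded-unknown lines pass through unchanged, and must_not_claim is only populated when the template was not actually executed.

```typescript
// Hypothetical shapes and marker text; not the real answer adapter.
interface AnswerDraft {
  lines: string[];
  must_not_claim: string[];
}

const TECHNICAL_LIMITATION_MARKER = "unsupported pilot";

function buildInventoryBoundaryAnswer(templateId: string, bridged: boolean, draftLines: readonly string[]): AnswerDraft {
  // Drop internal limitation text; keep other bounded-unknown lines untouched.
  const lines = draftLines.filter((line) => !line.toLowerCase().includes(TECHNICAL_LIMITATION_MARKER));
  if (bridged) return { lines, must_not_claim: [] };
  lines.unshift(`Template ${templateId} was selected, but live execution is not yet bridged.`);
  return {
    lines,
    // Planning alone must not be presented as executed evidence.
    must_not_claim: ["executed stock evidence", "supplier evidence", "purchase evidence", "sale evidence"],
  };
}
```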

The next local scoring step broadened metadata-surface autonomy without adding a new hard domain route:

if a confirmed metadata surface is unambiguous and only exposes Document.*, Register.*, or Catalog.* objects, the planner can infer the next reviewed lane even when upstream has not yet filled downstream_route_family;
inferred document surfaces instantiate document_evidence;
inferred register/movement surfaces instantiate movement_evidence;
inferred catalog surfaces instantiate catalog_drilldown;
mixed or ambiguous surfaces still do not guess and continue through clarification / explicit data-need scoring.
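The lane inference above reduces to a prefix check over the confirmed surface, sketched here with hypothetical names (inferLaneFromSurface, MetadataSurface). The prefix-to-lane table follows the list above. Any object outside the three prefixes, or a mix of kinds, returns undefined so the planner falls back to clarification or explicit data-need scoring.

```typescript
type Lane = "document_evidence" | "movement_evidence" | "catalog_drilldown";

// Prefix table from the note; the function and surface shape are hypothetical.
const PREFIX_TO_LANE: ReadonlyArray<[string, Lane]> = [
  ["Document.", "document_evidence"],
  ["Register.", "movement_evidence"],
  ["Catalog.", "catalog_drilldown"],
];

interface MetadataSurface {
  confirmed: boolean;
  objects: readonly string[];
}

function inferLaneFromSurface(surface: MetadataSurface, downstreamRouteFamily?: Lane): Lane | undefined {
  if (downstreamRouteFamily) return downstreamRouteFamily; // upstream already decided
  if (!surface.confirmed || surface.objects.length === 0) return undefined;
  const lanes = new Set<Lane>();
  for (const name of surface.objects) {
    const match = PREFIX_TO_LANE.find(([prefix]) => name.startsWith(prefix));
    if (!match) return undefined; // unknown object kind: do not guess
    lanes.add(match[1]);
  }
  // Only a single-kind surface is unambiguous.
  return lanes.size === 1 ? Array.from(lanes)[0] : undefined;
}
```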

The following consolidation step added catalog-level chain-template scoring:

assistantMcpCatalogIndex can now score reviewed chain_templates directly from fact family, action family, required axes, comparison, ranking, and aggregation needs;
comparison-shaped value-flow ranks value_flow_comparison above the generic value-flow template;
ranking-shaped value-flow ranks value_flow_ranking above the generic value-flow template;
document/movement/inventory/lifecycle templates can now be inspected as catalog search results, not only as local planner branch constants;
assistantMcpDiscoveryPlanner records the top catalog chain-template match in reason codes and exposes the ranked matches as catalog_chain_template_matches in the planner contract while preserving existing guarded execution behavior;
the ranked chain-template matches are now propagated into runtime loop state and debug attachment fields, so replay analysis can inspect catalog-fabric intent without parsing reason-code strings;
catalog_chain_template_alignment now records whether the selected chain is the top catalog match, its rank, and whether it appeared in the catalog search results, and runtime loop state and debug summary expose the same verdict;
planner reason codes now emit stable catalog-alignment telemetry for evaluated top-match, selected-equals-top, selected-lower-rank, selected-outside-match-set, and unscored selected-chain states;
catalog_chain_template_alignment.alignment_status now carries the same verdict as one enum-like field, and debug summary exposes it as mcp_discovery_catalog_chain_alignment_status;
domain_truth_harness and scenario_acceptance_policy now carry the alignment status, top catalog match, and selected-matches-top flag into replay artifacts instead of leaving them buried in raw debug JSON;
truth-harness now raises a warning finding for selected_lower_rank and selected_outside_match_set alignment states unless the replay spec explicitly marks allow_catalog_alignment_divergence;
scenario acceptance now groups that warning under catalog_alignment_ok, and final_status.md prints the invariant alongside direct-answer, temporal, truth-gate, human-answer, meta-context, and selected-object gates;
truth-harness specs can now assert expected_catalog_alignment_status, expected_catalog_chain_top_match, and expected_catalog_selected_matches_top on each step;
address_truth_harness_phase66_human_org_open_scope_dialog.json now uses those fields to assert value_flow, value_flow_comparison, and value_flow_ranking top matches across the open-organization money dialog;
address_truth_harness_phase32_planner_selected_chain_end_to_end.json now uses the same assertions across selected-counterparty entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups;
agent_semantic_pack_builder now preserves these expected catalog-alignment fields in the reusable source catalog and adds the planner_catalog_alignment tag, so future mixed AGENT packs can deliberately select planner-brain regression probes instead of relying on hand-picked replay filenames.
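A sketch of the scoring and alignment logic, assuming hypothetical types and illustrative weights (the real scorer in assistantMcpCatalogIndex may weigh signals differently). A matching comparison/ranking/aggregation shape adds a bonus, which is what lets value_flow_comparison and value_flow_ranking outrank generic value_flow. The status names selected_lower_rank and selected_outside_match_set come from this note; selected_equals_top and unscored_selected_chain are hypothetical spellings of the other states. The warning rule mirrors the truth-harness behavior, written in TypeScript for consistency even though the replay tooling is Python.

```typescript
type Shape = "comparison" | "ranking" | "aggregation";

interface ScoredTemplate {
  chain_id: string;
  fact_families: readonly string[];
  action_families: readonly string[];
  axes: readonly string[];
  shape?: Shape;
}

interface NeedQuery {
  fact_family: string;
  action_family: string;
  required_axes: readonly string[];
  shape?: Shape;
}

interface ChainMatch {
  chain_id: string;
  score: number;
}

// Illustrative weights only.
function scoreTemplates(templates: readonly ScoredTemplate[], need: NeedQuery): ChainMatch[] {
  return templates
    .map((t) => {
      let score = 0;
      if (t.fact_families.includes(need.fact_family)) score += 3;
      if (t.action_families.includes(need.action_family)) score += 2;
      score += need.required_axes.filter((a) => t.axes.includes(a)).length;
      if (need.shape !== undefined && t.shape === need.shape) score += 4;
      return { chain_id: t.chain_id, score };
    })
    .filter((m) => m.score > 0) // templates with no signal are not search results
    .sort((a, b) => b.score - a.score);
}

type AlignmentStatus =
  | "selected_equals_top"
  | "selected_lower_rank"
  | "selected_outside_match_set"
  | "unscored_selected_chain";

function alignmentStatus(selected: string | undefined, matches: readonly ChainMatch[]): AlignmentStatus {
  if (!selected || matches.length === 0) return "unscored_selected_chain";
  const rank = matches.findIndex((m) => m.chain_id === selected);
  if (rank === 0) return "selected_equals_top";
  if (rank > 0) return "selected_lower_rank";
  return "selected_outside_match_set";
}

// Soft warning unless the replay spec allows divergence.
function alignmentWarning(status: AlignmentStatus, allowDivergence: boolean): boolean {
  return !allowDivergence && (status === "selected_lower_rank" || status === "selected_outside_match_set");
}
```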

Why This Matters

This reduces the pressure to add one hard route per user wording.

Future domain enablement should prefer:

add or strengthen primitive descriptors;
add or strengthen chain templates;
let data-need graph and catalog search assemble the path;
use domain packs to verify the scenario tree and catch semantic drift.

Domain-specific exact recipes can still exist as fast paths, but they should not be the only way the assistant understands a new business question.

Validation

Local validation after the catalog-template, value-flow, metadata-lane scoring, lifecycle bounded-inference, current-turn value-flow arbitration, and broad-evaluation bridge steps:

npm.cmd test -- assistantMcpCatalogIndex.test.ts assistantMcpDiscoveryPlanner.test.ts: passed, 47 passed
MCP-discovery suite: passed, 227 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5911 nodes, 12830 edges, 138 communities
live value-flow canary: address_truth_harness_phase66_human_org_open_scope_dialog_planner_template_rerun2, accepted 7/7
live metadata movement canary: address_truth_harness_phase52_metadata_movement_full_recovery_planner_metadata_scoring_rerun2, accepted 4/4
live metadata document canary: address_truth_harness_phase54_metadata_document_full_recovery_planner_metadata_scoring_rerun2, accepted 4/4

Additional code-level consolidation:

ambiguous metadata surfaces no longer carry both document and movement primitives when the current data-need graph explicitly selects document_evidence or movement_evidence;
thin neutral metadata follow-ups still do not force a lane and keep the clarification boundary intact;
planner reason codes now expose when an explicit lane family is scored against carried metadata ambiguity: planner_metadata_surface_scored_with_explicit_lane_family.
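A sketch of the ambiguity pruning above, assuming hypothetical names (scoreCarriedSurface, CarriedSurface) and assuming that document_evidence maps to query_documents and movement_evidence to query_movements. The reason code string is the one quoted in this note. Without an explicit lane, nothing is pruned, which keeps the clarification boundary for thin neutral follow-ups.

```typescript
type EvidenceLane = "document_evidence" | "movement_evidence";

// Assumed lane-to-primitive mapping; not copied from the planner source.
const LANE_PRIMITIVE: Record<EvidenceLane, string> = {
  document_evidence: "query_documents",
  movement_evidence: "query_movements",
};

interface CarriedSurface {
  primitives: readonly string[]; // ambiguous surfaces carry both document and movement primitives
}

function scoreCarriedSurface(surface: CarriedSurface, explicitLane?: EvidenceLane): { primitives: string[]; reasonCodes: string[] } {
  const ambiguous = surface.primitives.includes("query_documents") && surface.primitives.includes("query_movements");
  if (!ambiguous || !explicitLane) {
    // Thin neutral follow-up or unambiguous surface: do not force a lane.
    return { primitives: [...surface.primitives], reasonCodes: [] };
  }
  const otherLane: EvidenceLane = explicitLane === "document_evidence" ? "movement_evidence" : "document_evidence";
  const dropped = LANE_PRIMITIVE[otherLane];
  return {
    primitives: surface.primitives.filter((p) => p !== dropped),
    reasonCodes: ["planner_metadata_surface_scored_with_explicit_lane_family"],
  };
}
```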

Latest validation after the lifecycle and arbitration hardening:

targeted lifecycle/catalog/planner/answer tests: passed, 75 passed, 1 skipped
full MCP-discovery suite: passed, 268 passed, 9 skipped
broad MCP/living-chat/route/meaning slice: passed, 305 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5912 nodes, 12833 edges, 138 communities
live lifecycle/value-flow response gate: address_truth_harness_phase19_mcp_discovery_response_gate_planner_lifecycle_rerun4, accepted 8/8
live broad-eval to net-flow follow-up: address_truth_harness_phase21_net_followup_after_broad_eval_planner_lifecycle_rerun2, accepted 3/3
live broad-evaluation bridge: address_truth_harness_phase22_broad_business_evaluation_bridge_planner_lifecycle_rerun2, accepted 3/3

Latest validation after the inventory catalog-template lift:

targeted catalog/data-need/planner/turn-input tests: passed, 139 passed, 6 skipped
full MCP-discovery suite: passed, 276 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5912 nodes, 12833 edges, 138 communities

Latest validation after the inventory runtime-boundary hardening:

targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, 68 passed, 1 skipped
full MCP-discovery suite: passed, 277 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5913 nodes, 12837 edges, 138 communities

Latest validation after the inventory exact-runtime bridge:

targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, 70 passed, 1 skipped
full MCP-discovery suite: passed, 279 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5930 nodes, 12884 edges, 135 communities

Latest validation after unambiguous metadata-surface lane inference:

targeted planner tests: passed, 36 passed
full MCP-discovery suite: passed, 281 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5937 nodes, 12899 edges, 138 communities
live inventory full-pack attempt: inventory_stock_exact_bridge_live_20260501_after_runtime_bridge, status partial
live attempt interpretation: route/intent/recipe/capability selection matched, but MCP execution failed with "MCP fetch failed: This operation was aborted"; a direct proxy get_metadata call also timed out while /health reported active_sessions_count=0 and pending commands, so this is an infrastructure/polling-session blocker rather than accepted semantic evidence.

Latest validation after catalog chain-template scoring:

targeted catalog/planner tests: passed, 54 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5938 nodes, 12903 edges, 139 communities

Latest validation after structured catalog chain-template contract exposure:

targeted planner tests: passed, 36 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5939 nodes, 12906 edges, 138 communities

Latest validation after runtime/debug propagation of structured chain matches:

targeted runtime/debug tests: passed, 18 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5940 nodes, 12909 edges, 137 communities

Latest validation after subject-aware bidirectional comparison arbitration:

targeted planner tests: passed, 36 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5940 nodes, 12909 edges, 137 communities

Latest validation after structured catalog chain-template alignment verdict:

targeted planner/runtime/debug tests: passed, 54 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5941 nodes, 12911 edges, 136 communities

Latest validation after representative catalog-alignment regression guard:

targeted planner tests: passed, 37 passed
full MCP-discovery suite: passed, 283 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5942 nodes, 12912 edges, 140 communities

Latest validation after catalog-alignment reason-code telemetry:

targeted planner/runtime tests: passed, 53 passed
full MCP-discovery suite: passed, 283 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5943 nodes, 12915 edges, 136 communities

Latest validation after explicit catalog-alignment status propagation:

targeted planner/runtime/debug tests: passed, 55 passed
full MCP-discovery suite: passed, 283 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5943 nodes, 12915 edges, 136 communities

Latest validation after truth-harness catalog-alignment artifact surfacing:

Python replay-tooling tests: passed, 4 passed
graphify rebuild: 5946 nodes, 12918 edges, 136 communities

Latest validation after catalog-alignment divergence warning gate:

Python replay-tooling tests: passed, 5 passed
graphify rebuild: 5947 nodes, 12920 edges, 138 communities

Latest validation after catalog-alignment acceptance invariant:

Python replay-tooling tests: passed, 6 passed
graphify rebuild: 5949 nodes, 12923 edges, 136 communities

Latest validation after catalog-alignment spec assertions:

Python replay-tooling tests: passed, 7 passed
graphify rebuild: 5951 nodes, 12926 edges, 139 communities

Latest validation after phase66 catalog-alignment spec hardening:

Python replay-tooling tests: passed, 7 passed
load_truth_harness_spec confirmed the phase66 expected top-match chain sequence: value_flow, value_flow, value_flow, value_flow_comparison, value_flow_comparison, value_flow_ranking, value_flow_ranking

Latest validation after phase32 catalog-alignment spec hardening and AGENT source-catalog surfacing:

Python replay-tooling tests: passed, 9 passed
load_truth_harness_spec confirmed the phase32 expected top-match chain sequence: entity_resolution, value_flow, value_flow, value_flow_comparison, document_evidence, movement_evidence
agent_semantic_pack_builder.py inventory regenerated agent_semantic_source_catalog.* with reusable planner_catalog_alignment coverage

Next Step

The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue in three ways: use the regenerated AGENT source catalog to assemble mixed planner-brain canaries; harden additional planner-autonomy specs with expected catalog-chain assertions; and use the alignment signals (alignment_status, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, catalog_alignment_ok, and the representative guard) to find remaining manual branches where selected chains diverge from reviewed catalog-fabric intent.

Recommended order:

reconnect or restart the 1C toolkit polling side, then rerun the inventory canary against live 1C/MCP;
rerun a mixed cross-stage canary after the inventory canary is semantically clean;
continue broadening catalog scoring into unfamiliar 1C asks where metadata surface and data-need graph can pick reviewed lanes;
grow primitive descriptors only where live replay shows a real evidence gap;
keep phase19, phase21, phase22, value-flow, metadata ambiguity, and inventory-stock canaries as regression gates.

The key rule remains:

do not hide a domain workaround inside the planner;
promote repeated successful domain behavior into a reviewed primitive or chain template.
