20 - Planner Autonomy Consolidation (2026-05-01)
Purpose
This note starts the consolidation layer after the first accepted inventory-stock breadth proof.
The goal is to move from:
- domain pack proves one more slice;
- planner still carries too many local recipe branches;
to:
- reusable MCP primitive and chain descriptors;
- planner-selected route fabric;
- domain packs as semantic gates, not as the main design mechanism.
This is the continuation of the original "MCP as bounded brain" goal.
Architectural Reading
The target is not an unrestricted model agent.
The target remains:
user question -> data_need_graph -> catalog chain template -> reviewed primitives -> bounded evidence loop -> truth gate -> answer
The LLM may help choose the path, but only inside reviewed MCP boundaries.
Code Steps
The first consolidation step adds reusable chain templates to assistantMcpCatalogIndex.
The catalog now describes not only primitive contracts, but also planner route-fabric templates:
- metadata_inspection
- catalog_drilldown
- entity_resolution
- document_evidence
- movement_evidence
- value_flow
- value_flow_comparison
- value_flow_ranking
- lifecycle
Each template declares:
- semantic data need;
- human-readable chain summary;
- fallback primitive sequence;
- base required axes;
- supported fact/action families;
- planning tags;
- evidence-gate requirement.
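As a rough illustration of the fields listed above, a chain template descriptor could be typed along these lines. The interface name, field names, and most example values are assumptions made for this sketch; the note does not show the actual shape used in assistantMcpCatalogIndex. Only the template name, the two axes, and the summary wording are taken from this note.

```typescript
// Hypothetical descriptor shape; field names are illustrative assumptions,
// not the real assistantMcpCatalogIndex contract.
interface ChainTemplateSketch {
  id: string;                   // template name, e.g. "lifecycle"
  dataNeed: string;             // semantic data need
  summary: string;              // human-readable chain summary
  fallbackPrimitives: string[]; // fallback primitive sequence
  baseRequiredAxes: string[];   // base required axes
  factFamilies: string[];       // supported fact families
  actionFamilies: string[];     // supported action families
  planningTags: string[];       // planning tags
  requiresEvidenceGate: boolean;
}

// Example values are assumed, except the id, the two axes, and the summary.
const lifecycleSketch: ChainTemplateSketch = {
  id: "lifecycle",
  dataNeed: "activity_window_for_entity",
  summary: "first/latest confirmed 1C activity window, not legal registration age",
  fallbackPrimitives: ["query_documents", "probe_coverage", "explain_evidence_basis"],
  baseRequiredAxes: ["activity_window", "legal_fact_boundary"],
  factFamilies: ["lifecycle"],
  actionFamilies: ["inspect"],
  planningTags: ["bounded_inference"],
  requiresEvidenceGate: true,
};

// Returns the required axes that have not been resolved yet.
function missingAxes(template: ChainTemplateSketch, resolvedAxes: string[]): string[] {
  return template.baseRequiredAxes.filter((axis) => resolvedAxes.indexOf(axis) === -1);
}
```

A planner could use a helper like missingAxes to decide whether a selected chain is ready to run or still needs clarification.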
The planner now instantiates selected evidence chains from this catalog for the first base lanes instead of keeping all route meaning only in local planner branches.
The follow-up consolidation step moved the value-flow planner seams onto the same catalog fabric:
- bidirectional incoming-vs-outgoing comparison now instantiates value_flow_comparison, including explicit-counterparty comparison graphs rather than only subjectless organization-scope graphs;
- ranked revenue/payment questions now instantiate value_flow_ranking;
- organization-scoped open totals now instantiate value_flow with subjectless primitives but catalog-owned axes and evidence-gate semantics;
- heuristic fallback routes for value-flow, lifecycle, metadata, movement, document, entity, and unclassified metadata inspection now also use catalog chain templates.
This keeps behavior stable while making the planner's route meaning inspectable through catalog descriptors instead of only through local recipeFor() branches.
The next consolidation step strengthened lifecycle as a bounded inference chain instead of a loose age-like shortcut:
- the lifecycle template now declares activity_window and legal_fact_boundary axes;
- the template summary explicitly frames the result as a first/latest confirmed 1C activity window, not legal registration age;
- planner graph and fallback recipes now emit lifecycle bounded-inference reason codes;
- lifecycle evidence facts include the matched row count, first/latest confirmed activity dates, and an explicit legal-fact boundary.
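The lifecycle evidence fact described in the last bullet can be sketched as follows. The row shape, the type names, and the boundary wording are assumptions for illustration; only the four fact components (matched row count, first and latest confirmed activity dates, explicit legal-fact boundary) come from this note.

```typescript
// Hypothetical row shape: one confirmed 1C activity row with an ISO date.
interface ActivityRow {
  date: string; // ISO yyyy-mm-dd, e.g. "2024-03-15"
}

// Hypothetical fact shape mirroring the four components named in the note.
interface LifecycleFactSketch {
  matchedRowCount: number;
  firstConfirmedActivity: string;
  latestConfirmedActivity: string;
  legalFactBoundary: string;
}

function buildLifecycleFact(rows: ActivityRow[]): LifecycleFactSketch | null {
  // No matched rows means there is no activity window to report.
  if (rows.length === 0) return null;
  // ISO yyyy-mm-dd strings sort chronologically under plain string comparison.
  const dates = rows.map((row) => row.date).sort();
  return {
    matchedRowCount: rows.length,
    firstConfirmedActivity: dates[0],
    latestConfirmedActivity: dates[dates.length - 1],
    // The boundary is stated on every fact so the answer layer cannot
    // present the window as legal registration age.
    legalFactBoundary: "confirmed activity window only; not legal registration age",
  };
}
```

Returning null for an empty row set keeps the absence of evidence distinct from a zero-length window.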
Two arbitration seams were also hardened because they are part of the same planner-autonomy surface:
- current-turn value-flow aggregate questions can override supported exact legacy routes when the user asks for amount/net/payment totals and the exact route would only produce a narrower lookup/list answer;
- broad business evaluation (broad_business_evaluation) is intentionally kept in the deterministic living-chat bridge instead of being displaced by generic metadata discovery.
These changes keep the route fabric broader without letting the planner pretend that inferred evidence is a formally proven legal fact.
The following consolidation step promoted the accepted inventory-stock breadth behavior into reviewed catalog route fabric:
- inventory_stock_snapshot
- inventory_supplier_overlap
- inventory_purchase_provenance
- inventory_sale_trace
These templates are now first-class catalog chain descriptors and can be selected by the data-need graph/planner. They reuse reviewed generic primitives (query_movements, query_documents, aggregate_by_axis, drilldown_related_objects, probe_coverage, explain_evidence_basis) and add inventory-specific axes such as as_of_date, warehouse, supplier, buyer, quantity, and evidence_basis.
The first runtime bridge for these inventory templates now delegates through existing exact inventory recipes instead of inventing a new generic inventory executor:
- inventory_stock_snapshot -> inventory_on_hand_as_of_date
- inventory_supplier_overlap -> inventory_supplier_stock_overlap_as_of_date
- inventory_purchase_provenance -> inventory_purchase_provenance_for_item
- inventory_sale_trace -> inventory_sale_trace_for_item
The bridge keeps the reviewed MCP route fabric as the planner surface, but uses addressRecipeCatalog exact queries and account scope 41.01 as the evidence source. Root inventory templates execute through query_movements; selected-item provenance/sale templates execute through query_documents. Missing selected-item anchors remain clarification, not a guessed item.
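The delegation rules above can be sketched as a small lookup. The function, type, and field names are hypothetical, and treating inventory_supplier_overlap as a root template is this sketch's reading of the note. The template-to-recipe pairs, the two primitives, account scope 41.01, and the clarification rule come from the text.

```typescript
// Hypothetical decision type for the inventory runtime bridge.
type BridgeDecision =
  | { kind: "execute"; recipe: string; primitive: "query_movements" | "query_documents"; accountScope: string }
  | { kind: "clarify"; missing: "selected_item" }
  | { kind: "unsupported"; templateId: string };

interface BridgeEntry {
  recipe: string;             // exact recipe name in addressRecipeCatalog
  needsSelectedItem: boolean; // true for selected-item provenance/sale templates
}

const INVENTORY_BRIDGE: { [templateId: string]: BridgeEntry } = {
  inventory_stock_snapshot: { recipe: "inventory_on_hand_as_of_date", needsSelectedItem: false },
  inventory_supplier_overlap: { recipe: "inventory_supplier_stock_overlap_as_of_date", needsSelectedItem: false },
  inventory_purchase_provenance: { recipe: "inventory_purchase_provenance_for_item", needsSelectedItem: true },
  inventory_sale_trace: { recipe: "inventory_sale_trace_for_item", needsSelectedItem: true },
};

function bridgeInventoryTemplate(templateId: string, selectedItem?: string): BridgeDecision {
  // Own-property check so inherited keys such as "constructor" never match.
  if (!Object.prototype.hasOwnProperty.call(INVENTORY_BRIDGE, templateId)) {
    return { kind: "unsupported", templateId };
  }
  const entry = INVENTORY_BRIDGE[templateId];
  // A missing selected-item anchor becomes a clarification, never a guessed item.
  if (entry.needsSelectedItem && !selectedItem) {
    return { kind: "clarify", missing: "selected_item" };
  }
  return {
    kind: "execute",
    recipe: entry.recipe,
    primitive: entry.needsSelectedItem ? "query_documents" : "query_movements",
    accountScope: "41.01",
  };
}
```

Modelling the result as a discriminated union keeps the three outcomes (execute, clarify, unsupported) explicit at every call site.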
The runtime answer boundary still makes unsupported or unconfirmed inventory states explicit:
- unsupported inventory route templates get a user-facing "template selected, live execution not yet bridged" answer instead of a generic checked-sources fallback;
- must_not_claim forbids presenting inventory planning as executed stock, supplier, purchase, or sale evidence;
- technical unsupported-pilot limitation text is filtered out of user-facing lines, while existing bounded unknowns for lifecycle/value-flow remain intact.
The next local scoring step broadened metadata-surface autonomy without adding a new hard domain route:
- if a confirmed metadata surface is unambiguous and only exposes Document.*, Register.*, or Catalog.* objects, the planner can infer the next reviewed lane even when upstream has not yet filled downstream_route_family;
- inferred document surfaces instantiate document_evidence;
- inferred register/movement surfaces instantiate movement_evidence;
- inferred catalog surfaces instantiate catalog_drilldown;
- mixed or ambiguous surfaces still do not guess and continue through clarification / explicit data-need scoring.
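The inference rule in the list above can be sketched like this. The function name and the prefix-matching approach are assumptions; the three lane names and the "do not guess on mixed surfaces" rule come from the note.

```typescript
type Lane = "document_evidence" | "movement_evidence" | "catalog_drilldown";

// Object-name prefix to reviewed lane, as described in the note.
const PREFIX_TO_LANE: Array<[string, Lane]> = [
  ["Document.", "document_evidence"],
  ["Register.", "movement_evidence"],
  ["Catalog.", "catalog_drilldown"],
];

// Returns a lane only when every exposed object maps to the same lane.
// A null result means the planner should continue through clarification
// or explicit data-need scoring instead of guessing.
function inferLane(objectNames: string[]): Lane | null {
  let inferred: Lane | null = null;
  for (const name of objectNames) {
    let lane: Lane | null = null;
    for (const [prefix, candidate] of PREFIX_TO_LANE) {
      if (name.indexOf(prefix) === 0) lane = candidate;
    }
    if (lane === null) return null;                           // unknown object kind
    if (inferred !== null && inferred !== lane) return null;  // mixed surface
    inferred = lane;
  }
  return inferred; // stays null for an empty surface
}
```

For example, a surface exposing only Document.* objects yields document_evidence, while a surface mixing Document.* and Register.* objects yields null.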
The following consolidation step added catalog-level chain-template scoring:
- assistantMcpCatalogIndex can now score reviewed chain_templates directly from fact family, action family, required axes, comparison, ranking, and aggregation needs;
- comparison-shaped value-flow ranks value_flow_comparison above the generic value-flow template;
- ranking-shaped value-flow ranks value_flow_ranking above the generic value-flow template;
- document/movement/inventory/lifecycle templates can now be inspected as catalog search results, not only as local planner branch constants;
- assistantMcpDiscoveryPlanner records the top catalog chain-template match in reason codes and exposes the ranked matches as catalog_chain_template_matches in the planner contract while preserving existing guarded execution behavior.
- the ranked chain-template matches are now propagated into runtime loop state and debug attachment fields, so replay analysis can inspect catalog-fabric intent without parsing reason-code strings.
- catalog_chain_template_alignment now records whether the selected chain is the top catalog match, its rank, and whether it appeared in the catalog search results; runtime loop state and debug summary expose the same verdict.
- planner reason codes now emit stable catalog-alignment telemetry for evaluated top-match, selected-equals-top, selected-lower-rank, selected-outside-match-set, and unscored selected-chain states.
- catalog_chain_template_alignment.alignment_status now carries the same verdict as one enum-like field, and debug summary exposes it as mcp_discovery_catalog_chain_alignment_status.
- domain_truth_harness and scenario_acceptance_policy now carry the alignment status, top catalog match, and selected-matches-top flag into replay artifacts instead of leaving them buried in raw debug JSON.
- truth-harness now raises a warning finding for selected_lower_rank and selected_outside_match_set alignment states unless the replay spec explicitly marks allow_catalog_alignment_divergence.
- scenario acceptance now groups that warning under catalog_alignment_ok, and final_status.md prints the invariant alongside direct-answer, temporal, truth-gate, human-answer, meta-context, and selected-object gates.
- truth-harness specs can now assert expected_catalog_alignment_status, expected_catalog_chain_top_match, and expected_catalog_selected_matches_top on each step.
- address_truth_harness_phase66_human_org_open_scope_dialog.json now uses those fields to assert value_flow, value_flow_comparison, and value_flow_ranking top matches across the open-organization money dialog.
- address_truth_harness_phase32_planner_selected_chain_end_to_end.json now uses the same assertions across selected-counterparty entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups.
- agent_semantic_pack_builder now preserves these expected catalog-alignment fields in the reusable source catalog and adds the planner_catalog_alignment tag, so future mixed AGENT packs can deliberately select planner-brain regression probes instead of relying on hand-picked replay filenames.
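The alignment verdict and the soft divergence warning described above can be sketched as follows. The function and field names are hypothetical. Of the status values, only selected_lower_rank and selected_outside_match_set appear in this note; selected_matches_top and unscored are assumed names for the remaining states.

```typescript
type AlignmentStatus =
  | "selected_matches_top"       // assumed name for the selected-equals-top state
  | "selected_lower_rank"        // named in the note
  | "selected_outside_match_set" // named in the note
  | "unscored";                  // assumed name for the unscored selected-chain state

interface AlignmentVerdict {
  alignmentStatus: AlignmentStatus;
  topMatch: string | null;
  selectedRank: number | null; // 1-based rank in the catalog search results
  selectedMatchesTop: boolean;
}

// rankedMatches is the catalog search result, best match first.
function alignSelectedChain(selected: string, rankedMatches: string[]): AlignmentVerdict {
  if (rankedMatches.length === 0) {
    return { alignmentStatus: "unscored", topMatch: null, selectedRank: null, selectedMatchesTop: false };
  }
  const topMatch = rankedMatches[0];
  const index = rankedMatches.indexOf(selected);
  if (index === -1) {
    return { alignmentStatus: "selected_outside_match_set", topMatch, selectedRank: null, selectedMatchesTop: false };
  }
  if (index === 0) {
    return { alignmentStatus: "selected_matches_top", topMatch, selectedRank: 1, selectedMatchesTop: true };
  }
  return { alignmentStatus: "selected_lower_rank", topMatch, selectedRank: index + 1, selectedMatchesTop: false };
}

// Warn on divergence unless the replay spec opts out, mirroring the
// allow_catalog_alignment_divergence flag mentioned in the note.
function needsDivergenceWarning(status: AlignmentStatus, allowDivergence: boolean): boolean {
  if (allowDivergence) return false;
  return status === "selected_lower_rank" || status === "selected_outside_match_set";
}
```

Carrying the verdict as a single enum-like value lets replay tooling assert on it directly instead of parsing reason-code strings.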
Why This Matters
This reduces the pressure to add one hard route per user wording.
Future domain enablement should prefer:
- add or strengthen primitive descriptors;
- add or strengthen chain templates;
- let data-need graph and catalog search assemble the path;
- use domain packs to verify the scenario tree and catch semantic drift.
Domain-specific exact recipes can still exist as fast paths, but they should not be the only way the assistant understands a new business question.
Validation
Local validation after the catalog-template, value-flow, metadata-lane scoring, lifecycle bounded-inference, current-turn value-flow arbitration, and broad-evaluation bridge steps:
- npm.cmd test -- assistantMcpCatalogIndex.test.ts assistantMcpDiscoveryPlanner.test.ts: passed, 47 passed
- MCP-discovery suite: passed, 227 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5911 nodes, 12830 edges, 138 communities
- live value-flow canary: address_truth_harness_phase66_human_org_open_scope_dialog_planner_template_rerun2, accepted 7/7
- live metadata movement canary: address_truth_harness_phase52_metadata_movement_full_recovery_planner_metadata_scoring_rerun2, accepted 4/4
- live metadata document canary: address_truth_harness_phase54_metadata_document_full_recovery_planner_metadata_scoring_rerun2, accepted 4/4
Additional code-level consolidation:
- ambiguous metadata surfaces no longer carry both document and movement primitives when the current data-need graph explicitly selects document_evidence or movement_evidence;
- thin neutral metadata follow-ups still do not force a lane and keep the clarification boundary intact;
- planner reason codes now expose when an explicit lane family is scored against carried metadata ambiguity: planner_metadata_surface_scored_with_explicit_lane_family.
Latest validation after the lifecycle and arbitration hardening:
- targeted lifecycle/catalog/planner/answer tests: passed, 75 passed, 1 skipped
- full MCP-discovery suite: passed, 268 passed, 9 skipped
- broad MCP/living-chat/route/meaning slice: passed, 305 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5912 nodes, 12833 edges, 138 communities
- live lifecycle/value-flow response gate: address_truth_harness_phase19_mcp_discovery_response_gate_planner_lifecycle_rerun4, accepted 8/8
- live broad-eval to net-flow follow-up: address_truth_harness_phase21_net_followup_after_broad_eval_planner_lifecycle_rerun2, accepted 3/3
- live broad-evaluation bridge: address_truth_harness_phase22_broad_business_evaluation_bridge_planner_lifecycle_rerun2, accepted 3/3
Latest validation after the inventory catalog-template lift:
- targeted catalog/data-need/planner/turn-input tests: passed, 139 passed, 6 skipped
- full MCP-discovery suite: passed, 276 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5912 nodes, 12833 edges, 138 communities
Latest validation after the inventory runtime-boundary hardening:
- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, 68 passed, 1 skipped
- full MCP-discovery suite: passed, 277 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5913 nodes, 12837 edges, 138 communities
Latest validation after the inventory exact-runtime bridge:
- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, 70 passed, 1 skipped
- full MCP-discovery suite: passed, 279 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5930 nodes, 12884 edges, 135 communities
Latest validation after unambiguous metadata-surface lane inference:
- targeted planner tests: passed, 36 passed
- full MCP-discovery suite: passed, 281 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5937 nodes, 12899 edges, 138 communities
- live inventory full-pack attempt: inventory_stock_exact_bridge_live_20260501_after_runtime_bridge, status partial
- live attempt interpretation: route/intent/recipe/capability selection matched, but MCP execution failed with "MCP fetch failed: This operation was aborted"; direct proxy get_metadata also timed out while /health reported active_sessions_count=0 and pending commands, so this is an infrastructure/polling-session blocker rather than accepted semantic evidence.
Latest validation after catalog chain-template scoring:
- targeted catalog/planner tests: passed, 54 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5938 nodes, 12903 edges, 139 communities
Latest validation after structured catalog chain-template contract exposure:
- targeted planner tests: passed, 36 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5939 nodes, 12906 edges, 138 communities
Latest validation after runtime/debug propagation of structured chain matches:
- targeted runtime/debug tests: passed, 18 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5940 nodes, 12909 edges, 137 communities
Latest validation after subject-aware bidirectional comparison arbitration:
- targeted planner tests: passed, 36 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5940 nodes, 12909 edges, 137 communities
Latest validation after structured catalog chain-template alignment verdict:
- targeted planner/runtime/debug tests: passed, 54 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5941 nodes, 12911 edges, 136 communities
Latest validation after representative catalog-alignment regression guard:
- targeted planner tests: passed, 37 passed
- full MCP-discovery suite: passed, 283 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5942 nodes, 12912 edges, 140 communities
Latest validation after catalog-alignment reason-code telemetry:
- targeted planner/runtime tests: passed, 53 passed
- full MCP-discovery suite: passed, 283 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5943 nodes, 12915 edges, 136 communities
Latest validation after explicit catalog-alignment status propagation:
- targeted planner/runtime/debug tests: passed, 55 passed
- full MCP-discovery suite: passed, 283 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5943 nodes, 12915 edges, 136 communities
Latest validation after truth-harness catalog-alignment artifact surfacing:
- Python replay-tooling tests: passed, 4 passed
- graphify rebuild: 5946 nodes, 12918 edges, 136 communities
Latest validation after catalog-alignment divergence warning gate:
- Python replay-tooling tests: passed, 5 passed
- graphify rebuild: 5947 nodes, 12920 edges, 138 communities
Latest validation after catalog-alignment acceptance invariant:
- Python replay-tooling tests: passed, 6 passed
- graphify rebuild: 5949 nodes, 12923 edges, 136 communities
Latest validation after catalog-alignment spec assertions:
- Python replay-tooling tests: passed, 7 passed
- graphify rebuild: 5951 nodes, 12926 edges, 139 communities
Latest validation after phase66 catalog-alignment spec hardening:
- Python replay-tooling tests: passed, 7 passed
- load_truth_harness_spec confirmed the phase66 expected top-match chain sequence: value_flow, value_flow, value_flow, value_flow_comparison, value_flow_comparison, value_flow_ranking, value_flow_ranking
Latest validation after phase32 catalog-alignment spec hardening and AGENT source-catalog surfacing:
- Python replay-tooling tests: passed, 9 passed
- load_truth_harness_spec confirmed the phase32 expected top-match chain sequence: entity_resolution, value_flow, value_flow, value_flow_comparison, document_evidence, movement_evidence
- agent_semantic_pack_builder.py inventory regenerated agent_semantic_source_catalog.* with reusable planner_catalog_alignment coverage
Next Step
The next safe step is still to re-run live replay once the 1C side is actively polling the proxy. In parallel, local-only consolidation can continue along three lines: assembling mixed planner-brain canaries from the regenerated AGENT source catalog; hardening additional planner-autonomy specs with expected catalog-chain assertions; and finding the remaining manual branches where selected chains diverge from reviewed catalog-fabric intent. The tools for that last search are alignment_status, alignment reason-code telemetry, truth-harness artifact surfacing, the soft divergence warning, catalog_alignment_ok, and the representative guard.
Recommended order:
- reconnect or restart the 1C toolkit polling side, then rerun the inventory canary against live 1C/MCP;
- rerun a mixed cross-stage canary after the inventory canary is semantically clean;
- continue broadening catalog scoring into unfamiliar 1C asks where metadata surface and data-need graph can pick reviewed lanes;
- grow primitive descriptors only where live replay shows a real evidence gap;
- keep phase19, phase21, phase22, value-flow, metadata ambiguity, and inventory-stock canaries as regression gates.
The key rule remains:
- do not hide a domain workaround inside the planner;
- promote repeated successful domain behavior into a reviewed primitive or chain template.