20 - Planner Autonomy Consolidation (2026-05-01)
Purpose
This note starts the consolidation layer after the first accepted inventory-stock breadth proof.
The goal is to move from:
- domain pack proves one more slice;
- planner still carries too many local recipe branches;
to:
- reusable MCP primitive and chain descriptors;
- planner-selected route fabric;
- domain packs as semantic gates, not as the main design mechanism.
This is the continuation of the original "MCP as bounded brain" goal.
Architectural Reading
The target is not an unrestricted model agent.
The target remains:
user question -> data_need_graph -> catalog chain template -> reviewed primitives -> bounded evidence loop -> truth gate -> answer
The LLM may help choose the path, but only inside reviewed MCP boundaries.
Code Steps
The first consolidation step adds reusable chain templates to assistantMcpCatalogIndex.
The catalog now describes not only primitive contracts, but also planner route-fabric templates:
- metadata_inspection
- catalog_drilldown
- entity_resolution
- document_evidence
- movement_evidence
- value_flow
- value_flow_comparison
- value_flow_ranking
- lifecycle
Each template declares:
- semantic data need;
- human-readable chain summary;
- fallback primitive sequence;
- base required axes;
- supported fact/action families;
- planning tags;
- evidence-gate requirement.
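The declared fields can be pictured as one descriptor record. The following is a minimal sketch in Python for illustration only; the real catalog lives in assistantMcpCatalogIndex, and every field name and example value here (including the data_need label, the primitive sequence, and the families and tags) is an assumption, not the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainTemplate:
    """Illustrative chain-template descriptor; field names are assumptions."""
    template_id: str
    data_need: str                         # semantic data need
    summary: str                           # human-readable chain summary
    fallback_primitives: tuple[str, ...]   # fallback primitive sequence
    required_axes: tuple[str, ...]         # base required axes
    fact_families: tuple[str, ...]         # supported fact families
    action_families: tuple[str, ...]       # supported action families
    planning_tags: tuple[str, ...] = ()
    requires_evidence_gate: bool = True    # evidence-gate requirement


# Hypothetical instance: the axes and summary wording follow this note,
# everything else is invented for the example.
LIFECYCLE = ChainTemplate(
    template_id="lifecycle",
    data_need="activity_lifecycle",
    summary="First/latest confirmed 1C activity window, not legal registration age",
    fallback_primitives=("query_documents", "probe_coverage", "explain_evidence_basis"),
    required_axes=("activity_window", "legal_fact_boundary"),
    fact_families=("lifecycle",),
    action_families=("bounded_inference",),
    planning_tags=("bounded_inference",),
)
```

Keeping the descriptor immutable reflects the idea that templates are reviewed artifacts: the planner instantiates them but does not mutate them at runtime.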
The planner now instantiates selected evidence chains from this catalog for the first base lanes instead of keeping all route meaning only in local planner branches.
The follow-up consolidation step moved the value-flow planner seams onto the same catalog fabric:
- bidirectional incoming-vs-outgoing comparison now instantiates value_flow_comparison, including explicit-counterparty comparison graphs rather than only subjectless organization-scope graphs;
- ranked revenue/payment questions now instantiate value_flow_ranking;
- organization-scoped open totals now instantiate value_flow with subjectless primitives but catalog-owned axes and evidence-gate semantics;
- heuristic fallback routes for value-flow, lifecycle, metadata, movement, document, entity, and unclassified metadata inspection now also use catalog chain templates.
This keeps behavior stable while making the planner's route meaning inspectable through catalog descriptors instead of only through local recipeFor() branches.
The next consolidation step strengthened lifecycle as a bounded inference chain instead of a loose age-like shortcut:
- the lifecycle template now declares activity_window and legal_fact_boundary axes;
- the template summary explicitly frames the result as a first/latest confirmed 1C activity window, not legal registration age;
- planner graph and fallback recipes now emit lifecycle bounded-inference reason codes;
- lifecycle evidence facts include the matched row count, first/latest confirmed activity dates, and an explicit legal-fact boundary.
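A minimal sketch of what such a lifecycle evidence fact could look like, assuming the matched rows have already been reduced to a list of confirmed activity dates. The function name, the dictionary keys, and the boundary wording are hypothetical; only the idea (row count, first/latest dates, explicit legal-fact boundary) comes from the list above.

```python
from datetime import date


def lifecycle_evidence(activity_dates: list[date]) -> dict:
    """Summarize matched rows as a bounded activity window (hypothetical fact shape)."""
    boundary = (
        "inferred from confirmed 1C activity only; "
        "not a legal registration date"
    )
    if not activity_dates:
        # No rows matched: report the unknown explicitly instead of guessing.
        return {
            "matched_rows": 0,
            "first_activity": None,
            "latest_activity": None,
            "legal_fact_boundary": boundary,
        }
    return {
        "matched_rows": len(activity_dates),
        "first_activity": min(activity_dates),
        "latest_activity": max(activity_dates),
        "legal_fact_boundary": boundary,
    }


facts = lifecycle_evidence([date(2021, 3, 5), date(2019, 7, 1), date(2024, 1, 15)])
```

The boundary string travels with the fact so that the answer layer cannot present the window as a proven legal fact by accident.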
Two arbitration seams were also hardened because they are part of the same planner-autonomy surface:
- current-turn value-flow aggregate questions can override supported exact legacy routes when the user asks for amount/net/payment totals and the exact route would only produce a narrower lookup/list answer;
- broad business evaluation (broad_business_evaluation) is intentionally kept in the deterministic living-chat bridge instead of being displaced by generic metadata discovery.
These changes keep the route fabric broader without letting the planner pretend that inferred evidence is a formally proven legal fact.
The following consolidation step promoted the accepted inventory-stock breadth behavior into reviewed catalog route fabric:
- inventory_stock_snapshot
- inventory_supplier_overlap
- inventory_purchase_provenance
- inventory_sale_trace
These templates are now first-class catalog chain descriptors and can be selected by the data-need graph/planner. They reuse reviewed generic primitives (query_movements, query_documents, aggregate_by_axis, drilldown_related_objects, probe_coverage, explain_evidence_basis) and add inventory-specific axes such as as_of_date, warehouse, supplier, buyer, quantity, and evidence_basis.
The first runtime bridge for these inventory templates now delegates through existing exact inventory recipes instead of inventing a new generic inventory executor:
- inventory_stock_snapshot -> inventory_on_hand_as_of_date
- inventory_supplier_overlap -> inventory_supplier_stock_overlap_as_of_date
- inventory_purchase_provenance -> inventory_purchase_provenance_for_item
- inventory_sale_trace -> inventory_sale_trace_for_item
The bridge keeps the reviewed MCP route fabric as the planner surface, but uses addressRecipeCatalog exact queries and account scope 41.01 as the evidence source. Root inventory templates execute through query_movements; selected-item provenance/sale templates execute through query_documents. Missing selected-item anchors remain clarification, not a guessed item.
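The delegation described above can be sketched as a lookup table plus an anchor check. This is an illustrative Python sketch, not the actual bridge: the function name, the return shape, and the status labels are assumptions, while the template-to-recipe mapping, the primitives, and the 41.01 account scope are taken from this note.

```python
from typing import Optional

ACCOUNT_SCOPE = "41.01"

# template -> (exact recipe, executing primitive, needs a selected item anchor)
INVENTORY_BRIDGE = {
    "inventory_stock_snapshot": ("inventory_on_hand_as_of_date", "query_movements", False),
    "inventory_supplier_overlap": ("inventory_supplier_stock_overlap_as_of_date", "query_movements", False),
    "inventory_purchase_provenance": ("inventory_purchase_provenance_for_item", "query_documents", True),
    "inventory_sale_trace": ("inventory_sale_trace_for_item", "query_documents", True),
}


def bridge_inventory_template(template_id: str, selected_item: Optional[str] = None) -> dict:
    """Resolve a catalog template to an exact recipe, or ask for clarification."""
    entry = INVENTORY_BRIDGE.get(template_id)
    if entry is None:
        # Template selected, but live execution is not bridged.
        return {"status": "unsupported", "template": template_id}
    recipe, primitive, needs_item = entry
    if needs_item and not selected_item:
        # Missing anchor stays a clarification, never a guessed item.
        return {"status": "clarification_required", "template": template_id,
                "missing": "selected_item"}
    return {"status": "ready", "template": template_id, "recipe": recipe,
            "primitive": primitive, "account_scope": ACCOUNT_SCOPE,
            "selected_item": selected_item}
```

For example, bridge_inventory_template("inventory_sale_trace") returns a clarification request, while the same call with a selected item resolves to inventory_sale_trace_for_item through query_documents.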
The runtime answer boundary still makes unsupported or unconfirmed inventory states explicit:
- unsupported inventory route templates get a user-facing "template selected, live execution not yet bridged" answer instead of a generic checked-sources fallback;
- must_not_claim forbids presenting inventory planning as executed stock, supplier, purchase, or sale evidence;
- technical unsupported-pilot limitation text is filtered out of user-facing lines, while existing bounded unknowns for lifecycle/value-flow remain intact.
The next local scoring step broadened metadata-surface autonomy without adding a new hard domain route:
- if a confirmed metadata surface is unambiguous and only exposes Document.*, Register.*, or Catalog.* objects, the planner can infer the next reviewed lane even when upstream has not yet filled downstream_route_family;
- inferred document surfaces instantiate document_evidence;
- inferred register/movement surfaces instantiate movement_evidence;
- inferred catalog surfaces instantiate catalog_drilldown;
- mixed or ambiguous surfaces still do not guess and continue through clarification / explicit data-need scoring.
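The inference rule in the list above can be sketched as a small function. This is a simplified illustration under the assumption that a metadata surface is just a list of object names with Document., Register., or Catalog. prefixes; the function name and the example object names are hypothetical.

```python
from typing import Optional

PREFIX_TO_TEMPLATE = {
    "Document.": "document_evidence",
    "Register.": "movement_evidence",
    "Catalog.": "catalog_drilldown",
}


def infer_lane(surface_objects: list[str]) -> Optional[str]:
    """Return a chain template only when the surface is unambiguous."""
    lanes = set()
    for name in surface_objects:
        for prefix, template in PREFIX_TO_TEMPLATE.items():
            if name.startswith(prefix):
                lanes.add(template)
                break
        else:
            # Unknown object kind: do not guess a lane.
            return None
    # Exactly one lane means unambiguous; empty or mixed surfaces fall through.
    return next(iter(lanes)) if len(lanes) == 1 else None
```

A None result corresponds to the last bullet: the planner continues through clarification or explicit data-need scoring instead of picking a lane.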
The following consolidation step added catalog-level chain-template scoring:
- assistantMcpCatalogIndex can now score reviewed chain_templates directly from fact family, action family, required axes, comparison, ranking, and aggregation needs;
- comparison-shaped value-flow ranks value_flow_comparison above the generic value-flow template;
- ranking-shaped value-flow ranks value_flow_ranking above the generic value-flow template;
- document/movement/inventory/lifecycle templates can now be inspected as catalog search results, not only as local planner branch constants;
- assistantMcpDiscoveryPlanner records the top catalog chain-template match in reason codes and exposes the ranked matches as catalog_chain_template_matches in the planner contract while preserving existing guarded execution behavior.
- the ranked chain-template matches are now propagated into runtime loop state and debug attachment fields, so replay analysis can inspect catalog-fabric intent without parsing reason-code strings.
- catalog_chain_template_alignment now records whether the selected chain is the top catalog match, its rank, and whether it appeared in the catalog search results; runtime loop state and debug summary expose the same verdict.
- planner reason codes now emit stable catalog-alignment telemetry for evaluated top-match, selected-equals-top, selected-lower-rank, selected-outside-match-set, and unscored selected-chain states.
- catalog_chain_template_alignment.alignment_status now carries the same verdict as one enum-like field, and debug summary exposes it as mcp_discovery_catalog_chain_alignment_status.
- domain_truth_harness and scenario_acceptance_policy now carry the alignment status, top catalog match, and selected-matches-top flag into replay artifacts instead of leaving them buried in raw debug JSON.
- truth-harness now raises a warning finding for selected_lower_rank and selected_outside_match_set alignment states unless the replay spec explicitly marks allow_catalog_alignment_divergence.
- scenario acceptance now groups that warning under catalog_alignment_ok, and final_status.md prints the invariant alongside direct-answer, temporal, truth-gate, human-answer, meta-context, and selected-object gates.
- truth-harness specs can now assert expected_catalog_alignment_status, expected_catalog_chain_top_match, and expected_catalog_selected_matches_top on each step.
- address_truth_harness_phase66_human_org_open_scope_dialog.json now uses those fields to assert value_flow, value_flow_comparison, and value_flow_ranking top matches across the open-organization money dialog.
- address_truth_harness_phase32_planner_selected_chain_end_to_end.json now uses the same assertions across selected-counterparty entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups.
- agent_semantic_pack_builder now preserves these expected catalog-alignment fields in the reusable source catalog and adds the planner_catalog_alignment tag, so future mixed AGENT packs can deliberately select planner-brain regression probes instead of relying on hand-picked replay filenames.
- The new turnaround_11_planner_brain_alignment_mix builder recipe generates address_truth_harness_phase83_planner_brain_alignment_mix.json, a 20-step mixed canary that crosses selected-counterparty value-flow, open-organization totals/comparison/ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety.
- The phase83 live replay now confirms that selected chains match the reviewed catalog top match across the mixed planner-brain pack and that the business-answer path remains usable after cross-stage pivots.
- Checked-source failure replies now sanitize raw MCP transport/internal continuation strings from the user-facing answer while keeping the raw diagnostics in technical debug payloads.
- Confirmed metadata-surface follow-ups now promote the surface-grounded chain template (document_evidence, movement_evidence, or catalog_drilldown) to the top catalog match when the selected chain came from the same checked surface. This keeps the planner's executed route and catalog-alignment diagnostics consistent without allowing ambiguous or stale surfaces to override explicit current-turn data needs.
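The alignment verdict and the truth-harness warning rule from the list above can be sketched as follows. This is an illustrative Python sketch that assumes the ranked catalog matches arrive as an ordered list of template ids; the function names and the returned keys are hypothetical, and "unscored" is an assumed label for the unscored selected-chain state. The other status values and the allow_catalog_alignment_divergence flag are named in this note.

```python
from typing import Optional


def catalog_alignment(selected: Optional[str], ranked_matches: list[str]) -> dict:
    """Compare the planner-selected chain with the ranked catalog matches."""
    if not selected or not ranked_matches:
        status, rank = "unscored", None  # assumed label, not confirmed by the source
    elif selected == ranked_matches[0]:
        status, rank = "selected_matches_top", 1
    elif selected in ranked_matches:
        status, rank = "selected_lower_rank", ranked_matches.index(selected) + 1
    else:
        status, rank = "selected_outside_match_set", None
    return {
        "alignment_status": status,
        "top_match": ranked_matches[0] if ranked_matches else None,
        "selected_rank": rank,
        "selected_matches_top": status == "selected_matches_top",
    }


def needs_divergence_warning(alignment: dict,
                             allow_catalog_alignment_divergence: bool = False) -> bool:
    """Warn on divergent states unless the replay spec explicitly allows them."""
    diverged = alignment["alignment_status"] in {
        "selected_lower_rank",
        "selected_outside_match_set",
    }
    return diverged and not allow_catalog_alignment_divergence
```

Carrying the verdict as a single status field is what lets replay artifacts and acceptance invariants check alignment without parsing reason-code strings.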
Why This Matters
This reduces the pressure to add one hard route per user wording.
Future domain enablement should prefer:
- add or strengthen primitive descriptors;
- add or strengthen chain templates;
- let data-need graph and catalog search assemble the path;
- use domain packs to verify the scenario tree and catch semantic drift.
Domain-specific exact recipes can still exist as fast paths, but they should not be the only way the assistant understands a new business question.
Validation
Local validation after the catalog-template, value-flow, metadata-lane scoring, lifecycle bounded-inference, current-turn value-flow arbitration, and broad-evaluation bridge steps:
- npm.cmd test -- assistantMcpCatalogIndex.test.ts assistantMcpDiscoveryPlanner.test.ts: passed, 47 passed
- MCP-discovery suite: passed, 227 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5911 nodes, 12830 edges, 138 communities
- live value-flow canary: address_truth_harness_phase66_human_org_open_scope_dialog_planner_template_rerun2, accepted 7/7
- live metadata movement canary: address_truth_harness_phase52_metadata_movement_full_recovery_planner_metadata_scoring_rerun2, accepted 4/4
- live metadata document canary: address_truth_harness_phase54_metadata_document_full_recovery_planner_metadata_scoring_rerun2, accepted 4/4
Additional code-level consolidation:
- ambiguous metadata surfaces no longer carry both document and movement primitives when the current data-need graph explicitly selects document_evidence or movement_evidence;
- thin neutral metadata follow-ups still do not force a lane and keep the clarification boundary intact;
- planner reason codes now expose when an explicit lane family is scored against carried metadata ambiguity: planner_metadata_surface_scored_with_explicit_lane_family.
Latest validation after the lifecycle and arbitration hardening:
- targeted lifecycle/catalog/planner/answer tests: passed, 75 passed, 1 skipped
- full MCP-discovery suite: passed, 268 passed, 9 skipped
- broad MCP/living-chat/route/meaning slice: passed, 305 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5912 nodes, 12833 edges, 138 communities
- live lifecycle/value-flow response gate: address_truth_harness_phase19_mcp_discovery_response_gate_planner_lifecycle_rerun4, accepted 8/8
- live broad-eval to net-flow follow-up: address_truth_harness_phase21_net_followup_after_broad_eval_planner_lifecycle_rerun2, accepted 3/3
- live broad-evaluation bridge: address_truth_harness_phase22_broad_business_evaluation_bridge_planner_lifecycle_rerun2, accepted 3/3
Latest validation after the inventory catalog-template lift:
- targeted catalog/data-need/planner/turn-input tests: passed, 139 passed, 6 skipped
- full MCP-discovery suite: passed, 276 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5912 nodes, 12833 edges, 138 communities
Latest validation after the inventory runtime-boundary hardening:
- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, 68 passed, 1 skipped
- full MCP-discovery suite: passed, 277 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5913 nodes, 12837 edges, 138 communities
Latest validation after the inventory exact-runtime bridge:
- targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, 70 passed, 1 skipped
- full MCP-discovery suite: passed, 279 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5930 nodes, 12884 edges, 135 communities
Latest validation after unambiguous metadata-surface lane inference:
- targeted planner tests: passed, 36 passed
- full MCP-discovery suite: passed, 281 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5937 nodes, 12899 edges, 138 communities
- live inventory full-pack attempt: inventory_stock_exact_bridge_live_20260501_after_runtime_bridge, status partial
- live attempt interpretation: route/intent/recipe/capability selection matched, but MCP execution failed with "MCP fetch failed: This operation was aborted"; direct proxy get_metadata also timed out while /health reported active_sessions_count=0 and pending commands, so this is an infrastructure/polling-session blocker rather than accepted semantic evidence.
Latest validation after catalog chain-template scoring:
- targeted catalog/planner tests: passed, 54 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5938 nodes, 12903 edges, 139 communities
Latest validation after structured catalog chain-template contract exposure:
- targeted planner tests: passed, 36 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5939 nodes, 12906 edges, 138 communities
Latest validation after runtime/debug propagation of structured chain matches:
- targeted runtime/debug tests: passed, 18 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5940 nodes, 12909 edges, 137 communities
Latest validation after subject-aware bidirectional comparison arbitration:
- targeted planner tests: passed, 36 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5940 nodes, 12909 edges, 137 communities
Latest validation after structured catalog chain-template alignment verdict:
- targeted planner/runtime/debug tests: passed, 54 passed
- full MCP-discovery suite: passed, 282 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5941 nodes, 12911 edges, 136 communities
Latest validation after representative catalog-alignment regression guard:
- targeted planner tests: passed, 37 passed
- full MCP-discovery suite: passed, 283 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5942 nodes, 12912 edges, 140 communities
Latest validation after catalog-alignment reason-code telemetry:
- targeted planner/runtime tests: passed, 53 passed
- full MCP-discovery suite: passed, 283 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5943 nodes, 12915 edges, 136 communities
Latest validation after explicit catalog-alignment status propagation:
- targeted planner/runtime/debug tests: passed, 55 passed
- full MCP-discovery suite: passed, 283 passed, 9 skipped
- npm.cmd run build: passed
- graphify rebuild: 5943 nodes, 12915 edges, 136 communities
Latest validation after truth-harness catalog-alignment artifact surfacing:
- Python replay-tooling tests: passed, 4 passed
- graphify rebuild: 5946 nodes, 12918 edges, 136 communities
Latest validation after catalog-alignment divergence warning gate:
- Python replay-tooling tests: passed, 5 passed
- graphify rebuild: 5947 nodes, 12920 edges, 138 communities
Latest validation after catalog-alignment acceptance invariant:
- Python replay-tooling tests: passed, 6 passed
- graphify rebuild: 5949 nodes, 12923 edges, 136 communities
Latest validation after catalog-alignment spec assertions:
- Python replay-tooling tests: passed, 7 passed
- graphify rebuild: 5951 nodes, 12926 edges, 139 communities
Latest validation after phase66 catalog-alignment spec hardening:
- Python replay-tooling tests: passed, 7 passed
- load_truth_harness_spec confirmed the phase66 expected top-match chain sequence: value_flow, value_flow, value_flow, value_flow_comparison, value_flow_comparison, value_flow_ranking, value_flow_ranking
Latest validation after phase32 catalog-alignment spec hardening and AGENT source-catalog surfacing:
- Python replay-tooling tests: passed, 9 passed
- load_truth_harness_spec confirmed the phase32 expected top-match chain sequence: entity_resolution, value_flow, value_flow, value_flow_comparison, document_evidence, movement_evidence
- agent_semantic_pack_builder.py inventory regenerated agent_semantic_source_catalog.* with reusable planner_catalog_alignment coverage
Latest validation after phase83 mixed planner-brain spec generation:
- scripts.test_agent_semantic_pack_builder: passed, 3 passed
- generated address_truth_harness_phase83_planner_brain_alignment_mix.json: 20 steps, 15 expected catalog top-match checks after the phase19/21/22 alignment hardening
- regenerated agent_semantic_source_catalog.*: planner_catalog_alignment is visible with 26 reusable entries, including phase32, phase66, and phase83 probes
- graphify rebuild: 5952 nodes, 12927 edges, 138 communities
Prior live-readiness diagnosis after phase83 live replay and checked-source error sanitation:
- backend health is green on http://127.0.0.1:8787/api/health;
- proxy health is green on http://127.0.0.1:6003/health, with pending_commands=0, active_channels_count=1, and active_sessions_count=0;
- targeted checked-source sanitation tests still pass 61/61 with 1 skipped;
- npm.cmd run build still passes;
- full phase83 rerun phase83_planner_brain_alignment_live_20260501_rerun4 again ended partial, with 8/20 pass, 2 warning, 10 fail, and catalog_alignment_ok=true;
- direct proxy get_metadata with a 180-second client timeout also timed out, so the remaining live blocker is below the assistant planner/backend layer: the proxy accepts requests, but the 1C side does not return read-only evidence in time;
- scripts/check_mcp_live_readiness.py now provides a repo-native preflight that separates backend/proxy health from confirmed live 1C evidence readiness before spending time on a full semantic replay.
- graphify rebuild after the readiness preflight/docs sync: 5970 nodes, 12958 edges, 140 communities.
Prior follow-up diagnosis of the proxy/1C seam:
- 1cv8c is running locally with the "MCP Toolkit - Бухгалтерия предприятия, редакция 2.0" window title, so the failure is not simply "1C process absent";
- observing a read-only get_metadata command on the default channel showed pending_commands=1 for 15 seconds and no pickup by the 1C client;
- the diagnostic command was explicitly drained from /1c/poll and completed through /1c/result with a synthetic cancel result so the proxy queue stayed clean;
- the proxy health endpoint now exposes polling telemetry: polling_channels_count, last_poll_at, last_delivered_command_at, and optional poll_activity_by_channel when HEALTH_INCLUDE_CHANNEL_DETAILS=true;
- after proxy restart with this telemetry enabled, polling_channels_count=0 stayed stable for 20 seconds, proving no /1c/poll activity reached the proxy;
- scripts/check_mcp_live_readiness.py --confirm-live now refuses to create a direct live probe when proxy health already proves no 1C polling activity, preventing abandoned pending commands during readiness checks.
- domain_truth_harness.py run-live --require-mcp-live-readiness now applies the same readiness gate before the first assistant step, writes mcp_live_readiness.json, and exits early when live 1C evidence is unavailable;
- smoke of that harness gate against phase83 stopped before step execution with ready_for_live_replay=false, so future blocked runs should no longer waste a full semantic replay just to rediscover the missing /1c/poll.
- readiness can now wait for polling before probing: --wait-for-polling-seconds in check_mcp_live_readiness.py and --mcp-wait-for-polling-seconds in domain_truth_harness.py run-live; a 2-second smoke waited twice, observed no polling, and skipped the live probe without leaving proxy queue garbage.
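The gating order described above (check proxy polling telemetry first, send a live probe only if the 1C side is polling) can be sketched as follows. This is a simplified illustration, not the logic of scripts/check_mcp_live_readiness.py: the function name, the probe callable, and the reason labels are assumptions, while polling_channels_count and ready_for_live_replay are field names taken from this note.

```python
from typing import Callable


def check_live_readiness(health: dict, probe: Callable[[], bool]) -> dict:
    """Decide readiness without leaving abandoned commands in the proxy queue."""
    if health.get("polling_channels_count", 0) == 0:
        # No /1c/poll activity: a probe would only sit in the queue, so skip it.
        return {
            "ready_for_live_replay": False,
            "probe_sent": False,
            "reason": "no_1c_polling_activity",  # hypothetical reason label
        }
    ok = probe()  # e.g. a read-only metadata request with a client timeout
    return {
        "ready_for_live_replay": ok,
        "probe_sent": True,
        "reason": "live_probe_ok" if ok else "live_probe_failed",
    }


blocked = check_live_readiness({"polling_channels_count": 0}, probe=lambda: True)
ready = check_live_readiness({"polling_channels_count": 1}, probe=lambda: True)
```

Skipping the probe when no channel is polling is the design point: green backend and proxy health alone does not imply live 1C evidence readiness.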
Latest validation after guarded phase83 acceptance and surface-grounded catalog promotion:
- targeted planner/response-policy/pilot/continuity slice: npm.cmd test -- assistantMcpDiscoveryPlanner.test.ts assistantMcpDiscoveryResponsePolicy.test.ts assistantMcpDiscoveryPilotExecutor.test.ts assistantContinuityPolicy.test.ts passed 109/109;
- npm.cmd run build: passed;
- graphify rebuild: 5973 nodes, 12971 edges, 138 communities;
- live-readiness preflight after backend restart: mcp_live_readiness_phase83_rerun3_after_backend_restart.json reported ready;
- full guarded phase83 replay: phase83_planner_brain_alignment_live_20260501_readygate_rerun3 accepted 20/20, 0 warnings, 0 failures;
- final invariant result: catalog_alignment_ok=true, direct_answer_ok=true, temporal_honesty_ok=true, selected_object_continuity_ok=true, truth_gate_ok=true, human_answer_quality_ok=true, and meta_context_integrity_ok=true;
- the previously warning step step_02_neutral_followup_catalog_drilldown now reports catalog_alignment_status=selected_matches_top, catalog_top_match=catalog_drilldown, and catalog_selected_matches_top=True.
- saved autorun canary: AGENT | Planner Autonomy phase83: мозг маршрутов, pivots и legacy continuity (gen-ag05011759-6f85fc), sourced from the accepted phase83 spec after the live replay was reviewed.
Next Step
The declared Planner Autonomy Consolidation slice is now closed for the phase83 acceptance target.
Keep using the live preflight before future full replays:
python scripts/check_mcp_live_readiness.py --confirm-live --wait-for-polling-seconds 60 --poll-interval-seconds 2 --output-json artifacts/runtime/mcp_live_readiness_phase83.json
Run future full candidates with the built-in gate:
python scripts/domain_truth_harness.py run-live --spec docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json --output-dir artifacts/domain_runs/phase83_planner_brain_alignment_live_<stamp> --require-mcp-live-readiness --mcp-wait-for-polling-seconds 60 --mcp-poll-interval-seconds 2
Only when readiness reports ready_for_live_replay=true should a full replay be treated as meaningful business-evidence proof. If it reports no /1c/poll activity, fix the 1C toolkit client/session/channel first; another full replay will only reproduce checked-source partial answers.
Recommended order:
- save the accepted phase83 pack into autoruns only if the product flow needs it as a legacy AGENT canary;
- continue broader open-world bounded autonomy with phase83 as a regression gate, not as an open blocker;
- broaden catalog scoring into unfamiliar 1C asks where metadata surface and data-need graph can pick reviewed lanes;
- grow primitive descriptors only where live replay shows a real evidence gap;
- keep phase19, phase21, phase22, value-flow, metadata ambiguity, inventory-stock, and phase83 as regression gates.
The key rule remains:
- do not hide a domain workaround inside the planner;
- promote repeated successful domain behavior into a reviewed primitive or chain template.