20 - Planner Autonomy Consolidation (2026-05-01)

Purpose

This note starts the consolidation layer after the first accepted inventory-stock breadth proof.

The goal is to move from:

domain pack proves one more slice;
planner still carries too many local recipe branches;

to:

reusable MCP primitive and chain descriptors;
planner-selected route fabric;
domain packs as semantic gates, not as the main design mechanism.

This is the continuation of the original "MCP as bounded brain" goal.

Architectural Reading

The target is not an unrestricted model agent.

The target remains:

user question -> data_need_graph -> catalog chain template -> reviewed primitives -> bounded evidence loop -> truth gate -> answer

The LLM may help choose the path, but only inside reviewed MCP boundaries.
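
A minimal sketch of that boundary, assuming a model proposal arrives as a plain template name. The template names come from the catalog list in this note; the function name and the "clarification" fallback label are hypothetical and only illustrate that an unreviewed proposal is never executed.

```python
# Illustrative only: not the project's actual planner API.
REVIEWED_CHAIN_TEMPLATES = frozenset({
    "metadata_inspection", "catalog_drilldown", "entity_resolution",
    "document_evidence", "movement_evidence", "value_flow",
    "value_flow_comparison", "value_flow_ranking", "lifecycle",
})


def accept_proposed_chain(proposed: str) -> str:
    """Keep a model-proposed chain only if it is a reviewed catalog template."""
    if proposed in REVIEWED_CHAIN_TEMPLATES:
        return proposed
    # Assumed fallback: anything outside the reviewed set becomes a clarification.
    return "clarification"
```

The point of the design is that the model influences which reviewed path is taken, not which operations exist.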

Code Steps

The first consolidation step adds reusable chain templates to assistantMcpCatalogIndex.

The catalog now describes not only primitive contracts, but also planner route-fabric templates:

metadata_inspection
catalog_drilldown
entity_resolution
document_evidence
movement_evidence
value_flow
value_flow_comparison
value_flow_ranking
lifecycle

Each template declares:

semantic data need;
human-readable chain summary;
fallback primitive sequence;
base required axes;
supported fact/action families;
planning tags;
evidence-gate requirement.
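
As a rough illustration, the declared fields could be modeled as one immutable descriptor. The real catalog lives in assistantMcpCatalogIndex and its field names are not shown in this note, so every attribute name and the sample values below are assumptions, except the two lifecycle axes, which this note names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ChainTemplate:
    """Hypothetical shape of one catalog chain-template descriptor."""
    template_id: str
    data_need: str                        # semantic data need
    summary: str                          # human-readable chain summary
    fallback_primitives: Tuple[str, ...]  # fallback primitive sequence
    base_required_axes: Tuple[str, ...]
    fact_families: Tuple[str, ...]
    action_families: Tuple[str, ...]
    planning_tags: Tuple[str, ...]
    requires_evidence_gate: bool = True


# Sample values are placeholders; only the two axes are taken from this note.
LIFECYCLE = ChainTemplate(
    template_id="lifecycle",
    data_need="confirmed activity window for a counterparty",
    summary="first/latest confirmed 1C activity window, not legal registration age",
    fallback_primitives=("query_documents", "explain_evidence_basis"),
    base_required_axes=("activity_window", "legal_fact_boundary"),
    fact_families=("lifecycle",),
    action_families=("inspect",),
    planning_tags=("bounded_inference",),
)
```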

The planner now instantiates selected evidence chains from this catalog for the first base lanes instead of keeping all route meaning only in local planner branches.

The follow-up consolidation step moved the value-flow planner seams onto the same catalog fabric:

bidirectional incoming-vs-outgoing comparison now instantiates value_flow_comparison, including explicit-counterparty comparison graphs rather than only subjectless organization-scope graphs;
ranked revenue/payment questions now instantiate value_flow_ranking;
organization-scoped open totals now instantiate value_flow with subjectless primitives but catalog-owned axes and evidence-gate semantics;
heuristic fallback routes for value-flow, lifecycle, metadata, movement, document, entity, and unclassified metadata inspection now also use catalog chain templates.

This keeps behavior stable while making the planner's route meaning inspectable through catalog descriptors instead of only through local recipeFor() branches.

The next consolidation step strengthened lifecycle as a bounded inference chain instead of a loose age-like shortcut:

the lifecycle template now declares activity_window and legal_fact_boundary axes;
the template summary explicitly frames the result as a first/latest confirmed 1C activity window, not legal registration age;
planner graph and fallback recipes now emit lifecycle bounded-inference reason codes;
lifecycle evidence facts include the matched row count, first/latest confirmed activity dates, and an explicit legal-fact boundary.
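
A small sketch of what such a bounded lifecycle fact might look like, assuming the matched rows have already been reduced to a list of activity dates. The function name, the dictionary keys, and the boundary wording are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Union

# Assumed wording; the real boundary text is owned by the lifecycle template.
LEGAL_FACT_BOUNDARY = "confirmed 1C activity window only; not legal registration age"


def lifecycle_evidence(activity_dates: List[date]) -> Dict[str, Union[int, Optional[str]]]:
    """Summarize matched rows as a bounded activity window."""
    first = min(activity_dates).isoformat() if activity_dates else None
    latest = max(activity_dates).isoformat() if activity_dates else None
    return {
        "matched_rows": len(activity_dates),
        "first_confirmed_activity": first,
        "latest_confirmed_activity": latest,
        # Always attached, so the answer layer cannot drop the caveat.
        "legal_fact_boundary": LEGAL_FACT_BOUNDARY,
    }
```

Carrying the boundary inside the fact itself, rather than in answer wording only, keeps the inference bounded wherever the fact is reused.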

Two arbitration seams were also hardened because they are part of the same planner-autonomy surface:

current-turn value-flow aggregate questions can override supported exact legacy routes when the user asks for amount/net/payment totals and the exact route would only produce a narrower lookup/list answer;
broad business evaluation (broad_business_evaluation) is intentionally kept in the deterministic living-chat bridge instead of being displaced by generic metadata discovery.

These changes keep the route fabric broader without letting the planner pretend that inferred evidence is a formally proven legal fact.
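
The two arbitration rules can be pictured as a small decision function. This is a sketch under stated assumptions: the return labels, the parameter names, and the "lookup"/"list" shape labels are invented for illustration; only broad_business_evaluation is an identifier from this note.

```python
from typing import Optional

# Assumed labels for exact-route answers that are narrower than an aggregate.
NARROW_SHAPES = {"lookup", "list"}


def arbitrate(intent: str, asks_aggregate_total: bool,
              exact_route_shape: Optional[str]) -> str:
    """Pick which surface should own the current turn."""
    # Broad business evaluation stays in the deterministic living-chat bridge.
    if intent == "broad_business_evaluation":
        return "living_chat_bridge"
    # An amount/net/payment-total ask overrides an exact route that could
    # only produce a narrower lookup or list answer.
    if asks_aggregate_total and exact_route_shape in NARROW_SHAPES:
        return "value_flow"
    if exact_route_shape is not None:
        return "exact_route"
    return "catalog_planner"
```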

The following consolidation step promoted the accepted inventory-stock breadth behavior into reviewed catalog route fabric:

inventory_stock_snapshot
inventory_supplier_overlap
inventory_purchase_provenance
inventory_sale_trace

These templates are now first-class catalog chain descriptors and can be selected by the data-need graph/planner. They reuse reviewed generic primitives (query_movements, query_documents, aggregate_by_axis, drilldown_related_objects, probe_coverage, explain_evidence_basis) and add inventory-specific axes such as as_of_date, warehouse, supplier, buyer, quantity, and evidence_basis.

The first runtime bridge for these inventory templates now delegates through existing exact inventory recipes instead of inventing a new generic inventory executor:

inventory_stock_snapshot -> inventory_on_hand_as_of_date
inventory_supplier_overlap -> inventory_supplier_stock_overlap_as_of_date
inventory_purchase_provenance -> inventory_purchase_provenance_for_item
inventory_sale_trace -> inventory_sale_trace_for_item

The bridge keeps the reviewed MCP route fabric as the planner surface, but uses addressRecipeCatalog exact queries and account scope 41.01 as the evidence source. Root inventory templates execute through query_movements; selected-item provenance/sale templates execute through query_documents. Missing selected-item anchors remain clarification, not a guessed item.
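
The delegation described above can be sketched as a lookup table plus an anchor check. The template names, recipe names, primitives, and account scope are taken from this note; the function name, the status labels, and the result shape are hypothetical.

```python
from typing import Dict, Optional, Tuple

# template -> (exact recipe, executing primitive), as listed in this note.
INVENTORY_BRIDGE: Dict[str, Tuple[str, str]] = {
    "inventory_stock_snapshot": ("inventory_on_hand_as_of_date", "query_movements"),
    "inventory_supplier_overlap": ("inventory_supplier_stock_overlap_as_of_date", "query_movements"),
    "inventory_purchase_provenance": ("inventory_purchase_provenance_for_item", "query_documents"),
    "inventory_sale_trace": ("inventory_sale_trace_for_item", "query_documents"),
}
SELECTED_ITEM_TEMPLATES = {"inventory_purchase_provenance", "inventory_sale_trace"}
ACCOUNT_SCOPE = "41.01"


def bridge_inventory_template(template: str, selected_item: Optional[str] = None) -> dict:
    """Resolve a catalog template to an exact recipe, or stop without guessing."""
    if template not in INVENTORY_BRIDGE:
        return {"status": "unsupported", "template": template}
    if template in SELECTED_ITEM_TEMPLATES and not selected_item:
        # A missing item anchor stays a clarification, never a guessed item.
        return {"status": "clarification_needed", "missing": "selected_item"}
    recipe, primitive = INVENTORY_BRIDGE[template]
    return {
        "status": "execute",
        "recipe": recipe,
        "primitive": primitive,
        "account_scope": ACCOUNT_SCOPE,
        "selected_item": selected_item,
    }
```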

The runtime answer boundary still makes unsupported or unconfirmed inventory states explicit:

unsupported inventory route templates get a user-facing "template selected, live execution not yet bridged" answer instead of a generic checked-sources fallback;
must_not_claim forbids presenting inventory planning as executed stock, supplier, purchase, or sale evidence;
technical unsupported-pilot limitation text is filtered out of user-facing lines, while existing bounded unknowns for lifecycle/value-flow remain intact.

The next local scoring step broadened metadata-surface autonomy without adding a new hard domain route:

if a confirmed metadata surface is unambiguous and only exposes Document.*, Register.*, or Catalog.* objects, the planner can infer the next reviewed lane even when upstream has not yet filled downstream_route_family;
inferred document surfaces instantiate document_evidence;
inferred register/movement surfaces instantiate movement_evidence;
inferred catalog surfaces instantiate catalog_drilldown;
mixed or ambiguous surfaces still do not guess and continue through clarification / explicit data-need scoring.
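
The inference rule above can be sketched as a prefix check that returns a lane only when every exposed object points the same way. The prefixes and lane names come from this note; the function name and the use of None for "do not guess" are assumptions.

```python
from typing import Iterable, Optional

PREFIX_TO_LANE = {
    "Document.": "document_evidence",
    "Register.": "movement_evidence",
    "Catalog.": "catalog_drilldown",
}


def infer_lane(object_names: Iterable[str]) -> Optional[str]:
    """Return the single reviewed lane a metadata surface implies, else None."""
    lanes = set()
    for name in object_names:
        lane = next(
            (lane for prefix, lane in PREFIX_TO_LANE.items() if name.startswith(prefix)),
            None,
        )
        if lane is None:
            return None  # unknown object kind: treat the surface as ambiguous
        lanes.add(lane)
    # Exactly one lane means unambiguous; zero or several means no guess.
    return lanes.pop() if len(lanes) == 1 else None
```

A None result corresponds to the path where the planner continues through clarification or explicit data-need scoring.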

The following consolidation step added catalog-level chain-template scoring:

assistantMcpCatalogIndex can now score reviewed chain_templates directly from fact family, action family, required axes, comparison, ranking, and aggregation needs;
comparison-shaped value-flow ranks value_flow_comparison above the generic value-flow template;
ranking-shaped value-flow ranks value_flow_ranking above the generic value-flow template;
document/movement/inventory/lifecycle templates can now be inspected as catalog search results, not only as local planner branch constants;
assistantMcpDiscoveryPlanner records the top catalog chain-template match in reason codes and exposes the ranked matches as catalog_chain_template_matches in the planner contract while preserving existing guarded execution behavior.
the ranked chain-template matches are now propagated into runtime loop state and debug attachment fields, so replay analysis can inspect catalog-fabric intent without parsing reason-code strings.
catalog_chain_template_alignment now records whether the selected chain is the top catalog match, its rank, and whether it appeared in the catalog search results; runtime loop state and debug summary expose the same verdict.
planner reason codes now emit stable catalog-alignment telemetry for evaluated top-match, selected-equals-top, selected-lower-rank, selected-outside-match-set, and unscored selected-chain states.
catalog_chain_template_alignment.alignment_status now carries the same verdict as one enum-like field, and debug summary exposes it as mcp_discovery_catalog_chain_alignment_status.
domain_truth_harness and scenario_acceptance_policy now carry the alignment status, top catalog match, and selected-matches-top flag into replay artifacts instead of leaving them buried in raw debug JSON.
truth-harness now raises a warning finding for selected_lower_rank and selected_outside_match_set alignment states unless the replay spec explicitly marks allow_catalog_alignment_divergence.
scenario acceptance now groups that warning under catalog_alignment_ok, and final_status.md prints the invariant alongside direct-answer, temporal, truth-gate, human-answer, meta-context, and selected-object gates.
truth-harness specs can now assert expected_catalog_alignment_status, expected_catalog_chain_top_match, and expected_catalog_selected_matches_top on each step.
address_truth_harness_phase66_human_org_open_scope_dialog.json now uses those fields to assert value_flow, value_flow_comparison, and value_flow_ranking top matches across the open-organization money dialog.
address_truth_harness_phase32_planner_selected_chain_end_to_end.json now uses the same assertions across selected-counterparty entity grounding, incoming/outgoing/net value-flow, document evidence, and movement evidence follow-ups.
agent_semantic_pack_builder now preserves these expected catalog-alignment fields in the reusable source catalog and adds the planner_catalog_alignment tag, so future mixed AGENT packs can deliberately select planner-brain regression probes instead of relying on hand-picked replay filenames.
The new turnaround_11_planner_brain_alignment_mix builder recipe generates address_truth_harness_phase83_planner_brain_alignment_mix.json, a 20-step mixed canary that crosses selected-counterparty value-flow, open-organization totals/comparison/ranking, broad-evaluation continuity, metadata drilldown, and off-domain living-chat safety.
The phase83 live replay now confirms that selected chains match the reviewed catalog top match across the mixed planner-brain pack and that the business-answer path remains usable after cross-stage pivots.
Checked-source failure replies now sanitize raw MCP transport/internal continuation strings from the user-facing answer while keeping the raw diagnostics in technical debug payloads.
Confirmed metadata-surface follow-ups now promote the surface-grounded chain template (document_evidence, movement_evidence, or catalog_drilldown) to the top catalog match when the selected chain came from the same checked surface. This keeps the planner's executed route and catalog-alignment diagnostics consistent without allowing ambiguous or stale surfaces to override explicit current-turn data needs.
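
To make the scoring and alignment verdict concrete, here is a toy version. It is not the scoring used by assistantMcpCatalogIndex: the weights, the axis names (organization, period), the "unscored" label, and all function names are assumptions. The status values selected_matches_top, selected_lower_rank, and selected_outside_match_set are the ones named in this note.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Template:
    template_id: str
    fact_family: str
    axes: FrozenSet[str]
    shape: str = "plain"  # "plain", "comparison", or "ranking"


VF_AXES = frozenset({"organization", "period"})  # assumed axis names
CATALOG = [
    Template("value_flow", "value_flow", VF_AXES),
    Template("value_flow_comparison", "value_flow", VF_AXES, "comparison"),
    Template("value_flow_ranking", "value_flow", VF_AXES, "ranking"),
    Template("document_evidence", "document", frozenset({"period"})),
]


def score(t: Template, fact_family: str, axes: FrozenSet[str], shape: str) -> int:
    points = 2 if t.fact_family == fact_family else 0
    points += len(t.axes & axes)  # one point per covered axis
    if t.shape != "plain":
        # Specialized templates win only when their shape was asked for.
        points += 3 if t.shape == shape else -3
    return points


def rank_templates(catalog: List[Template], fact_family: str,
                   axes: FrozenSet[str], shape: str = "plain") -> List[str]:
    scored = [(score(t, fact_family, axes, shape), t.template_id) for t in catalog]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # stable tie-break by id
    return [template_id for _, template_id in scored]


def alignment(selected: str, ranked: List[str]) -> dict:
    """Compare the executed chain with the ranked catalog matches."""
    if not ranked:
        status, rank = "unscored", None  # assumed label
    elif selected not in ranked:
        status, rank = "selected_outside_match_set", None
    else:
        rank = ranked.index(selected) + 1
        status = "selected_matches_top" if rank == 1 else "selected_lower_rank"
    return {
        "alignment_status": status,
        "rank": rank,
        "top_match": ranked[0] if ranked else None,
        "selected_matches_top": status == "selected_matches_top",
    }
```

For example, `rank_templates(CATALOG, "value_flow", VF_AXES, "comparison")` puts value_flow_comparison ahead of value_flow, and `alignment("value_flow", ...)` on that ranking reports selected_lower_rank.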

Why This Matters

This reduces the pressure to add one hard route per user wording.

Future domain enablement should prefer:

add or strengthen primitive descriptors;
add or strengthen chain templates;
let data-need graph and catalog search assemble the path;
use domain packs to verify the scenario tree and catch semantic drift.

Domain-specific exact recipes can still exist as fast paths, but they should not be the only way the assistant understands a new business question.

Validation

Local validation after the catalog-template, value-flow, metadata-lane scoring, lifecycle bounded-inference, current-turn value-flow arbitration, and broad-evaluation bridge steps:

npm.cmd test -- assistantMcpCatalogIndex.test.ts assistantMcpDiscoveryPlanner.test.ts: passed, 47 passed
MCP-discovery suite: passed, 227 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5911 nodes, 12830 edges, 138 communities
live value-flow canary: address_truth_harness_phase66_human_org_open_scope_dialog_planner_template_rerun2, accepted 7/7
live metadata movement canary: address_truth_harness_phase52_metadata_movement_full_recovery_planner_metadata_scoring_rerun2, accepted 4/4
live metadata document canary: address_truth_harness_phase54_metadata_document_full_recovery_planner_metadata_scoring_rerun2, accepted 4/4

Additional code-level consolidation:

ambiguous metadata surfaces no longer carry both document and movement primitives when the current data-need graph explicitly selects document_evidence or movement_evidence;
thin neutral metadata follow-ups still do not force a lane and keep the clarification boundary intact;
planner reason codes now expose when an explicit lane family is scored against carried metadata ambiguity: planner_metadata_surface_scored_with_explicit_lane_family.

Latest validation after the lifecycle and arbitration hardening:

targeted lifecycle/catalog/planner/answer tests: passed, 75 passed, 1 skipped
full MCP-discovery suite: passed, 268 passed, 9 skipped
broad MCP/living-chat/route/meaning slice: passed, 305 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5912 nodes, 12833 edges, 138 communities
live lifecycle/value-flow response gate: address_truth_harness_phase19_mcp_discovery_response_gate_planner_lifecycle_rerun4, accepted 8/8
live broad-eval to net-flow follow-up: address_truth_harness_phase21_net_followup_after_broad_eval_planner_lifecycle_rerun2, accepted 3/3
live broad-evaluation bridge: address_truth_harness_phase22_broad_business_evaluation_bridge_planner_lifecycle_rerun2, accepted 3/3

Latest validation after the inventory catalog-template lift:

targeted catalog/data-need/planner/turn-input tests: passed, 139 passed, 6 skipped
full MCP-discovery suite: passed, 276 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5912 nodes, 12833 edges, 138 communities

Latest validation after the inventory runtime-boundary hardening:

targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, 68 passed, 1 skipped
full MCP-discovery suite: passed, 277 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5913 nodes, 12837 edges, 138 communities

Latest validation after the inventory exact-runtime bridge:

targeted runtime-bridge/answer-adapter/pilot-executor tests: passed, 70 passed, 1 skipped
full MCP-discovery suite: passed, 279 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5930 nodes, 12884 edges, 135 communities

Latest validation after unambiguous metadata-surface lane inference:

targeted planner tests: passed, 36 passed
full MCP-discovery suite: passed, 281 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5937 nodes, 12899 edges, 138 communities
live inventory full-pack attempt: inventory_stock_exact_bridge_live_20260501_after_runtime_bridge, status partial
live attempt interpretation: route, intent, recipe, and capability selection all matched, but MCP execution failed with "MCP fetch failed: This operation was aborted". A direct proxy get_metadata call also timed out while /health reported active_sessions_count=0 and pending commands, so this is an infrastructure/polling-session blocker, not accepted semantic evidence.

Latest validation after catalog chain-template scoring:

targeted catalog/planner tests: passed, 54 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5938 nodes, 12903 edges, 139 communities

Latest validation after structured catalog chain-template contract exposure:

targeted planner tests: passed, 36 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5939 nodes, 12906 edges, 138 communities

Latest validation after runtime/debug propagation of structured chain matches:

targeted runtime/debug tests: passed, 18 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5940 nodes, 12909 edges, 137 communities

Latest validation after subject-aware bidirectional comparison arbitration:

targeted planner tests: passed, 36 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5940 nodes, 12909 edges, 137 communities

Latest validation after structured catalog chain-template alignment verdict:

targeted planner/runtime/debug tests: passed, 54 passed
full MCP-discovery suite: passed, 282 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5941 nodes, 12911 edges, 136 communities

Latest validation after representative catalog-alignment regression guard:

targeted planner tests: passed, 37 passed
full MCP-discovery suite: passed, 283 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5942 nodes, 12912 edges, 140 communities

Latest validation after catalog-alignment reason-code telemetry:

targeted planner/runtime tests: passed, 53 passed
full MCP-discovery suite: passed, 283 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5943 nodes, 12915 edges, 136 communities

Latest validation after explicit catalog-alignment status propagation:

targeted planner/runtime/debug tests: passed, 55 passed
full MCP-discovery suite: passed, 283 passed, 9 skipped
npm.cmd run build: passed
graphify rebuild: 5943 nodes, 12915 edges, 136 communities

Latest validation after truth-harness catalog-alignment artifact surfacing:

Python replay-tooling tests: passed, 4 passed
graphify rebuild: 5946 nodes, 12918 edges, 136 communities

Latest validation after catalog-alignment divergence warning gate:

Python replay-tooling tests: passed, 5 passed
graphify rebuild: 5947 nodes, 12920 edges, 138 communities

Latest validation after catalog-alignment acceptance invariant:

Python replay-tooling tests: passed, 6 passed
graphify rebuild: 5949 nodes, 12923 edges, 136 communities

Latest validation after catalog-alignment spec assertions:

Python replay-tooling tests: passed, 7 passed
graphify rebuild: 5951 nodes, 12926 edges, 139 communities
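
The divergence warning and the per-step spec assertions validated above can be pictured as one step-level check. This is a sketch, not the domain_truth_harness code: the spec and observed field names are the ones quoted in this note, while the function name, the finding shape, the "failure" severity for a missed expectation, and passing the divergence flag as an argument are assumptions.

```python
from typing import Dict, List

DIVERGENT_STATES = {"selected_lower_rank", "selected_outside_match_set"}

# spec field -> observed field it is compared against
EXPECTATIONS = {
    "expected_catalog_alignment_status": "catalog_alignment_status",
    "expected_catalog_chain_top_match": "catalog_top_match",
    "expected_catalog_selected_matches_top": "catalog_selected_matches_top",
}


def check_catalog_alignment(step_spec: Dict, observed: Dict,
                            allow_catalog_alignment_divergence: bool = False) -> List[Dict]:
    """Return findings for one replay step; an empty list means aligned."""
    findings = []
    status = observed.get("catalog_alignment_status")
    if status in DIVERGENT_STATES and not allow_catalog_alignment_divergence:
        findings.append({"severity": "warning",
                         "code": "catalog_alignment_divergence",
                         "detail": status})
    for spec_key, observed_key in EXPECTATIONS.items():
        # Only fields the spec actually declares are asserted.
        if spec_key in step_spec and step_spec[spec_key] != observed.get(observed_key):
            findings.append({"severity": "failure",
                             "code": spec_key,
                             "expected": step_spec[spec_key],
                             "actual": observed.get(observed_key)})
    return findings
```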

Latest validation after phase66 catalog-alignment spec hardening:

Python replay-tooling tests: passed, 7 passed
load_truth_harness_spec confirmed the phase66 expected top-match chain sequence: value_flow, value_flow, value_flow, value_flow_comparison, value_flow_comparison, value_flow_ranking, value_flow_ranking

Latest validation after phase32 catalog-alignment spec hardening and AGENT source-catalog surfacing:

Python replay-tooling tests: passed, 9 passed
load_truth_harness_spec confirmed the phase32 expected top-match chain sequence: entity_resolution, value_flow, value_flow, value_flow_comparison, document_evidence, movement_evidence
agent_semantic_pack_builder.py inventory regenerated agent_semantic_source_catalog.* with reusable planner_catalog_alignment coverage

Latest validation after phase83 mixed planner-brain spec generation:

scripts.test_agent_semantic_pack_builder: passed, 3 passed
generated address_truth_harness_phase83_planner_brain_alignment_mix.json: 20 steps, 15 expected catalog top-match checks after the phase19/21/22 alignment hardening
regenerated agent_semantic_source_catalog.*: planner_catalog_alignment is visible with 26 reusable entries, including phase32, phase66, and phase83 probes
graphify rebuild: 5952 nodes, 12927 edges, 138 communities

Prior live-readiness diagnosis after phase83 live replay and checked-source error sanitization:

backend health is green on http://127.0.0.1:8787/api/health;
proxy health is green on http://127.0.0.1:6003/health, with pending_commands=0, active_channels_count=1, and active_sessions_count=0;
targeted checked-source sanitization tests still pass 61/61 with 1 skipped;
npm.cmd run build still passes;
full phase83 rerun phase83_planner_brain_alignment_live_20260501_rerun4 again ended with status partial: 8/20 passed, 2 warnings, 10 failures, and catalog_alignment_ok=true;
direct proxy get_metadata with a 180-second client timeout also timed out, so the remaining live blocker is below the assistant planner/backend layer: the proxy accepts requests, but the 1C side does not return read-only evidence in time;
scripts/check_mcp_live_readiness.py now provides a repo-native preflight that separates backend/proxy health from confirmed live 1C evidence readiness before spending time on a full semantic replay.
graphify rebuild after the readiness preflight/docs sync: 5970 nodes, 12958 edges, 140 communities.

Prior follow-up diagnosis of the proxy/1C seam:

1cv8c is running locally with the MCP Toolkit - Бухгалтерия предприятия, редакция 2.0 window title (in English: "Enterprise Accounting, edition 2.0"), so the failure is not simply "1C process absent";
observing a read-only get_metadata command on the default channel showed pending_commands=1 for 15 seconds and no pickup by the 1C client;
the diagnostic command was explicitly drained from /1c/poll and completed through /1c/result with a synthetic cancel result so the proxy queue stayed clean;
the proxy health endpoint now exposes polling telemetry: polling_channels_count, last_poll_at, last_delivered_command_at, and optional poll_activity_by_channel when HEALTH_INCLUDE_CHANNEL_DETAILS=true;
after proxy restart with this telemetry enabled, polling_channels_count=0 stayed stable for 20 seconds, proving no /1c/poll activity reached the proxy;
scripts/check_mcp_live_readiness.py --confirm-live now refuses to create a direct live probe when proxy health already proves no 1C polling activity, preventing abandoned pending commands during readiness checks.
domain_truth_harness.py run-live --require-mcp-live-readiness now applies the same readiness gate before the first assistant step, writes mcp_live_readiness.json, and exits early when live 1C evidence is unavailable;
a smoke run of that harness gate against phase83 stopped before step execution with ready_for_live_replay=false, so future blocked runs should no longer waste a full semantic replay just to rediscover the missing /1c/poll activity.
readiness can now wait for polling before probing: --wait-for-polling-seconds in check_mcp_live_readiness.py and --mcp-wait-for-polling-seconds in domain_truth_harness.py run-live; a 2-second smoke waited twice, observed no polling, and skipped the live probe without leaving proxy queue garbage.
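
The wait-then-probe behavior described in this list can be sketched as follows. This is not the code of check_mcp_live_readiness.py: the function names, the callables used to read health and run the probe, and the result shape are assumptions; polling_channels_count and ready_for_live_replay are field names from this note.

```python
import time
from typing import Callable, Dict


def wait_for_polling(read_health: Callable[[], Dict], wait_seconds: float,
                     interval_seconds: float,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once health shows 1C polling, False when the window expires."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    waited = 0.0
    while True:
        if read_health().get("polling_channels_count", 0) > 0:
            return True
        if waited >= wait_seconds:
            return False
        sleep(interval_seconds)
        waited += interval_seconds


def preflight(read_health: Callable[[], Dict], run_live_probe: Callable[[], bool],
              wait_seconds: float = 60, interval_seconds: float = 2,
              sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Gate the live probe on observed polling so no command is left pending."""
    if not wait_for_polling(read_health, wait_seconds, interval_seconds, sleep):
        # No /1c/poll activity: skip the probe instead of queueing a command.
        return {"ready_for_live_replay": False, "live_probe": "skipped"}
    probe_ok = bool(run_live_probe())
    return {"ready_for_live_replay": probe_ok,
            "live_probe": "passed" if probe_ok else "failed"}
```

Injecting `sleep` and the two callables keeps the gate testable without a running proxy.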

Latest validation after guarded phase83 acceptance and surface-grounded catalog promotion:

targeted planner/response-policy/pilot/continuity slice: npm.cmd test -- assistantMcpDiscoveryPlanner.test.ts assistantMcpDiscoveryResponsePolicy.test.ts assistantMcpDiscoveryPilotExecutor.test.ts assistantContinuityPolicy.test.ts passed 109/109;
npm.cmd run build: passed;
graphify rebuild: 5973 nodes, 12971 edges, 138 communities;
live-readiness preflight after backend restart: mcp_live_readiness_phase83_rerun3_after_backend_restart.json reported ready;
full guarded phase83 replay: phase83_planner_brain_alignment_live_20260501_readygate_rerun3 accepted 20/20, 0 warnings, 0 failures;
final invariant result: catalog_alignment_ok=true, direct_answer_ok=true, temporal_honesty_ok=true, selected_object_continuity_ok=true, truth_gate_ok=true, human_answer_quality_ok=true, and meta_context_integrity_ok=true;
the previously warning step step_02_neutral_followup_catalog_drilldown now reports catalog_alignment_status=selected_matches_top, catalog_top_match=catalog_drilldown, and catalog_selected_matches_top=True.
saved autorun canary: AGENT | Planner Autonomy phase83: мозг маршрутов, pivots и legacy continuity (gen-ag05011759-6f85fc), sourced from the accepted phase83 spec after the live replay was reviewed. The Russian part of the canary title reads roughly "route brain, pivots and legacy continuity".

Next Step

The declared Planner Autonomy Consolidation slice is now closed for the phase83 acceptance target.

Keep using the live preflight before future full replays:

python scripts/check_mcp_live_readiness.py --confirm-live --wait-for-polling-seconds 60 --poll-interval-seconds 2 --output-json artifacts/runtime/mcp_live_readiness_phase83.json

Run future full candidates with the built-in gate:

python scripts/domain_truth_harness.py run-live --spec docs/orchestration/address_truth_harness_phase83_planner_brain_alignment_mix.json --output-dir artifacts/domain_runs/phase83_planner_brain_alignment_live_<stamp> --require-mcp-live-readiness --mcp-wait-for-polling-seconds 60 --mcp-poll-interval-seconds 2

Only when readiness reports ready_for_live_replay=true should a full replay be treated as meaningful business-evidence proof. If it reports no /1c/poll activity, fix the 1C toolkit client/session/channel first; another full replay will only reproduce checked-source partial answers.

Recommended order:

save the accepted phase83 pack into autoruns only if the product flow needs it as a legacy AGENT canary;
continue broader open-world bounded autonomy with phase83 as a regression gate, not as an open blocker;
broaden catalog scoring into unfamiliar 1C asks where metadata surface and data-need graph can pick reviewed lanes;
grow primitive descriptors only where live replay shows a real evidence gap;
keep phase19, phase21, phase22, value-flow, metadata ambiguity, inventory-stock, and phase83 as regression gates.

The key rule remains:

do not hide a domain workaround inside the planner;
promote repeated successful domain behavior into a reviewed primitive or chain template.
