NODEDC_1C/llm_normalizer/reports/assistant-compare-up7MQKOB.md

Assistant Stage 1 Baseline vs Current

  • comparison_id: assistant-compare-up7MQKOB
  • baseline_run_id: assistant-stage1-3cX10PNH8P
  • current_run_id: assistant-stage1-0XLmwlePaE
  • suite_version: 0.1.0

Metric Deltas

| Metric | Baseline | Current | Delta | Trend |
| --- | --- | --- | --- | --- |
| retrieval_differentiation_rate | 0.67 | 0.67 | 0 | unchanged |
| generic_explanation_rate | 0.78 | 0.78 | 0 | unchanged |
| accountant_actionability_score | 0.78 | 2.67 | 1.89 | improved |
| false_confidence_rate | 0.33 | 0.22 | -0.11 | improved |
| broad_answer_rate | 0.25 | 0.25 | 0 | unchanged |
| mechanism_specificity_score | 0 | 0 | 0 | unchanged |
| followup_context_retention_score | 3 | 3 | 0 | unchanged |
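
The Delta and Trend columns can be reproduced with a minimal sketch like the one below. It assumes that Delta is Current minus Baseline rounded to two decimals, and that the trend depends on each metric's direction. The function name `compare_metric` and the `LOWER_IS_BETTER` set are hypothetical, not taken from the report generator. Only `false_confidence_rate` is listed as lower-is-better, because it is the only metric whose negative delta the report labels as improved. Other rate metrics may belong in that set as well.

```python
# Hypothetical sketch: not the report generator's actual code.
# Assumption: metrics in this set improve when their value decreases.
LOWER_IS_BETTER = {"false_confidence_rate"}


def compare_metric(name, baseline, current, ndigits=2):
    """Return (delta, trend) for one metric, with delta = current - baseline."""
    delta = round(current - baseline, ndigits)
    if delta == 0:
        return delta, "unchanged"
    # A negative delta is good only for lower-is-better metrics.
    better = delta < 0 if name in LOWER_IS_BETTER else delta > 0
    return delta, "improved" if better else "weakened"


# Values copied from the Metric Deltas rows above.
rows = [
    ("retrieval_differentiation_rate", 0.67, 0.67),
    ("accountant_actionability_score", 0.78, 2.67),
    ("false_confidence_rate", 0.33, 0.22),
]

for name, baseline, current in rows:
    delta, trend = compare_metric(name, baseline, current)
    print(f"{name} {baseline} {current} {delta} {trend}")
```

Rounding before the zero check keeps floating-point noise from turning an unchanged metric into a spurious improvement or regression.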

Scenario Notes Summary

  • improved: 8
  • unchanged: 1
  • weakened: 0