NODEDC_1C/llm_normalizer/reports/assistant-compare-u32A9CuI.md

Assistant Stage 1 Baseline vs Current

  • comparison_id: assistant-compare-u32A9CuI
  • baseline_run_id: assistant-stage1-TnhIS4Qc6e
  • current_run_id: assistant-stage1-ZQ5JDEUUkW
  • suite_version: 0.1.0

Metric Deltas

| Metric | Baseline | Current | Delta (Current − Baseline) | Trend |
|---|---:|---:|---:|---|
| retrieval_differentiation_rate | 0.67 | 0.67 | 0 | unchanged |
| generic_explanation_rate | 0.89 | 0.78 | -0.11 | improved |
| accountant_actionability_score | 0.67 | 2.33 | 1.66 | improved |
| false_confidence_rate | 0.33 | 0.33 | 0 | unchanged |
| broad_answer_rate | 0.25 | 0.25 | 0 | unchanged |
| mechanism_specificity_score | 0 | 0 | 0 | unchanged |
| followup_context_retention_score | 3 | 3 | 0 | unchanged |
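Each Delta is Current minus Baseline, and Trend reads the sign of that delta against the metric's preferred direction: the falling generic_explanation_rate and the rising accountant_actionability_score are both marked improved. The sketch below recomputes the two columns from the values in this report. It is a hypothetical illustration, not the suite's own code: the `compare` function, the `LOWER_IS_BETTER` set, the two-decimal rounding and the "weakened" label for metrics are assumptions. Only the lower-is-better direction of generic_explanation_rate is confirmed by the table above.

```python
# Hypothetical sketch: recompute the Delta and Trend columns of the metric table.
# ASSUMPTION: the lower-is-better set is inferred; only generic_explanation_rate
# is confirmed by the report, the other two entries are guesses.
LOWER_IS_BETTER = {
    "generic_explanation_rate",
    "false_confidence_rate",  # assumed
    "broad_answer_rate",      # assumed
}

# Values copied from the Metric Deltas table.
BASELINE = {
    "retrieval_differentiation_rate": 0.67,
    "generic_explanation_rate": 0.89,
    "accountant_actionability_score": 0.67,
    "false_confidence_rate": 0.33,
    "broad_answer_rate": 0.25,
    "mechanism_specificity_score": 0,
    "followup_context_retention_score": 3,
}
CURRENT = {
    "retrieval_differentiation_rate": 0.67,
    "generic_explanation_rate": 0.78,
    "accountant_actionability_score": 2.33,
    "false_confidence_rate": 0.33,
    "broad_answer_rate": 0.25,
    "mechanism_specificity_score": 0,
    "followup_context_retention_score": 3,
}


def compare(baseline, current, precision=2):
    """Return (metric, baseline, current, delta, trend) rows."""
    rows = []
    for metric, base in baseline.items():
        cur = current[metric]
        # Round to hide float noise such as 2.33 - 0.67 = 1.6599999...
        delta = round(cur - base, precision)
        if delta == 0:
            trend = "unchanged"
        else:
            # A drop is good for lower-is-better metrics, a rise for the rest.
            better = delta < 0 if metric in LOWER_IS_BETTER else delta > 0
            trend = "improved" if better else "weakened"  # label assumed
        rows.append((metric, base, cur, delta, trend))
    return rows


if __name__ == "__main__":
    for row in compare(BASELINE, CURRENT):
        print(*row)
```

Because every other delta in this comparison is zero, the assumed directions for the remaining metrics do not change any trend shown in the table.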

Scenario Notes Summary

  • improved: 8
  • unchanged: 1
  • weakened: 0