Assistant Stage 1 Baseline vs Current
- comparison_id: assistant-compare-up7MQKOB
- baseline_run_id: assistant-stage1-3cX10PNH8P
- current_run_id: assistant-stage1-0XLmwlePaE
- suite_version: 0.1.0
Metric Deltas
| Metric | Baseline | Current | Delta | Trend |
|---|---|---|---|---|
| retrieval_differentiation_rate | 0.67 | 0.67 | 0 | unchanged |
| generic_explanation_rate | 0.78 | 0.78 | 0 | unchanged |
| accountant_actionability_score | 0.78 | 2.67 | 1.89 | improved |
| false_confidence_rate | 0.33 | 0.22 | -0.11 | improved |
| broad_answer_rate | 0.25 | 0.25 | 0 | unchanged |
| mechanism_specificity_score | 0 | 0 | 0 | unchanged |
| followup_context_retention_score | 3 | 3 | 0 | unchanged |
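The Delta and Trend columns above can be reproduced with a small helper. This is a minimal sketch, not the suite's actual implementation: the function name `compare_metric` and the `LOWER_IS_BETTER` set are hypothetical. The table shows only that a drop in `false_confidence_rate` counts as improved and that a rise in `accountant_actionability_score` counts as improved; the direction of the other metrics and the use of a "weakened" label for regressions (borrowed from the scenario summary) are assumptions.

```python
# Hypothetical helper; not taken from the evaluation suite itself.
# Assumption: only false_confidence_rate is known (from the table) to be
# lower-is-better. Any other metric is treated as higher-is-better.
LOWER_IS_BETTER = {"false_confidence_rate"}


def compare_metric(name, baseline, current, ndigits=2):
    """Return (delta, trend) for one metric, with delta = current - baseline."""
    # Round to suppress floating-point noise such as 1.8899999999999997.
    delta = round(current - baseline, ndigits)
    if delta == 0:
        trend = "unchanged"
    elif (delta < 0) == (name in LOWER_IS_BETTER):
        # A decrease helps lower-is-better metrics; an increase helps the rest.
        trend = "improved"
    else:
        # Assumed label, mirroring the "weakened" bucket in the scenario summary.
        trend = "weakened"
    return delta, trend


if __name__ == "__main__":
    rows = [
        ("retrieval_differentiation_rate", 0.67, 0.67),
        ("accountant_actionability_score", 0.78, 2.67),
        ("false_confidence_rate", 0.33, 0.22),
    ]
    for name, baseline, current in rows:
        delta, trend = compare_metric(name, baseline, current)
        print(f"| {name} | {baseline} | {current} | {delta} | {trend} |")
```

Keeping the metric direction in an explicit set makes the sign convention visible: a negative delta is not a regression by itself, as the `false_confidence_rate` row shows.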
Scenario Notes Summary
- improved: 8
- unchanged: 1
- weakened: 0