# Assistant Stage 1 Baseline vs Current

- comparison_id: assistant-compare-OP6wunk-
- baseline_run_id: assistant-stage1-Isj2Tscs4g
- current_run_id: assistant-stage1-40oxOKsgjV
- suite_version: 0.1.0

## Metric Deltas

| Metric | Baseline | Current | Delta | Trend |
|---|---:|---:|---:|---|
| retrieval_differentiation_rate | 0.67 | 0.67 | 0 | unchanged |
| generic_explanation_rate | 0.89 | 0.78 | -0.11 | improved |
| accountant_actionability_score | 0.67 | 2.33 | 1.66 | improved |
| false_confidence_rate | 0.33 | 0.33 | 0 | unchanged |
| broad_answer_rate | 0.25 | 0.25 | 0 | unchanged |
| mechanism_specificity_score | 0 | 0 | 0 | unchanged |
| followup_context_retention_score | 3 | 3 | 0 | unchanged |

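The Delta and Trend columns can be reproduced with a small sketch like the one below. It assumes Delta is `current - baseline` rounded to two decimals, and that lower values are better for `generic_explanation_rate`, `false_confidence_rate`, and `broad_answer_rate`. That direction is inferred from the Trend column and the metric names; the report does not state it. The function name `delta_and_trend`, the `LOWER_IS_BETTER` set, and the use of `weakened` as the opposite of `improved` are illustrative assumptions, not part of the evaluation suite.

```python
from __future__ import annotations

# Assumption: metric direction is inferred from the report's Trend column
# and the metric names; the suite's actual configuration is not shown.
LOWER_IS_BETTER = {
    "generic_explanation_rate",
    "false_confidence_rate",
    "broad_answer_rate",
}


def delta_and_trend(metric: str, baseline: float, current: float) -> tuple[float, str]:
    """Return (delta, trend) for one metric row.

    Delta is current minus baseline, rounded to two decimals. Trend is
    "unchanged" when the delta is zero; otherwise it depends on whether
    the metric is treated as lower-is-better.
    """
    delta = round(current - baseline, 2)
    if delta == 0:
        return delta, "unchanged"
    if metric in LOWER_IS_BETTER:
        improved = delta < 0
    else:
        improved = delta > 0
    # "weakened" is an assumed label for a change in the worse direction.
    return delta, "improved" if improved else "weakened"


if __name__ == "__main__":
    rows = [
        ("generic_explanation_rate", 0.89, 0.78),
        ("accountant_actionability_score", 0.67, 2.33),
        ("false_confidence_rate", 0.33, 0.33),
    ]
    for metric, baseline, current in rows:
        print(metric, *delta_and_trend(metric, baseline, current))
```

Keeping the direction of each metric in one explicit set makes the Trend label auditable: a negative delta on a rate such as `generic_explanation_rate` reads as an improvement only because the metric is listed as lower-is-better.
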
## Scenario Notes Summary

- improved: 8
- unchanged: 1
- weakened: 0