Dumb-1.1
Dumb-1.1 is an AI model that is a more balanced version of Dumb-1.1-Rc-1.
It can tell more detailed lies, and the benchmark details are listed below.
Benchmark results
| Benchmark | Metis-1.5 (898M/A340M) | Dumb-1.1 (18.9M) | Ant (10M) |
|---|---|---|---|
| MMLU (acc) | 23.60% | 23.22% | 25.43% |
| HellaSwag (acc_norm) | 30.40% | 25.74% | 26.72% |
| ARC-Easy (acc_norm) | 41.30% | 30.35% | 25.42% |
| PIQA (acc_norm) | 54.70% | 51.74% | 50.72% |
| BoolQ (acc) | 47.70% | 62.17% | 37.82% |
| SciQ (acc_norm) | - | 32.10% | 21.50% |
| ARC-Challenge (acc_norm) | 25.90% | 20.73% | - |
| WinoGrande (acc) | 51.50% | - | 49.64% |
| OpenBookQA (acc_norm) | 29.60% | 25.20% | - |
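Because each model is missing at least one benchmark, the columns are only directly comparable on the rows all three share. The following is a minimal sketch of one rough way to summarize the table: an unweighted mean over the shared benchmarks. The scores are copied from the table above; the helper names (`shared_benchmarks`, `mean_score`) and the choice of an unweighted mean are illustrative assumptions, not part of the original evaluation.

```python
# Scores (percent) copied from the benchmark table; None marks a "-" cell.
MODELS = ["Metis-1.5", "Dumb-1.1", "Ant"]

SCORES = {
    "MMLU":          {"Metis-1.5": 23.60, "Dumb-1.1": 23.22, "Ant": 25.43},
    "HellaSwag":     {"Metis-1.5": 30.40, "Dumb-1.1": 25.74, "Ant": 26.72},
    "ARC-Easy":      {"Metis-1.5": 41.30, "Dumb-1.1": 30.35, "Ant": 25.42},
    "PIQA":          {"Metis-1.5": 54.70, "Dumb-1.1": 51.74, "Ant": 50.72},
    "BoolQ":         {"Metis-1.5": 47.70, "Dumb-1.1": 62.17, "Ant": 37.82},
    "SciQ":          {"Metis-1.5": None,  "Dumb-1.1": 32.10, "Ant": 21.50},
    "ARC-Challenge": {"Metis-1.5": 25.90, "Dumb-1.1": 20.73, "Ant": None},
    "WinoGrande":    {"Metis-1.5": 51.50, "Dumb-1.1": None,  "Ant": 49.64},
    "OpenBookQA":    {"Metis-1.5": 29.60, "Dumb-1.1": 25.20, "Ant": None},
}


def shared_benchmarks(scores, models):
    """Return the benchmarks for which every listed model has a score."""
    return [
        name
        for name, row in scores.items()
        if all(row[model] is not None for model in models)
    ]


def mean_score(scores, model, benchmarks):
    """Unweighted mean of one model's scores over the given benchmarks."""
    return sum(scores[name][model] for name in benchmarks) / len(benchmarks)


common = shared_benchmarks(SCORES, MODELS)
for model in MODELS:
    print(f"{model}: {mean_score(SCORES, model, common):.2f}% over {len(common)} benchmarks")
```

An unweighted mean treats every benchmark as equally important and ignores their different chance baselines, so it should be read as a coarse summary rather than a ranking.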
Evaluation results
- MMLU (Overall) on MMLU test set (self-reported): 23.220
- Humanities (Category) on MMLU test set (self-reported): 24.310
- Social Sciences (Category) on MMLU test set (self-reported): 21.740
- STEM (Category) on MMLU test set (self-reported): 22.330
- Other (Category) on MMLU test set (self-reported): 23.910
- Subtask - Formal Logic on MMLU test set (self-reported): 34.130
- Subtask - High School European History on MMLU test set (self-reported): 21.820
- Subtask - High School US History on MMLU test set (self-reported): 25.000
