First Benchmark

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1.0 | none | 0 | acc ↑ | 0.5360 | ± 0.0223 |
| | | none | 0 | acc_norm ↑ | 0.5580 | ± 0.0222 |
| gpqa_diamond_zeroshot | 2.2 | none | 0 | acc ↑ | 0.4495 | ± 0.0354 |
| | | none | 0 | acc_norm ↑ | 0.4495 | ± 0.0354 |
| gsm8k | 3.0 | flexible-extract | 5 | exact_match ↑ | 0.8300 | ± 0.0168 |
| | | strict-match | 5 | exact_match ↑ | 0.8340 | ± 0.0167 |
| gsm8k_cot | 3.0 | flexible-extract | 8 | exact_match ↑ | 0.8440 | ± 0.0162 |
| | | strict-match | 8 | exact_match ↑ | 0.7560 | ± 0.0192 |

Detailed Benchmark

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| stem | 2.0 | none | 0 | acc ↑ | 0.7707 | ± 0.0072 |
| - abstract_algebra | 1.0 | none | 0 | acc ↑ | 0.6600 | ± 0.0476 |
| - anatomy | 1.0 | none | 0 | acc ↑ | 0.7778 | ± 0.0359 |
| - astronomy | 1.0 | none | 0 | acc ↑ | 0.9211 | ± 0.0219 |
| - college_biology | 1.0 | none | 0 | acc ↑ | 0.9375 | ± 0.0202 |
| - college_chemistry | 1.0 | none | 0 | acc ↑ | 0.6000 | ± 0.0492 |
| - college_computer_science | 1.0 | none | 0 | acc ↑ | 0.7800 | ± 0.0416 |
| - college_mathematics | 1.0 | none | 0 | acc ↑ | 0.5900 | ± 0.0494 |
| - college_physics | 1.0 | none | 0 | acc ↑ | 0.6373 | ± 0.0478 |
| - computer_security | 1.0 | none | 0 | acc ↑ | 0.8500 | ± 0.0359 |
| - conceptual_physics | 1.0 | none | 0 | acc ↑ | 0.8766 | ± 0.0215 |
| - electrical_engineering | 1.0 | none | 0 | acc ↑ | 0.8000 | ± 0.0333 |
| - elementary_mathematics | 1.0 | none | 0 | acc ↑ | 0.7593 | ± 0.0220 |
| - high_school_biology | 1.0 | none | 0 | acc ↑ | 0.9452 | ± 0.0130 |
| - high_school_chemistry | 1.0 | none | 0 | acc ↑ | 0.7685 | ± 0.0297 |
| - high_school_computer_science | 1.0 | none | 0 | acc ↑ | 0.8900 | ± 0.0314 |
| - high_school_mathematics | 1.0 | none | 0 | acc ↑ | 0.5148 | ± 0.0305 |
| - high_school_physics | 1.0 | none | 0 | acc ↑ | 0.7152 | ± 0.0368 |
| - high_school_statistics | 1.0 | none | 0 | acc ↑ | 0.7917 | ± 0.0277 |
| - machine_learning | 1.0 | none | 0 | acc ↑ | 0.6429 | ± 0.0455 |
| gpqa_diamond_cot_zeroshot | 2.2 | flexible-extract | 0 | exact_match ↑ | 0.1869 | ± 0.0278 |
| | | strict-match | 0 | exact_match ↑ | 0.0000 | ± 0.0000 |
| gpqa_diamond_zeroshot | 2.2 | none | 0 | acc ↑ | 0.4444 | ± 0.0354 |
| | | none | 0 | acc_norm ↑ | 0.4444 | ± 0.0354 |
| gsm8k | 3.0 | flexible-extract | 5 | exact_match ↑ | 0.8280 | ± 0.0169 |
| | | strict-match | 5 | exact_match ↑ | 0.8320 | ± 0.0167 |
| gsm8k_cot | 3.0 | flexible-extract | 8 | exact_match ↑ | 0.8460 | ± 0.0162 |
| | | strict-match | 8 | exact_match ↑ | 0.7600 | ± 0.0191 |
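
The Value and Stderr columns can be turned into approximate confidence intervals, which helps when judging whether two scores really differ. The following is a minimal sketch, not part of the original evaluation: it assumes a normal approximation (z = 1.96 for roughly 95% coverage), and the helper name `confidence_interval` is hypothetical. The rows are copied from the Detailed Benchmark results.

```python
def confidence_interval(value, stderr, z=1.96):
    """Return a (low, high) normal-approximation interval, clipped to [0, 1].

    Assumption: the reported stderr is the standard error of the mean score,
    so value +/- z * stderr gives an approximate interval.
    """
    low = max(0.0, value - z * stderr)
    high = min(1.0, value + z * stderr)
    return low, high


# (task, filter, value, stderr) copied from the Detailed Benchmark results
rows = [
    ("gsm8k", "strict-match", 0.8320, 0.0167),
    ("gsm8k_cot", "strict-match", 0.7600, 0.0191),
    ("gpqa_diamond_zeroshot", "none", 0.4444, 0.0354),
]

for task, filt, value, stderr in rows:
    low, high = confidence_interval(value, stderr)
    print(f"{task} ({filt}): {value:.4f}, approx. 95% CI [{low:.4f}, {high:.4f}]")
```

Under this approximation, scores whose intervals overlap heavily (for example the gsm8k results from the two runs) should be read as indistinguishable rather than as a gain or regression.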
Model size: 9B params (Safetensors)
Tensor type: BF16

Model tree for Muhammadreza/OpenMythos-9B-1M-heretic: finetuned from Qwen/Qwen3.5-9B