Dumb-1.2-Exp-0616

A new dumb model!

Training time: ~2 hours

The "dumb" name

I don't have enough money to train a real SoTA model. However, if you have the wisdom and courage to embrace "dumb", you can still create a small, dumb LLM.

Explanation of the architecture of "Dumb 1.2"

I have now progressed from Dumb 1 to Dumb 1.2. This model uses a complex architecture specialized for short contexts. Dumb 1.2 has 34.611M parameters and is not suitable for complex tasks. However, it can be entertaining as a strange AI that does not pay attention to even simple jokes and stories. On some benchmarks it beats the previous model despite being a test version, and it does especially well on PIQA.

Comparison with other models

On MMLU, it is comparable to models more than 1.5 times its size. On many other benchmarks it trails the competition, but those competing models are more than 1.5 times larger; Dumb 1.1 is about 3 times smaller than the competition, and the model runs even on low-performance PCs. On ARC-Easy, it reaches around 65% of the SoTA score for LLMs of the same size, which points to some weakness in reasoning ability.

In addition, on ARC-Challenge, this experimental model improves by 2% over the previous official model. This is encouraging, and performance will likely be even better in the mid and late previews and in the official version.


Safetensors
Model size: 34.6M params
Tensor type: F32
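
For a rough sense of why a model this size can run on low-performance PCs, the weight footprint can be estimated from the two figures above. This is only a back-of-the-envelope sketch: it assumes 34.611M is close to the exact parameter count, counts raw weights only (no activations or runtime overhead), and the F16 row is a hypothetical conversion, not something this card provides.

```python
# Rough weight-memory estimate from the figures on this card.
# Assumption: 34.611M is treated as the exact parameter count.
PARAMS = 34_611_000

# F32 is the listed tensor type; F16 is a hypothetical conversion for comparison.
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def weight_size_mib(n_params: int, dtype: str) -> float:
    """Return the size of the raw weights in MiB for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_size_mib(PARAMS, dtype):.1f} MiB")
```

Under these assumptions the F32 weights come to roughly 132 MiB, which fits comfortably in the RAM of most older machines.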
