Pretrained baselines for sequence modeling research.
- puigde/gated-deltanet-360M-15B-slimpajama
  Text Generation • 0.4B • 431
- puigde/rwkv7-380M-15B-slimpajama
  Text Generation • 0.4B • 733
- puigde/modern-transformer-gqa-370M-15B-slimpajama
  Text Generation • 0.4B • 158
- puigde/modern-transformer-mha-370M-15B-slimpajama
  Text Generation • 0.4B • 93
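
The repository names above appear to follow a common pattern: owner, architecture, a size such as 360M, a second figure (15B), and a dataset tag. A minimal sketch of splitting a name into those parts follows. It assumes the pattern holds for every entry; reading the size as a parameter count and 15B as a training-token budget is an interpretation, not something the listing states. `RepoInfo` and `parse_repo_id` are hypothetical helper names introduced for illustration.

```python
import re
from typing import NamedTuple


class RepoInfo(NamedTuple):
    owner: str
    architecture: str
    size: str     # e.g. "360M"; assumed to be the parameter count
    budget: str   # e.g. "15B"; assumed to be the training-token budget
    dataset: str  # e.g. "slimpajama"


# Assumed naming pattern: <owner>/<architecture>-<size>M-<budget>B-<dataset>
_PATTERN = re.compile(
    r"^(?P<owner>[^/]+)/(?P<architecture>.+)"
    r"-(?P<size>\d+M)-(?P<budget>\d+B)-(?P<dataset>[^-]+)$"
)


def parse_repo_id(repo_id: str) -> RepoInfo:
    """Split a repository id into its name components."""
    match = _PATTERN.match(repo_id)
    if match is None:
        raise ValueError(f"unrecognized repository id: {repo_id!r}")
    return RepoInfo(**match.groupdict())


REPO_IDS = [
    "puigde/gated-deltanet-360M-15B-slimpajama",
    "puigde/rwkv7-380M-15B-slimpajama",
    "puigde/modern-transformer-gqa-370M-15B-slimpajama",
    "puigde/modern-transformer-mha-370M-15B-slimpajama",
]

if __name__ == "__main__":
    for repo_id in REPO_IDS:
        print(parse_repo_id(repo_id))
```

Grouping the parsed entries by `architecture` while holding `budget` and `dataset` fixed is one way to line the baselines up for comparison. A name that does not fit the assumed pattern raises `ValueError` rather than being silently misparsed.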