tencent/Sequential-Hidden-Decoding-8B-n8-Instruct Text Generation • 13B • Updated 27 days ago • 125 • 8
The Smol Training Playbook 📚 • 3.13k • The secrets to building world-class LLMs
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset Paper • 2508.15096 • Published Aug 20, 2025 • 8
deepseek-ai/DeepSeek-V3-0324 Text Generation • 685B • Updated Mar 27, 2025 • 651k • 3.11k
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2, 2024 • 108
NousResearch/Nous-Hermes-2-Mistral-7B-DPO Text Generation • 7B • Updated Apr 30, 2024 • 1.43k • 218
Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7, 2024 • 48
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 190
HuggingFaceH4/zephyr-7b-alpha Text Generation • 7B • Updated Oct 16, 2024 • 5.36k • 1.12k