LongCat-2.0
Model Introduction
We introduce LongCat-2.0, a large-scale Mixture-of-Experts (MoE) language model with 1.6 trillion total parameters, of which approximately 48 billion are activated per token. This is a substantial scale-up from previous LongCat models and comes with several architectural improvements.
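To make the sparsity concrete, the short calculation below uses only the two parameter counts stated above. It is back-of-the-envelope arithmetic, not a description of LongCat-2.0's routing. Because the text gives the activated count as "~48 billion", the resulting fraction is approximate as well.

```python
# Back-of-the-envelope sparsity figures for a MoE model, using the
# parameter counts stated in the text (1.6T total, ~48B activated per token).
TOTAL_PARAMS = 1.6e12
ACTIVE_PARAMS = 48e9  # approximate: the text says "~48 billion"


def activation_fraction(total: float, active: float) -> float:
    """Fraction of all parameters that take part in one token's forward pass."""
    return active / total


frac = activation_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
ratio = TOTAL_PARAMS / ACTIVE_PARAMS

print(f"activated fraction: {frac:.1%}")  # 3.0%
print(f"total / active: {ratio:.1f}x")    # 33.3x
```

In other words, each token touches roughly 3% of the weights. Per-token compute therefore scales with the activated parameters, while memory for the weights scales with the full 1.6T.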
Both the full training run and the large-scale deployment run entirely on AI ASIC superpods. Pretraining consumed millions of accelerator-hours over more than 35 trillion tokens with no rollbacks and no irrecoverable loss spikes, demonstrating our capability to conduct frontier-scale training on alternative hardware platforms.
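For a sense of scale, the sketch below estimates pretraining compute with the widely used rule of thumb C ≈ 6·N·D. Here N is taken to be the parameters activated per token and D is the number of training tokens. This is an assumption-laden approximation and not a figure reported for LongCat-2.0. It ignores attention-score FLOPs and any recomputation, and it treats "more than 35 trillion" as exactly 35T, so the true total is likely somewhat higher.

```python
# Rough pretraining-compute estimate using the common C ≈ 6 * N * D heuristic.
# ASSUMPTIONS: N = activated params per token (MoE), D = 35T tokens (a lower
# bound, since the text says "more than 35 trillion"); attention FLOPs ignored.
ACTIVE_PARAMS = 48e9  # ~48B activated per token (stated, approximate)
TOKENS = 35e12        # "more than 35 trillion" treated as 35T


def approx_training_flops(active_params: float, tokens: float) -> float:
    # ~2 FLOPs per parameter per token for the forward pass,
    # plus ~4 for the backward pass, gives the factor of 6.
    return 6.0 * active_params * tokens


flops = approx_training_flops(ACTIVE_PARAMS, TOKENS)
print(f"~{flops:.2e} FLOPs")  # ~1.01e+25 FLOPs
```

Using activated rather than total parameters is what keeps the estimate near 10^25 FLOPs. A dense model with all 1.6T parameters active would need roughly 33 times more compute for the same token count.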
To strengthen the model on long-horizon tasks, we introduce LongCat Sparse Attention and train LongCat-2.0 on hundreds of billions of tokens of 1M-context data. Together with dedicated post-training, this gives LongCat-2.0 strong performance on coding and agentic tasks.
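The text does not describe how LongCat Sparse Attention chooses which tokens to attend to. The generic arithmetic below only shows why sparsity matters at a 1M-token context. It compares the number of query-key scores under dense causal attention with a hypothetical scheme in which each query attends to at most a fixed budget of keys. `K_BUDGET` and `budgeted_sparse_pairs` are illustrative assumptions, not LongCat parameters or APIs.

```python
# Generic count of attention scores per head per layer at a 1M-token context.
# ASSUMPTION: the sparse variant caps each query at K_BUDGET keys. The budget
# is hypothetical and NOT a documented LongCat Sparse Attention setting.
CONTEXT = 1_000_000
K_BUDGET = 2048  # hypothetical per-query key budget


def dense_causal_pairs(n: int) -> int:
    # Query i attends to keys 0..i, so the total is n(n+1)/2 (quadratic in n).
    return n * (n + 1) // 2


def budgeted_sparse_pairs(n: int, k: int) -> int:
    # Query i attends to min(i + 1, k) keys.
    full = min(n, k)
    # The first `full` queries see all preceding keys; each remaining query sees k.
    return full * (full + 1) // 2 + (n - full) * k


dense = dense_causal_pairs(CONTEXT)
sparse = budgeted_sparse_pairs(CONTEXT, K_BUDGET)
print(dense, sparse, f"{dense / sparse:.0f}x fewer scores")
```

Dense causal attention grows quadratically with context length, while a fixed per-query budget grows linearly. At 1M tokens that gap is what makes long-context training and serving tractable, whatever selection rule the actual design uses.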
🏋️ Model weights coming soon — stay tuned!
- Total size: 327 kB
- Files: 6
- Last updated: Jun 30