pipenetwork/LongCat-2.0-4bit

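4-bit MLX quantization of meituan-longcat/LongCat-2.0, a mixture-of-experts (MoE) model with 1.6T total parameters and ~48B active. Weights are quantized with group size 64, the router classifiers are kept at 8-bit, and the MTP (multi-token prediction) weights are dropped. Converted from the FP8 source with mlx-lm.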
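To make "group-size 64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization: each run of 64 weights gets its own scale and bias, and every weight is stored as a 4-bit code. This is an illustration only, not the MLX kernel. The function names are hypothetical, and the real implementation packs the codes into integer tensors and may differ in rounding details.

```python
import random

def quantize_group(values, bits=4):
    """Affine-quantize one group; returns (codes, scale, bias)."""
    levels = (1 << bits) - 1                 # 15 distinct steps for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0        # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]

def quantize_row(row, group_size=64, bits=4):
    """Split a weight row into consecutive groups and quantize each one."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]

# Hypothetical data: one row of 256 small random weights -> 4 groups of 64.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize_row(row)

restored = [
    v for codes, scale, bias in groups
    for v in dequantize_group(codes, scale, bias)
]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups={len(groups)}  max reconstruction error={max_err:.6f}")
```

Because the scale is fitted per group, the worst-case rounding error is bounded by half of that group's scale, which is why a smaller group size trades extra metadata for better accuracy.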

Requires mlx-lm PR #1464 (install mlx-lm from the PR branch):

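pip install git+https://github.com/ml-explore/mlx-lm.git@refs/pull/1464/head

from mlx_lm import load, generate

# Load the quantized weights and tokenizer from the Hub
model, tok = load("pipenetwork/LongCat-2.0-4bit")

# Wrap the user message in the model's chat template
p = tok.apply_chat_template([{"role":"user","content":"Who is Albert Einstein?"}], add_generation_prompt=True)

# verbose=True already prints the generated text, so the result is not printed again
text = generate(model, tok, prompt=p, max_tokens=512, verbose=True)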
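Before running the snippet above, it helps to estimate how much memory the weights need. The sketch below is a back-of-the-envelope calculation, not a measured figure. It assumes every parameter is stored at 4 bits and that each group of 64 weights carries a 16-bit scale and a 16-bit bias. It ignores the 8-bit router classifiers, any tensors left unquantized, and the KV cache. The helper name is hypothetical.

```python
def quantized_size_gb(n_params, bits=4, group_size=64, meta_bits=32):
    """Rough weight footprint in GB (10^9 bytes) for group-wise quantization.

    meta_bits is the per-group overhead: an assumed 16-bit scale plus 16-bit bias.
    """
    bits_per_weight = bits + meta_bits / group_size   # 4 + 32/64 = 4.5
    return n_params * bits_per_weight / 8 / 1e9

estimate_gb = quantized_size_gb(1.6e12)   # 1.6T total parameters
print(f"approx. {estimate_gb:.0f} GB of weights")
```

Under these assumptions the weights alone come to roughly 900 GB, so the model needs a machine, or a set of machines, with that much memory available to MLX.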