fraQtl: Compressed LLM Demo ⚡
Generate text and test retrieval with a compressed Mistral-7B.
Tags: KV cache compression, inference optimization, model compression
Run larger models with the same quality in less memory.
| Model | Compression | Quality (ΔPPL) |
|---|---|---|
| Mistral-7B | 14.48 GB → 9.84 GB (3.5× KV) | +0.35 |
| Qwen 3.6 35B | 4× KV cache | -0.027 |
Compression can even improve quality: it acts as a regularizer on long contexts.
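To give a feel for where KV-cache memory savings come from, here is a minimal, generic sketch of 4-bit per-row symmetric quantization of a KV tensor. This is *not* fraQtl's method (which is patent-pending and not described here); the shapes, bit width, and quantization scheme are illustrative assumptions only.

```python
import numpy as np

def quantize_kv(kv: np.ndarray, bits: int = 4):
    """Symmetric per-row quantization: one fp16 scale per (head, position)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = np.abs(kv).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # avoid div-by-zero
    q = np.clip(np.round(kv / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale.astype(np.float32)

rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 128, 64)).astype(np.float16)  # heads x seq x head_dim
q, scale = quantize_kv(kv, bits=4)
recon = dequantize_kv(q, scale)
err = np.abs(recon - kv.astype(np.float32)).mean()

# fp16 cache = 2 bytes/element; 4-bit values pack 2 per byte, plus fp16 scales
orig_bytes = kv.size * 2
comp_bytes = kv.size // 2 + scale.size * 2
print(f"compression {orig_bytes / comp_bytes:.1f}x, mean abs error {err:.3f}")
```

With these toy shapes the ratio lands near the 3.5x-4x range quoted in the table above, which is simply the arithmetic of 16-bit values becoming 4-bit values plus a small overhead for scales; the actual fraQtl scheme and its quality numbers are not reproduced by this sketch.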
Live demo (compressed Mistral-7B): https://huggingface.co/spaces/fraQtl/fraQtl-demo
Paper: https://arxiv.org/abs/2604.11501
Patent pending.