We just compressed Qwen 3.6's KV cache 4x with zero quality loss (PPL actually improves slightly).
Works automatically on the hybrid architecture: it detects standard vs. linear attention layers.
Model card: huggingface.co/fraQtl/Qwen3.6-35B-A3B-fraQtl-kv :)
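For illustration, a rough sketch of what that layer detection could look like; the `split_layers` helper and the toy layout below are hypothetical, not fraQtl's actual code:

```python
# Hypothetical sketch: partition a hybrid model's layers before
# KV-cache compression. Only standard-attention layers hold a KV
# cache worth compressing; linear-attention layers are skipped.

def split_layers(layer_types):
    """Return (standard_indices, linear_indices) for a hybrid layout."""
    standard = [i for i, t in enumerate(layer_types) if t == "standard"]
    linear = [i for i, t in enumerate(layer_types) if t == "linear"]
    return standard, linear

# Toy hybrid layout (illustrative only, not the real Qwen3.6 layout)
layout = ["linear", "linear", "standard", "linear", "linear", "standard"]
std, lin = split_layers(layout)
print(std)  # indices of layers eligible for KV-cache compression
```

In a real checkpoint the layer types would come from the model config rather than a hand-written list.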