AI & ML interests

KV cache compression, inference optimization, model compression

Recent Activity

Zenalyze updated a model 3 days ago
fraQtl/Qwen3.6-35B-A3B-fraQtl-kv
Zenalyze updated a Space 3 days ago
fraQtl/README
Zenalyze published a Space 3 days ago
fraQtl/README

Organization Card

fraQtl

KV cache + model compression for long-context LLMs

Run larger models. Same quality. Less memory.


⚡ Key Results

| Model | Compression | Quality Δ |
|---|---|---|
| Mistral-7B | 14.48 GB → 9.84 GB (3.5× KV) | +0.35 PPL |
| Qwen 3.6 35B | 4× KV cache | −0.027 PPL |

Compression can even improve quality (a regularization effect on long-context inputs), as the negative PPL delta for Qwen shows.


🧪 Live Demo

🔗 https://huggingface.co/spaces/fraQtl/fraQtl-demo
Compressed Mistral-7B running live


📊 What’s different

  • Structure-aware KV cache compression
  • Layer-selective quantization (hybrid architectures)
  • No retraining required
  • Works on real long-context workloads (needle-in-a-haystack retrieval and similar benchmarks)
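To make the "no retraining required" idea concrete, here is a minimal, method-agnostic sketch of post-hoc KV cache quantization: each key/value channel is mapped to 4-bit codes with a per-channel scale and offset, and decompressed on the fly at attention time. This is an illustrative baseline under assumed tensor shapes, not fraQtl's actual structure-aware algorithm; all function names here are hypothetical.

```python
import numpy as np

def quantize_kv(kv: np.ndarray, bits: int = 4):
    """Per-channel asymmetric quantization of a KV cache slice.

    kv: float32 array of shape (seq_len, num_heads, head_dim).
    Returns (codes, scale, zero_point) such that
    kv ~= codes * scale + zero_point.
    """
    qmax = 2**bits - 1
    # One scale/offset per (head, channel) pair; axis 0 is the sequence.
    lo = kv.min(axis=0, keepdims=True)
    hi = kv.max(axis=0, keepdims=True)
    scale = (hi - lo) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard constant channels
    codes = np.clip(np.round((kv - lo) / scale), 0, qmax).astype(np.uint8)
    return codes, scale, lo

def dequantize_kv(codes, scale, zero_point):
    """Reconstruct an approximate float32 KV cache from the codes."""
    return codes.astype(np.float32) * scale + zero_point

# Usage: a synthetic 1024-token cache with 8 heads of dim 64.
rng = np.random.default_rng(0)
kv = rng.normal(size=(1024, 8, 64)).astype(np.float32)
codes, scale, zp = quantize_kv(kv, bits=4)
# Worst-case per-element error is scale / 2 (rounding to the nearest code).
err = np.abs(dequantize_kv(codes, scale, zp) - kv).max()
```

A real implementation would pack two 4-bit codes per byte (8× smaller than float32 storage) and quantize blockwise as tokens arrive; the point of the sketch is only that compression is a pure post-processing step, with no gradient updates to the model.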

📄 Paper

https://arxiv.org/abs/2604.11501


🔒 Models (gated)


🌐 Website

https://fraqtl.ai


📬 Contact

contact@fraqtl.ai


Patent pending.

Datasets

None public yet