fraQtl: Compressed LLM Demo ⚡
Generate text and test retrieval with a compressed Mistral-7B.
Tags: KV cache compression, inference optimization, model compression
Run larger models with the same quality in less memory.
| Model | Compression | Quality (ΔPPL) |
|---|---|---|
| Mistral-7B | 14.48 GB → 9.84 GB (3.5× KV) | +0.35 |
| Qwen 3.6 35B | 4× KV cache | -0.027 |
Compression can even improve quality: it acts as a regularizer on long contexts.
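To give a feel for where KV-cache memory savings come from, here is a minimal, generic sketch of 4-bit per-row symmetric quantization of a KV tensor. This is *not* fraQtl's method (which is patent-pending and not described here); the shapes, bit width, and quantization scheme are illustrative assumptions only.

```python
import numpy as np

def quantize_kv(kv: np.ndarray, bits: int = 4):
    """Symmetric per-row quantization: one fp16 scale per (head, position)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = np.abs(kv).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # avoid div-by-zero
    q = np.clip(np.round(kv / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale.astype(np.float32)

rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 128, 64)).astype(np.float16)  # heads x seq x head_dim
q, scale = quantize_kv(kv, bits=4)
recon = dequantize_kv(q, scale)
err = np.abs(recon - kv.astype(np.float32)).mean()

# fp16 cache = 2 bytes/element; 4-bit values pack 2 per byte, plus fp16 scales
orig_bytes = kv.size * 2
comp_bytes = kv.size // 2 + scale.size * 2
print(f"compression {orig_bytes / comp_bytes:.1f}x, mean abs error {err:.3f}")
```

With these toy shapes the ratio lands near the 3.5x-4x range quoted in the table above, which is simply the arithmetic of 16-bit values becoming 4-bit values plus a small overhead for scales; the actual fraQtl scheme and its quality numbers are not reproduced by this sketch.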
Live demo (compressed Mistral-7B): https://huggingface.co/spaces/fraQtl/fraQtl-demo
Paper: https://arxiv.org/abs/2604.11501
Patent pending.