Fisher-Adaptive Binary Quantization with Residual Codebooks
Status: Active
Duration: April 2026 - Present
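FABQ-RC is a 1-bit quantization method for large language models that adapts the blocksize per layer rather than using a fixed one. It combines Fisher-weighted channel importance, mixed-precision core allocation, adaptive per-layer blocksize selection, and residual codebook clustering.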
Target: ~1.18 bits per parameter, beating BiLLM on quality
| Metric | What it measures | Trade-off |
|---|---|---|
| Magnitude | Weight absolute value | Big weights aren't always important |
| Hessian | Loss curvature at current θ | Local only, expensive to compute |
| Fisher | Expected gradient² over data | Captures average importance, tractable |
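The Fisher row can be made concrete with a small example. The sketch below estimates the diagonal empirical Fisher (the mean squared per-sample gradient) for a toy linear layer under a least-squares loss, then sums it per output channel. The toy model, the loss, the choice of output rows as "channels", and the names `fisher_diagonal` and `channel_importance` are illustrative assumptions, not FABQ-RC's actual implementation.

```python
import numpy as np

def fisher_diagonal(W, X, Y):
    """Diagonal empirical Fisher for y = W @ x under 0.5 * ||W x - y||^2.

    W: (out, in) weights, X: (n, in) inputs, Y: (n, out) targets.
    Returns an (out, in) array: mean over samples of the squared gradient.
    """
    residual = X @ W.T - Y                          # (n, out)
    # Per-sample gradient dL/dW[o, i] = residual[n, o] * X[n, i]
    grads = residual[:, :, None] * X[:, None, :]    # (n, out, in)
    return np.mean(grads ** 2, axis=0)

def channel_importance(fisher):
    """Aggregate per-weight Fisher into one score per output channel (row)."""
    return fisher.sum(axis=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(256, 16))      # stand-in for calibration activations
Y = rng.normal(size=(256, 8))
F = fisher_diagonal(W, X, Y)
scores = channel_importance(F)      # one importance score per channel
```

Unlike a magnitude score, `scores` depends on the calibration data `X` and `Y`, so a large weight that rarely affects the loss receives a low score.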
FP32 Weights
│
▼
Stage 1: Fisher-Weighted Channel Importance
│
Stage 2: Mixed-Precision Core Allocation
│   Top 5% channels → int4
│   Bottom 95% channels → binary ±1
▼
Stage 3: Adaptive Blocksize Selection
│   Per-layer blocksize {64, 128, 256, 512}
▼
Stage 4: Residual Codebook Clustering
│   4 tiered codebooks × 64 centroids
│   4-bit indices per block
▼
FABQ-RC Quantized Model
│
▼
GGUF Export
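Stages 2 and 3 of the pipeline above can be sketched as follows. This is a minimal illustration under stated assumptions: channels are rows of the weight matrix, binary weights use one mean-absolute-value scale per block, and the blocksize rule in `pick_blocksize` (largest blocksize whose error stays within a tolerance of the smallest blocksize's error) is a hypothetical criterion, since the selection rule is not given here. All function names are invented for the example.

```python
import numpy as np

def split_channels(importance, top_frac=0.05):
    """Return (top_idx, rest_idx): the top_frac most important channels and the rest."""
    n_top = max(1, int(round(top_frac * importance.size)))
    order = np.argsort(importance)[::-1]        # most important first
    return np.sort(order[:n_top]), np.sort(order[n_top:])

def binarize_blocks(w, blocksize):
    """Binarize a 1-D weight vector to alpha * sign(w), one scale per block."""
    blocks = w.reshape(-1, blocksize)
    # mean(|w|) is the least-squares optimal scale for sign(w)
    alpha = np.abs(blocks).mean(axis=1, keepdims=True)
    signs = np.where(blocks >= 0, 1.0, -1.0)
    return (alpha * signs).reshape(w.shape)

def pick_blocksize(w, candidates=(64, 128, 256, 512), tol=0.02):
    """Hypothetical rule: largest blocksize whose relative error stays within
    `tol` of the error at the smallest candidate blocksize."""
    def rel_err(bs):
        return np.linalg.norm(w - binarize_blocks(w, bs)) / np.linalg.norm(w)
    base = rel_err(min(candidates))
    return max(bs for bs in candidates if rel_err(bs) <= base + tol)

rng = np.random.default_rng(0)
W = rng.normal(size=(40, 512))           # 40 channels of 512 weights each
importance = rng.random(40)              # stand-in for Stage 1 Fisher scores

top, rest = split_channels(importance)   # channels kept at higher precision vs. binarized
W_bin = binarize_blocks(W[rest].ravel(), 128).reshape(len(rest), -1)
bs = pick_blocksize(W[rest].ravel())     # per-layer blocksize under the assumed rule
```

Smaller blocks never increase reconstruction error, because each block gets its own optimal scale, but they cost more scale storage per weight. Any real selection rule has to trade those two off; the tolerance used here is only one way to do so.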
BiLLM approximates residuals as a linear function of the weight value. FABQ-RC's k-means codebook is nonlinear and captures arbitrary residual patterns without assuming a functional form.
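To illustrate the codebook idea, the sketch below binarizes weight blocks, clusters the leftover residuals with plain Lloyd's k-means, and stores one centroid index per block. It is a simplified single-codebook example: the tiering is not modeled, 16 centroids are used so that an index fits in 4 bits, and the `kmeans` helper and all variable names are assumptions made for illustration rather than the project's code.

```python
import numpy as np

def kmeans(points, n_centroids, n_iters=20, seed=0):
    """Plain Lloyd's k-means. points: (n, d). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), n_centroids, replace=False)].copy()
    for _ in range(n_iters):
        # Squared distance from every point to every centroid: (n, k)
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(n_centroids):
            members = points[labels == k]
            if len(members):                 # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    # Final assignment against the updated centroids
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)

rng = np.random.default_rng(0)
blocksize = 64
w = rng.normal(size=(512, blocksize))              # 512 weight blocks
alpha = np.abs(w).mean(axis=1, keepdims=True)
binary = alpha * np.where(w >= 0, 1.0, -1.0)       # 1-bit approximation
residual = w - binary                              # what binarization missed

codebook, idx = kmeans(residual, n_centroids=16)   # 16 entries -> 4-bit index
w_hat = binary + codebook[idx]                     # binary core plus looked-up residual

err_binary = np.linalg.norm(w - binary)
err_codebook = np.linalg.norm(w - w_hat)
```

Each centroid is an arbitrary vector, so the correction applied to a block is not constrained to be a linear function of the weights. On random Gaussian data like this the gain is small; structured residuals are where a learned codebook would be expected to help.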
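# Download the quantized model files from the Hugging Face Hub
from huggingface_hub import snapshot_download
model_path = snapshot_download("toxzak/Qwen3.6-27B-FABQ-RC-GGUF")
print(model_path)  # local directory containing the downloaded files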
# Example inference command
./llama-cli -m Qwen3.6-27B-FABQ-RC-Q4_K_M.gguf -n 256 -p "The future of 1-bit quantization is"
# Perplexity on WikiText-2
./llama-perplexity -m Qwen3.6-27B-FABQ-RC-Q4_K_M.gguf -f wikitext.txt
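For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token. The helper below is a minimal sketch of that definition only; it is not how `llama-perplexity` is implemented, and the function name and input format (a list of natural-log token probabilities) are assumptions.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability of the observed tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that assigns probability 1/8 to every observed token
uniform = [math.log(1 / 8)] * 100
ppl = perplexity(uniform)   # about 8: as uncertain as a fair 8-way choice
```

Lower is better: a perplexity below 8 means the model is, on average, less uncertain than a uniform choice among eight tokens.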
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.6-27B |
| Format | GGUF |
| Bits per parameter | ~1.18 bpw |
| Architecture | FABQ-RC (Fisher-Adaptive Binary Quantization with Residual Codebooks) |
| Calibration | C4 dataset, 2048 samples |
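A rough way to reason about the bits-per-parameter figure is to add the core weight cost to the per-block metadata cost. The accounting below is a hypothetical sketch: it assumes one 16-bit scale and one 4-bit codebook index per block and ignores codebook storage, none of which is confirmed as the actual on-disk layout.

```python
def bits_per_weight(top_frac, top_bits, base_bits, blocksize,
                    scale_bits=16, index_bits=4):
    """Average bits per weight for a mixed-precision, block-scaled layout.

    Hypothetical accounting: every block of `blocksize` weights stores one
    `scale_bits` scale and one `index_bits` codebook index. Storage for the
    codebooks themselves is ignored.
    """
    core = top_frac * top_bits + (1.0 - top_frac) * base_bits
    overhead = (scale_bits + index_bits) / blocksize
    return core + overhead

# Pure 1-bit weights, one 16-bit scale per 128 weights, no codebook index:
# 1 + 16/128 = 1.125 bits per weight
q1_g128 = bits_per_weight(0.0, 0, 1, 128, index_bits=0)

# 5% int4 + 95% binary at each candidate blocksize (assumed metadata layout)
fabq = {bs: bits_per_weight(0.05, 4, 1, bs) for bs in (64, 128, 256, 512)}
```

The 5% int4 and 95% binary split alone costs 0.05 × 4 + 0.95 × 1 = 1.15 bits per weight, so under these assumptions the per-block metadata budget is tight and larger blocksizes are what keep the total near the stated figure.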
| Method | bpw | Perplexity | Notes |
|---|---|---|---|
| FP16 | 16.0 | baseline | |
| Q1_0_g128 | 1.125 | degraded | Bonsai's format |
| BiLLM | 1.08 | ~8.41 (70B) | Best prior work |
| FABQ-RC | ~1.18 | target < 8.0 | Our method |
fabq-rc/
├── README.md ← This file
├── FABQ_RC_SPEC.md ← Full method specification
├── FABQRC_PLAN.md ← Research plan
├── Main-FABQ-RC-Notebook.ipynb ← Main quantization notebook
├── FABQ-RC-Dense-27B-Notebook.ipynb ← Dense model experiments
└── plans/
    ├── CALIBRATION-ROBUSTNESS-PLAN.md ← Calibration improvements
    ├── FABQ-VP-SPEC.md ← Variable precision extension
    ├── EBQ-SPEC.md ← Error-budget allocation
    └── UNIFIED-SPEC.md ← Combined architecture
FABQ-RC: Fisher-Adaptive Binary Quantization with Residual Codebooks
Zach Maronek, 2026
Apache 2.0 (see Hugging Face model page for details)
Built by Zach Maronek · April 2026