FastContext-1.0-4B-SFT-mlx-4bit-g32

4-bit MLX quantization of microsoft/FastContext-1.0-4B-SFT with group_size=32 for Apple Silicon.

Quantization details

  • Method: Affine 4-bit
  • Group size: 32 (finer than the default 64)
  • Effective bits per weight: 5.0
  • Model size: 2.4 GB (vs 7.5 GB bf16)

Benchmark results

Tested on 10 SWE-bench Multilingual instances against other quantization variants:

Model Bits/Wt Size File F1 Line F1
affine 8-bit g64 8.5 4.0G 0.507 0.140
affine 4-bit g32 (this model) 5.0 2.4G 0.300 0.090
affine 3-bit g64 3.5 1.7G 0.100 0.000
affine 4-bit g64 4.5 2.1G 0.050 0.005
mattrobenolt 4-bit g64 4.5 2.1G 0.025 0.008

The finer group_size=32 delivers 12x better File F1 than standard 4-bit g64 quantization with only 300MB additional size.

Usage

from mlx_lm import load, generate

model, tokenizer = load("rubybear-lgtm/FastContext-1.0-4B-SFT-mlx-4bit-g32")

Or with fastcontext-mcp for Claude Code integration.

Downloads last month
-
Safetensors
Model size
0.8B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for rubybear/FastContext-1.0-4B-SFT-mlx-4bit-g32

Quantized
(24)
this model