FastContext-1.0-4B-SFT-mlx-8bit

8-bit MLX quantization of microsoft/FastContext-1.0-4B-SFT for Apple Silicon.

Quantization details

  • Method: Affine 8-bit
  • Group size: 64
  • Effective bits per weight: 8.5
  • Model size: 4.0 GB (vs 7.5 GB bf16)
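The effective bits-per-weight figure above can be reproduced with simple arithmetic. The sketch below assumes each quantization group stores one 16-bit scale and one 16-bit bias alongside its weights (32 bits of overhead per group), and that the bf16 checkpoint is entirely made of weights that get quantized. The function names are illustrative and are not part of MLX.

```python
def effective_bits_per_weight(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Bits stored per weight once per-group metadata is amortized.

    overhead_bits is an assumption: a 16-bit scale plus a 16-bit bias per group.
    """
    return bits + overhead_bits / group_size


def estimated_size_gb(bf16_size_gb: float, bits_per_weight: float) -> float:
    """Scale a bf16 checkpoint size (16 bits per weight) to another bit width."""
    return bf16_size_gb * bits_per_weight / 16


# This model: 8-bit weights, group size 64, 7.5 GB bf16 baseline.
bpw = effective_bits_per_weight(8, 64)
print(f"{bpw:.1f} bits/weight, ~{estimated_size_gb(7.5, bpw):.1f} GB")
```

Under these assumptions the 8-bit, group-size-64 configuration comes out at 8.5 bits per weight and roughly 4.0 GB. The size estimate ignores any tensors left unquantized, so treat it as approximate for other configurations.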

Benchmark results

Tested on 10 SWE-bench Multilingual instances against other quantization variants:

Model                          Bits/Wt  Size    File F1  Line F1
affine 8-bit g64 (this model)  8.5      4.0 GB  0.507    0.140
affine 4-bit g32               5.0      2.4 GB  0.300    0.090
affine 3-bit g64               3.5      1.7 GB  0.100    0.000
affine 4-bit g64               4.5      2.1 GB  0.050    0.005
mattrobenolt 4-bit g64         4.5      2.1 GB  0.025    0.008

Of the variants tested, this is the highest-quality quantization: it scores the best File F1 and Line F1, at the cost of a larger size and slower inference.

Usage

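from mlx_lm import load, generate

# Load the quantized weights and tokenizer (fetched from the Hugging Face Hub on first use).
model, tokenizer = load("rubybear/FastContext-1.0-4B-SFT-mlx-8bit")
# "..." is a placeholder prompt; substitute your own query.
print(generate(model, tokenizer, prompt="...", max_tokens=256))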

Alternatively, use it with fastcontext-mcp for Claude Code integration.
