FastContext-1.0-4B-SFT MLX Dynamic 4-to-6-bit

MLX conversion of microsoft/FastContext-1.0-4B-SFT, quantized with mlx_lm.dynamic_quant.

Quantization

  • Method: MLX-LM dynamic mixed precision
  • Target: 5.0 bits per weight
  • Actual: 4.984 bits per weight
  • Low precision: 4-bit, group size 64
  • High precision: 6-bit, group size 64
  • Source revision: 80b60c036a612354e7c611cdabc005ec67f76993
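As a rough check on these figures, the sketch below estimates how the 4.984 bits-per-weight average might split between the two precisions. It assumes that MLX affine quantization stores one 16-bit scale and one 16-bit bias for each group of 64 weights, which adds 0.5 bits per weight. It also ignores tensors that are left unquantized, such as norms. The helper names are hypothetical, and the result is an estimate, not a number reported by mlx_lm.dynamic_quant.

```python
# Assumption: each quantization group stores one scale and one bias,
# both in a 16-bit float type (e.g. bfloat16), as in MLX affine quantization.
SCALE_BIAS_BITS = 2 * 16


def effective_bpw(bits: int, group_size: int) -> float:
    """Bits per weight for one quantized layer, including per-group scale/bias."""
    return bits + SCALE_BIAS_BITS / group_size


def high_precision_fraction(actual_bpw: float, low: float, high: float) -> float:
    """Share of weights (by parameter count) at high precision, assuming the
    average is a plain linear mix of the two precisions.

    Unquantized tensors are ignored, so this is only a rough estimate.
    """
    return (actual_bpw - low) / (high - low)


low = effective_bpw(4, 64)   # 4-bit, group size 64
high = effective_bpw(6, 64)  # 6-bit, group size 64
frac = high_precision_fraction(4.984, low, high)
print(f"low={low}, high={high}, ~{frac:.1%} of weights at 6-bit")
```

Under these assumptions it prints `low=4.5, high=6.5, ~24.2% of weights at 6-bit`. In other words, roughly a quarter of the quantized parameters would stay at 6-bit.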

Local Evaluation

Short-context evaluation command:

mlx_lm.perplexity \
  --model ukint-vs/FastContext-1.0-4B-SFT-Dynamic-4bit-MLX \
  --num-samples 64 \
  --sequence-length 512

Longer-context evaluation command:

mlx_lm.perplexity \
  --model ukint-vs/FastContext-1.0-4B-SFT-Dynamic-4bit-MLX \
  --num-samples 32 \
  --sequence-length 2048
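Perplexity is the exponential of the mean per-token negative log-likelihood (in nats), so lower is better. The sketch below shows that calculation along with a delta-method standard error. It assumes this is how the "+/-" figure in the results is derived, which may not match mlx_lm.perplexity exactly. The function name and the token losses in the example are hypothetical.

```python
import math
from statistics import fmean, stdev


def perplexity_with_se(token_nll: list[float]) -> tuple[float, float]:
    """Return (perplexity, standard error) from per-token NLL values in nats.

    Perplexity = exp(mean NLL). The standard error uses the delta method:
    if the mean NLL has standard error s, exp(mean) has standard error of
    roughly exp(mean) * s. This convention is an assumption here.
    """
    mean_nll = fmean(token_nll)
    se_mean = stdev(token_nll) / math.sqrt(len(token_nll))
    ppl = math.exp(mean_nll)
    return ppl, ppl * se_mean


# Hypothetical per-token losses, for illustration only.
ppl, se = perplexity_with_se([1.9, 2.1, 2.0, 2.3, 1.7])
print(f"{ppl:.3f} +/- {se:.3f}")
```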

Results on allenai/tulu-3-sft-mixture:

Model               Seq len  Samples  Disk size  Peak memory  Perplexity
BF16 MLX            512      64       7.5G       12.34 GB     7.732 +/- 0.121
Dynamic 4-to-6-bit  512      64       2.3G       6.80 GB      7.956 +/- 0.125
BF16 MLX            2048     32       7.5G       24.28 GB     4.258 +/- 0.041
Dynamic 4-to-6-bit  2048     32       2.3G       18.74 GB     4.349 +/- 0.042

The quantization run reported a calibration perplexity of 8.025 before quantization and 8.300 after, with a peak memory use of 38.217 GB during quantization.
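To express the quality cost in relative terms, the short script below recomputes the percentage change in perplexity from the numbers reported above, covering both evaluation runs and the calibration run, along with the disk-size ratio. It only does arithmetic on those figures. `relative_change` is a hypothetical helper name.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (positive = increase)."""
    return (after - before) / before


# (BF16 / pre-quantization, dynamic 4-to-6-bit) perplexities from this card.
results = {
    "eval, seq len 512": (7.732, 7.956),
    "eval, seq len 2048": (4.258, 4.349),
    "calibration": (8.025, 8.300),
}

for name, (before, after) in results.items():
    change = relative_change(before, after)
    print(f"{name}: {before:.3f} -> {after:.3f} ({change:+.1%})")

# Disk size of the BF16 weights vs. the quantized weights.
print(f"disk size: {7.5 / 2.3:.2f}x smaller")
```

With these inputs, perplexity rises by about 2.9% at 512 tokens, 2.1% at 2048 tokens, and 3.4% on the calibration set, while the weights are about 3.26x smaller on disk.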

Usage

mlx_lm.generate \
  --model ukint-vs/FastContext-1.0-4B-SFT-Dynamic-4bit-MLX \
  --prompt "Explain what repository exploration means." \
  --max-tokens 128 \
  --temp 0.0