Instructions to use mlx-community/InternVL3-8B-MLX-4bit with libraries, notebooks, and local apps.
Libraries
- MLX
How to use mlx-community/InternVL3-8B-MLX-4bit with MLX:
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("mlx-community/InternVL3-8B-MLX-4bit")
config = load_config("mlx-community/InternVL3-8B-MLX-4bit")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)

Notebooks
- Google Colab
- Kaggle

Local Apps
- LM Studio
- Pi
How to use mlx-community/InternVL3-8B-MLX-4bit with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/InternVL3-8B-MLX-4bit"
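Optionally, sanity-check that the server answers before configuring Pi. This is a minimal sketch using only the Python standard library; it assumes mlx_lm.server is listening on its default 127.0.0.1:8080 and serves the OpenAI-style /v1/chat/completions route:

# check_server.py - hypothetical smoke test, not part of this repo
import json
import urllib.request

payload = {
    "model": "mlx-community/InternVL3-8B-MLX-4bit",
    "messages": [{"role": "user", "content": "Reply with exactly: OK"}],
    "max_tokens": 8,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
# The assistant text lives in the first choice of the response body.
print(reply["choices"][0]["message"]["content"])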
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "mlx-community/InternVL3-8B-MLX-4bit" }
      ]
    }
  }
}

Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use mlx-community/InternVL3-8B-MLX-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/InternVL3-8B-MLX-4bit"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/InternVL3-8B-MLX-4bit
Run Hermes
hermes
InternVL3-8B-MLX-4bit
This repository contains a 4-bit MLX quantized conversion of mlx-community/InternVL3-8B-bf16 for Apple Silicon inference.
Conversion Details
| Setting | Value |
|---|---|
| Source model | mlx-community/InternVL3-8B-bf16 |
| Conversion tool | mlx_vlm.convert |
| Quantization bits | 4 |
| Group size | 64 |
| Quantization mode | affine |
| Quant predicate | none (uniform quantization) |
Conversion command used:
python3 -m mlx_vlm convert \
--hf-path "mlx-community/InternVL3-8B-bf16" \
--mlx-path "./models/InternVL3-8B-4bit" \
-q --q-bits 4 --q-group-size 64
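The same conversion can be scripted from Python. This is a sketch, assuming mlx_vlm exposes its convert entry point as a function whose keyword arguments mirror the CLI flags above (argument names may differ across mlx-vlm versions):

# Hypothetical scripted equivalent of the CLI conversion above.
from mlx_vlm.convert import convert

convert(
    hf_path="mlx-community/InternVL3-8B-bf16",  # source bf16 weights
    mlx_path="./models/InternVL3-8B-4bit",      # output directory
    quantize=True,                              # -q
    q_bits=4,                                   # --q-bits 4
    q_group_size=64,                            # --q-group-size 64
)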
Validation
| Test | Status |
|---|---|
| Text generation load test | passed |
Verification command:
python3 -m mlx_vlm generate \
--model "./models/InternVL3-8B-4bit" \
--prompt "Reply with exactly: OK" \
--max-tokens 8 --temperature 0
Observed response: OK
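The same smoke test can also be run through the Python API instead of the CLI. A minimal sketch, assuming apply_chat_template accepts num_images=0 for a text-only prompt and that generate takes the same max_tokens/temperature knobs as the CLI flags:

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the locally converted weights produced by the command above.
model, processor = load("./models/InternVL3-8B-4bit")
config = load_config("./models/InternVL3-8B-4bit")

# Text-only prompt: no image tokens are inserted
# (assumption: num_images=0 is accepted for text-only use).
formatted = apply_chat_template(
    processor, config, "Reply with exactly: OK", num_images=0
)
print(generate(model, processor, formatted, max_tokens=8, temperature=0.0))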
Usage
Install:
python3 -m pip install -U mlx-vlm
Run locally from this folder:
python3 -m mlx_vlm generate \
--model "." \
--prompt "Describe the image briefly." \
--image path/to/image.jpg \
--max-tokens 256 \
--temperature 0
Run from Hugging Face after upload:
python3 -m mlx_vlm generate \
--model "mlx-community/InternVL3-8B-MLX-4bit" \
--prompt "Describe the image briefly." \
--image path/to/image.jpg \
--max-tokens 256 \
--temperature 0
Notes
- This conversion does not upload anything automatically.
- Quantization changes numerical behavior relative to the bf16 weights; a toy illustration follows this list.
- During local tests, mlx_vlm emitted an upstream tokenizer regex warning from the source model assets.
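As a rough illustration of that numerical shift, here is a toy affine 4-bit round trip over a single group of 64 weights, matching the group size used in this conversion. It sketches the idea only, not MLX's actual quantization kernel:

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64).astype(np.float32)  # one quantization group (group size 64)

# Affine 4-bit quantization: map [w.min(), w.max()] onto the integers 0..15.
scale = (w.max() - w.min()) / 15
zero = w.min()
q = np.clip(np.round((w - zero) / scale), 0, 15).astype(np.uint8)
w_hat = q * scale + zero  # dequantized weights used at inference time

# The residual is the per-weight error that makes 4-bit outputs drift from bf16.
print("max abs error:", float(np.abs(w - w_hat).max()))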
Links
- Source model: https://huggingface.co/mlx-community/InternVL3-8B-bf16
- MLX: https://github.com/ml-explore/mlx
- mlx-vlm: https://github.com/Blaizzy/mlx-vlm
License
Follows the upstream model license terms from the source repository.