GLM-5.2-JANGTQ

JANGTQ (JANG TurboQuant) quantization of zai-org/GLM-5.2 for MLX on Apple silicon. TurboQuant applies a random-sign Hadamard rotation, a per-row FP16 norm, and a per-layer Lloyd-Max codebook to the routed experts only. The rest of the backbone (attention, shared experts, embeddings, LM head) stays at FP16.
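To make the three steps above concrete, here is a minimal NumPy sketch of a TurboQuant-style round trip on a single weight matrix. It is an illustration under stated assumptions, not the jang-tools implementation: the function names (hadamard, lloyd_max, quantize, dequantize) are hypothetical, the Hadamard matrix is built densely with the Sylvester construction rather than a fast transform, the row length is assumed to be a power of two, and the codebook is fitted to the norm-scaled rotated values of that one matrix.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max(samples: np.ndarray, levels: int, iters: int = 25) -> np.ndarray:
    """1-D Lloyd-Max: alternate nearest-level assignment and centroid update."""
    # Initialise the levels at evenly spaced quantiles of the data.
    codebook = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        idx = np.abs(samples[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                codebook[k] = samples[idx == k].mean()
    return np.sort(codebook)

def quantize(W: np.ndarray, bits: int = 2, seed: int = 0) -> dict:
    """Rotate rows, store one FP16 norm per row, code each entry with `bits` bits."""
    rows, cols = W.shape
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=cols)            # random sign flips
    rotated = (W * signs) @ hadamard(cols)                # random-sign Hadamard rotation
    norms = np.linalg.norm(rotated, axis=1, keepdims=True).astype(np.float16)
    unit = rotated / np.maximum(norms.astype(np.float64), 1e-12)
    codebook = lloyd_max(unit.ravel(), 2 ** bits)         # one codebook for the whole matrix
    codes = np.abs(unit[..., None] - codebook).argmin(axis=-1).astype(np.uint8)
    return {"codes": codes, "norms": norms, "codebook": codebook, "signs": signs}

def dequantize(q: dict) -> np.ndarray:
    """Look up codebook levels, rescale by the row norms, and undo the rotation."""
    rotated_hat = q["codebook"][q["codes"]] * q["norms"].astype(np.float64)
    H = hadamard(rotated_hat.shape[1])
    return (rotated_hat @ H.T) * q["signs"]
```

Calling quantize on a random matrix and then dequantize on the result gives an approximation of the original. The rotation is there to spread outliers across each row so that a single small codebook fits all entries reasonably well.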

Profile: JANGTQ_2L

Component         Precision
Attention         FP16
Shared experts    FP16
Routed experts    2-bit
Embeddings        FP16
LM head           FP16
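As a rough guide to what the 2-bit routed-expert setting costs in storage, the sketch below adds the amortized per-row FP16 norm to the code bits. It assumes one norm per weight-matrix row and ignores the per-layer codebook as negligible. The function name and the row length in the usage note are hypothetical and not taken from the GLM-5.2 configuration.

```python
def effective_bits_per_weight(code_bits: int, row_len: int, norm_bits: int = 16) -> float:
    """Bits stored per weight: the code bits plus one norm amortized over a row."""
    if row_len <= 0:
        raise ValueError("row_len must be positive")
    return code_bits + norm_bits / row_len
```

For example, effective_bits_per_weight(2, 4096) shows that with a hypothetical row length of 4096 the norm adds well under a hundredth of a bit per weight.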

Requirements

  • Load with the jang-tools Python package or vMLX. Not supported by stock MLX, LM Studio, or Ollama.
  • The tokenizer uses tiktoken and custom code: run pip install tiktoken blobfile and set TRANSFORMERS_TRUST_REMOTE_CODE=1.
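The requirements above can be checked before loading with a small preflight sketch. The helper names (missing_modules, preflight) are hypothetical. The module names are the import names used elsewhere in this card (jang_tools, mlx_lm) plus the two tokenizer dependencies, and the check only confirms that the packages can be found, not that their versions are compatible.

```python
import importlib.util
import os

def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

def preflight(required=("jang_tools", "mlx_lm", "tiktoken", "blobfile")):
    """Return (missing modules, whether TRANSFORMERS_TRUST_REMOTE_CODE is set to "1")."""
    missing = missing_modules(required)
    trust = os.environ.get("TRANSFORMERS_TRUST_REMOTE_CODE") == "1"
    return missing, trust
```

Calling preflight() returns an empty list and True when everything is in place. Otherwise it reports which packages to install or that the environment variable still needs to be set.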

Usage

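import os
# Must be set before the tokenizer is loaded so its custom code is allowed to run.
os.environ["TRANSFORMERS_TRUST_REMOTE_CODE"] = "1"

from jang_tools.load_jangtq import load_jangtq_model as load
from mlx_lm import generate

# Load the quantized weights and the tokenizer.
model, tokenizer = load("bearzi/GLM-5.2-JANGTQ")

# Build a chat-formatted prompt string from a single user message.
msgs = [{"role": "user", "content": "Write a Python function that reverses a string."}]
prompt = tokenizer.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)

# Generate up to 512 new tokens and print the completion.
print(generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True))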

License

Inherited from the base model zai-org/GLM-5.2; quantization does not change the upstream terms. Attribution is required only for very large commercial deployments (see the base model's license for the exact terms).

Model size: 64B params
Tensor types: F16, U32, U8
Format: Safetensors (MLX)