GLM-5.2-JANGTQ

JANGTQ (JANG TurboQuant) quantization of zai-org/GLM-5.2 for MLX on Apple silicon. TurboQuant applies a random-sign Hadamard rotation, a per-row FP16 norm, and a per-layer Lloyd-Max codebook to the routed experts. The rest of the model (attention, shared experts, embeddings, and LM head) stays in FP16.
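The pure-Python sketch below walks through the three steps named above on a small random weight matrix. It is not the jang-tools implementation. All helper names (fwht, rotate, lloyd_max, quantize_layer, dequantize_layer) are hypothetical, and so are several choices: rotating along each row, initializing the codebook at quantiles, and the iteration count. The Lloyd-Max step is fitted here as one-dimensional k-means (Lloyd's algorithm) over every normalized weight in the layer, which is a standard way to build an MSE-optimal scalar codebook.

```python
import math
import random
import struct


def fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in v]


def rotate(row, signs):
    # R = H @ diag(signs): flip signs, then apply the Hadamard transform
    return fwht([s * x for s, x in zip(signs, row)])


def unrotate(row, signs):
    # R^-1 = diag(signs) @ H, because the orthonormal H is its own inverse
    return [s * x for s, x in zip(signs, fwht(row))]


def nearest(x, codebook):
    """Index of the codebook entry closest to x."""
    return min(range(len(codebook)), key=lambda k: abs(x - codebook[k]))


def lloyd_max(samples, levels=4, iters=50):
    """Scalar Lloyd-Max quantizer (1-D k-means) minimizing mean squared error."""
    s = sorted(samples)
    centroids = [s[int((k + 0.5) * len(s) / levels)] for k in range(levels)]
    for _ in range(iters):
        buckets = [[] for _ in range(levels)]
        for x in s:
            buckets[nearest(x, centroids)].append(x)
        # move each centroid to the mean of its cell; keep it if the cell is empty
        centroids = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
    return sorted(centroids)


def quantize_layer(weight, bits=2, seed=0):
    """Rotate rows, store FP16 row norms, fit one codebook for the whole layer."""
    n = len(weight[0])
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    rotated = [rotate(row, signs) for row in weight]
    norms = [fp16(math.sqrt(sum(x * x for x in r))) for r in rotated]
    unit = [[x / (nrm or 1.0) for x in r] for r, nrm in zip(rotated, norms)]
    codebook = lloyd_max([x for r in unit for x in r], levels=2 ** bits)
    codes = [[nearest(x, codebook) for x in r] for r in unit]
    return signs, norms, codebook, codes


def dequantize_layer(signs, norms, codebook, codes):
    """Look up centroids, rescale by the row norm, and undo the rotation."""
    return [unrotate([codebook[c] * nrm for c in row], signs)
            for row, nrm in zip(codes, norms)]


if __name__ == "__main__":
    rng = random.Random(42)
    W = [[rng.gauss(0.0, 0.02) for _ in range(64)] for _ in range(16)]
    W_hat = dequantize_layer(*quantize_layer(W))
    err = sum((a - b) ** 2 for r, rh in zip(W, W_hat) for a, b in zip(r, rh))
    ref = sum(a * a for r in W for a in r)
    print(f"relative MSE: {err / ref:.3f}")
```

The rotation spreads any outlier weights across the whole row, so the normalized entries look roughly Gaussian. That is what lets one 4-entry codebook serve every row of the layer. On Gaussian test data the printed relative error should land around 0.1, close to the classic distortion of a 4-level Lloyd-Max quantizer.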

Profile: JANGTQ_2L

Component	Precision
Attention	FP16
Shared experts	FP16
Routed experts	2-bit
Embeddings	FP16
LM head	FP16
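The helper below estimates the storage for one quantized routed-expert matrix, which shows what the 2-bit row means in memory. The layout is an assumption, not the documented JANGTQ format. It assumes 2-bit codes packed 16 per uint32 word, one FP16 norm per row, and a 4-entry FP16 codebook counted once per matrix. The 4096 x 4096 shape in the example is hypothetical.

```python
import math


def jangtq_matrix_bytes(rows: int, cols: int, bits: int = 2) -> int:
    """Estimate bytes for one quantized matrix under an assumed layout."""
    codes_per_word = 32 // bits                   # assumed: codes packed into uint32 words
    words_per_row = math.ceil(cols / codes_per_word)
    code_bytes = rows * words_per_row * 4
    norm_bytes = rows * 2                         # one FP16 norm per row
    codebook_bytes = (2 ** bits) * 2              # FP16 centroids (per layer; counted once here)
    return code_bytes + norm_bytes + codebook_bytes


def bits_per_weight(rows: int, cols: int, bits: int = 2) -> float:
    """Effective storage cost per weight, including norms and codebook."""
    return jangtq_matrix_bytes(rows, cols, bits) * 8 / (rows * cols)


if __name__ == "__main__":
    rows, cols = 4096, 4096                       # hypothetical expert shape
    q = jangtq_matrix_bytes(rows, cols)
    fp16_bytes = rows * cols * 2
    print(f"{q:,} bytes vs {fp16_bytes:,} FP16 bytes "
          f"({fp16_bytes / q:.2f}x smaller, {bits_per_weight(rows, cols):.4f} bits/weight)")
```

For that shape the estimate comes to about 2.004 bits per weight. Under these assumptions, the per-row norms therefore add well under 1% over the nominal 2 bits, while the FP16 components in the table cost 16 bits per weight.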

Requirements

Load with the jang-tools Python package or vMLX. Not supported by stock MLX, LM Studio, or Ollama.
The tokenizer uses tiktoken and custom code: run pip install tiktoken blobfile and set TRANSFORMERS_TRUST_REMOTE_CODE=1.
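Before running the usage example, a quick preflight check can confirm that the packages are importable and that remote tokenizer code is enabled. This is a convenience sketch, not part of jang-tools. The helper names are hypothetical. The module list assumes the import names of the packages mentioned on this card: the jang-tools package is imported as jang_tools.

```python
import importlib.util
import os

# Import names for the packages this card relies on (assumed; pip names may differ).
REQUIRED_MODULES = ("jang_tools", "mlx_lm", "tiktoken", "blobfile")


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def prepare_environment():
    """Enable remote tokenizer code, then report anything still missing."""
    os.environ["TRANSFORMERS_TRUST_REMOTE_CODE"] = "1"
    missing = missing_requirements()
    if missing:
        print("Missing modules:", ", ".join(missing))
    return missing


if __name__ == "__main__":
    prepare_environment()
```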

Usage

import os
os.environ["TRANSFORMERS_TRUST_REMOTE_CODE"] = "1"

from jang_tools.load_jangtq import load_jangtq_model as load
from mlx_lm import generate

model, tokenizer = load("bearzi/GLM-5.2-JANGTQ")
msgs = [{"role": "user", "content": "Write a Python function that reverses a string."}]
prompt = tokenizer.apply_chat_template(msgs, add_generation_prompt=True, tokenize=False)
# verbose=True streams the output and prints generation stats; the full text is also returned
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)

License

Inherited from the base model zai-org/GLM-5.2. Quantization does not change the upstream terms. Attribution is required only for very large commercial deployments (see the license on the base model's page).


Format: Safetensors (MLX), quantized
Model size: 64B params
Tensor types: F16, U32

Base model: zai-org/GLM-5.2
Collection: GLM-5.2-JANGTQ (3 items)