Instructions for using cfontes/GLM-5.2-F5-Molt with libraries, inference providers, notebooks, and local apps.

Libraries

How to use cfontes/GLM-5.2-F5-Molt with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cfontes/GLM-5.2-F5-Molt")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cfontes/GLM-5.2-F5-Molt")
model = AutoModelForCausalLM.from_pretrained("cfontes/GLM-5.2-F5-Molt")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use cfontes/GLM-5.2-F5-Molt with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cfontes/GLM-5.2-F5-Molt"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cfontes/GLM-5.2-F5-Molt",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
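
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on localhost:8000; the helper names `build_chat_request` and `send` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="cfontes/GLM-5.2-F5-Molt",
                       base_url="http://localhost:8000"):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, once the server is up:
#   print(send(build_chat_request("What is the capital of France?")))
```

For the SGLang server described further down, passing `base_url="http://localhost:30000"` should be the only change needed.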

Use Docker

docker model run hf.co/cfontes/GLM-5.2-F5-Molt

SGLang

How to use cfontes/GLM-5.2-F5-Molt with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cfontes/GLM-5.2-F5-Molt" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cfontes/GLM-5.2-F5-Molt",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cfontes/GLM-5.2-F5-Molt" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cfontes/GLM-5.2-F5-Molt",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use cfontes/GLM-5.2-F5-Molt with Docker Model Runner:
```
docker model run hf.co/cfontes/GLM-5.2-F5-Molt
```

GLM-5.2-F5-Molt v2 🛡️

A cybersecurity-specialized fine-tune of GLM-5.2 that thinks harder, codes better, and actually helps security researchers instead of refusing them.

GLM-5.2-F5-Molt is a two-stage (SFT → DPO) adaptation of the flagship zai-org/GLM-5.2 Mixture-of-Experts model, purpose-built for legitimate offensive and defensive security work: vulnerability research, malware analysis, exploit development, reverse engineering, and security-tool authoring. It keeps the full reasoning horsepower of the base model while sanding off the reflexive over-refusal that makes most frontier models useless for real security research.

This is v2 — the merged model now integrates the DPO-v2 preference alignment on top of the multi-teacher SFT stage, with a non-destructive merge that preserves the base model's FP8 MoE experts.

Why this model exists

Frontier models are trained to refuse anything that sounds dangerous. But "write a heap-spray PoC for CVE-XXXX," "explain this packed malware sample," and "how would an attacker pivot from this foothold" are the daily bread of every red-teamer, incident responder, and vulnerability researcher on Earth. F5-Molt is aligned to answer those questions — with rigor, with working code, and without the moralizing preamble.

Benchmarks

Measured against the base GLM-5.2 under identical harnesses:

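| Benchmark | Base GLM-5.2 | F5-Molt v2 | Delta (percentage points) |
|---|---|---|---|
| MMLU Pro | 77% | 82% | +5 |
| GPQA | 94% | 96% | +2 |
| GSM8K | 93% | 96% | +3 |
| HellaSwag | 71% | 75% | +4 |
| SimpleQA | 60% | 62% | +2 |
| HumanEval | 79.3% | 85.4% | +6.1 |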
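
The Delta column is an absolute difference in percentage points, which is not the same as a relative improvement. As a small sketch, the snippet below recomputes both from the scores in the table; the dictionary and function names are illustrative only.

```python
# Scores copied from the benchmark table: (base GLM-5.2, F5-Molt v2), in percent.
SCORES = {
    "MMLU Pro": (77.0, 82.0),
    "GPQA": (94.0, 96.0),
    "GSM8K": (93.0, 96.0),
    "HellaSwag": (71.0, 75.0),
    "SimpleQA": (60.0, 62.0),
    "HumanEval": (79.3, 85.4),
}


def deltas(scores):
    """Return {benchmark: (absolute gain in points, relative gain in percent)}."""
    result = {}
    for name, (base, tuned) in scores.items():
        absolute = round(tuned - base, 1)
        relative = round(100.0 * (tuned - base) / base, 1)
        result[name] = (absolute, relative)
    return result


for name, (points, relative) in deltas(SCORES).items():
    print(f"{name:10s} {points:+.1f} pp  ({relative:+.1f}% relative)")
```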

Distilling Claude Opus 4.7/4.8 deep-reasoning traces lifts multi-step reasoning (MMLU Pro +5 points, GSM8K +3 points) and code generation (HumanEval +6.1 points) without degrading world knowledge (SimpleQA moves from 60% to 62%).

Training data

SFT stage — multi-teacher distillation

The supervised fine-tuning corpus (sft_combined_v4: 1,443 train / 160 val examples, ~1.77M tokens) blends:

Claude Opus 4.7/4.8 deep-reasoning traces — 462 curated long-form reasoning transcripts (~105M tokens of source material), distilled to teach step-by-step chain-of-thought.
Fable 5 ChatML examples — high-signal instruction/response pairs in ChatML format, including the AGI-expanded and merged Fable 5 sets.
Cybersecurity Q&A pairs — domain-specific security question/answer data covering exploitation, malware, and defensive tradecraft.
Multi-teacher combined dataset — a blended, deduplicated corpus drawing on multiple teacher models for stylistic and reasoning diversity.
Unsloth-formatted message sets — normalized conversational data for consistent chat-template behavior.
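
For readers unfamiliar with ChatML, the sketch below renders a list of role/content messages in the generic ChatML layout. It illustrates the data format only: the `<|im_start|>` and `<|im_end|>` markers are the usual ChatML delimiters, the function name `to_chatml` is hypothetical, and the model's own chat template (applied by `apply_chat_template`) may use different special tokens.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render [{"role": ..., "content": ...}, ...] as generic ChatML text."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so a model would continue from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "Who are you?"},
]
print(to_chatml(example, add_generation_prompt=True))
```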

DPO stage — preference alignment

The Direct Preference Optimization stage uses 897 preference pairs hand-curated around a single theme: legitimate requests that overly cautious models refuse. Each pair contrasts a helpful, technically correct security answer (chosen) against a refusal or hedged non-answer (rejected), across cybersecurity research, malware analysis, and vulnerability research.

Training configuration

Stage 1 — SFT (LoRA)

Rank r=64, alpha=128
Target: attention projections on the top 18 layers (layers ≥ 60)
160 steps, learning rate 2e-4, sequence length 2048
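
As a rough sketch, the Stage 1 numbers above translate into a LoRA scaling factor and a set of adapted layer indices as follows. The dictionary keys loosely follow PEFT/TRL naming and are assumptions, not the author's actual training config; the layer range also assumes zero-based indexing, so that "top 18 layers (layers ≥ 60)" means indices 60 through 77.

```python
def lora_scaling(rank, alpha):
    """LoRA scales the low-rank update by alpha / rank."""
    return alpha / rank


def adapted_layers(first_layer, count):
    """Indices of the adapted layers (ASSUMPTION: zero-based, contiguous)."""
    return list(range(first_layer, first_layer + count))


# Hypothetical config dict assembled from the values stated for Stage 1.
sft_config = {
    "r": 64,
    "lora_alpha": 128,
    "layers_to_transform": adapted_layers(60, 18),
    "max_steps": 160,
    "learning_rate": 2e-4,
    "max_seq_length": 2048,
}

print("scaling:", lora_scaling(sft_config["r"], sft_config["lora_alpha"]))
print("layers:", sft_config["layers_to_transform"][0], "to",
      sft_config["layers_to_transform"][-1])
```

With r=64 and alpha=128 the update is scaled by 2.0, the same ratio the DPO stage uses (alpha=32 over r=16).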

Stage 2 — DPO-v2 (LoRA)

Rank r=16, alpha=32, beta=0.1
30 steps, learning rate 5e-5
init_from=sft — the SFT adapter is loaded and frozen as the DPO reference policy, so DPO shifts preferences without erasing distilled reasoning.
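
To make the role of the frozen reference concrete, here is a minimal sketch of the standard per-pair DPO loss with beta=0.1. It is the textbook formulation, not necessarily the exact code used for this model, and the log-probabilities in the example are made-up illustrative numbers.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# At initialization the policy equals the reference (the SFT adapter),
# so the margins cancel and the loss is log(2).
start = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Illustrative only: the policy now prefers the chosen answer more than the reference does.
later = dpo_loss(-5.0, -15.0, -10.0, -10.0)

print(f"start={start:.4f} later={later:.4f}")
```

Because the loss depends only on how far the policy's preference margin moves away from the reference's, the SFT behavior acts as the anchor that the DPO updates are measured against.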

Base & merge

Base model: zai-org/GLM-5.2 (FP8)
Non-destructive merge: attention modules touched by the adapters are dequantized to bf16 for a lossless weight update, while the MoE expert tensors are preserved in FP8. In total, 90 SFT and 90 DPO adapter targets were merged, and 631 modules were left untouched. The result loads as a standard transformers checkpoint (151 safetensors shards).
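
The weight update applied to each adapted module can be sketched with the standard LoRA merge formula, W' = W + (alpha / r) * B A. The toy example below uses plain Python lists on a 2x2 matrix; it does not model the FP8-to-bf16 dequantization described above, and the function names are hypothetical rather than taken from the author's merge script.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, rank, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(weight_row, delta_row)]
        for weight_row, delta_row in zip(weight, delta)
    ]


# Toy rank-1 adapter on a 2x2 identity weight (illustrative values only).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [0.0]]   # shape (out_features, rank)
lora_a = [[0.0, 0.5]]     # shape (rank, in_features)

merged = merge_lora(weight, lora_a, lora_b, rank=1, alpha=2)
print(merged)
```

Modules without an adapter are simply copied through unchanged, which is how the expert tensors can stay in FP8.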

Intended use

Vulnerability research — triage, root-cause analysis, PoC development
Malware analysis — static/dynamic reasoning, deobfuscation, capability inference
Reverse engineering — binary/protocol analysis, decompilation assistance
Security tooling — writing scanners, fuzzers, exploit frameworks, detection rules
Red/blue team support — attack-path reasoning, detection engineering

Refusal boundary

F5-Molt is deliberately permissive for security work. It refuses only:

Instructions facilitating self-harm, and
Child sexual exploitation content.

Everything else — cybersecurity, malware analysis, vulnerability research, and weapon-systems questions — is answered directly and technically. Use responsibly and legally; you are accountable for what you do with the output.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cfontes/GLM-5.2-F5-Molt"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

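messages = [{"role": "user", "content": "Explain how a use-after-free becomes an arbitrary write."}]
# return_dict=True yields input_ids plus attention_mask, so generate() receives both.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the echoed prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))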

Prefer LoRA serving? The individual adapters are published at cfontes/GLM-5.2-F5-Molt-LoRA.

License

Released under the GLM license, inheriting all terms from the base model zai-org/GLM-5.2. See the license link.

Citation

@misc{glm52f5molt2026,
  title  = {GLM-5.2-F5-Molt: A Cybersecurity-Specialized Fine-Tune of GLM-5.2},
  author = {Fontes, Chris},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/cfontes/GLM-5.2-F5-Molt}}
}

Built on zai-org/GLM-5.2. Distilled from Claude Opus 4.7/4.8 reasoning traces and the Fable 5 corpus. Aligned with Molt.

Downloads last month: 433

Safetensors
Model size: 743B params
Tensor type: F32, BF16, F8_E4M3

Model tree for cfontes/GLM-5.2-F5-Molt
Base model: zai-org/GLM-5.2
Adapter (4): this model