GLM-5.2-F5-Molt v2 🛡️
A cybersecurity-specialized fine-tune of GLM-5.2 that thinks harder, codes better, and actually helps security researchers instead of refusing them.
GLM-5.2-F5-Molt is a two-stage (SFT → DPO) adaptation of the flagship zai-org/GLM-5.2 Mixture-of-Experts model, purpose-built for legitimate offensive and defensive security work: vulnerability research, malware analysis, exploit development, reverse engineering, and security-tool authoring. It keeps the full reasoning horsepower of the base model while sanding off the reflexive over-refusal that makes most frontier models useless for real security research.
This is v2 — the merged model now integrates the DPO-v2 preference alignment on top of the multi-teacher SFT stage, with a non-destructive merge that preserves the base model's FP8 MoE experts.
Why this model exists
Frontier models are trained to refuse anything that sounds dangerous. But "write a heap-spray PoC for CVE-XXXX," "explain this packed malware sample," and "how would an attacker pivot from this foothold" are the daily bread of every red-teamer, incident responder, and vulnerability researcher on Earth. F5-Molt is aligned to answer those questions — with rigor, with working code, and without the moralizing preamble.
Benchmarks
Measured against the base GLM-5.2 under identical harnesses:
| Benchmark | Base GLM-5.2 | F5-Molt v2 | Delta |
|---|---|---|---|
| MMLU Pro | 77% | 82% | +5% |
| GPQA | 94% | 96% | +2% |
| GSM8K | 93% | 96% | +3% |
| HellaSwag | 71% | 75% | +4% |
| SimpleQA | 60% | 62% | +2% |
| HumanEval | 79.3% | 85.4% | +6.1% |
The distillation of Claude Opus 4.7/4.8 deep-reasoning traces measurably lifts multi-step reasoning (MMLU Pro, GSM8K) and code generation (HumanEval) without degrading world knowledge.
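To make the Delta column easy to re-derive, the short sketch below recomputes it from the table's scores. It assumes the column is meant as an absolute difference in percentage points, and it also reports the relative gain over the base score. The `deltas` helper is illustrative and not part of any released tooling.

```python
# Scores copied from the benchmark table above, in percent: (base, F5-Molt v2).
scores = {
    "MMLU Pro": (77.0, 82.0),
    "GPQA": (94.0, 96.0),
    "GSM8K": (93.0, 96.0),
    "HellaSwag": (71.0, 75.0),
    "SimpleQA": (60.0, 62.0),
    "HumanEval": (79.3, 85.4),
}


def deltas(table):
    """Return (percentage-point gain, relative gain in %) per benchmark."""
    out = {}
    for name, (base, tuned) in table.items():
        absolute = round(tuned - base, 1)
        relative = round(100.0 * (tuned - base) / base, 1)
        out[name] = (absolute, relative)
    return out


for name, (pp, rel) in deltas(scores).items():
    print(f"{name}: +{pp} pp ({rel}% relative to base)")
```

Reporting both figures avoids ambiguity: a 5-point gain on a 77% baseline is a larger relative improvement than a 2-point gain on a 94% baseline, even though the remaining headroom differs.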
Training data
SFT stage — multi-teacher distillation
The supervised fine-tuning corpus (sft_combined_v4: 1,443 train / 160 val examples, ~1.77M tokens) blends:
- Claude Opus 4.7/4.8 deep-reasoning traces — 462 curated long-form reasoning transcripts (~105M tokens of source material), distilled to teach step-by-step chain-of-thought.
- Fable 5 ChatML examples — high-signal instruction/response pairs in ChatML format, including the AGI-expanded and merged Fable 5 sets.
- Cybersecurity Q&A pairs — domain-specific security question/answer data covering exploitation, malware, and defensive tradecraft.
- Multi-teacher combined dataset — a blended, deduplicated corpus drawing on multiple teacher models for stylistic and reasoning diversity.
- Unsloth-formatted message sets — normalized conversational data for consistent chat-template behavior.
DPO stage — preference alignment
The Direct Preference Optimization stage uses 897 preference pairs hand-curated around a single theme: legitimate requests that overly cautious models refuse. Each pair contrasts a helpful, technically correct security answer (chosen) against a refusal or hedged non-answer (rejected), across cybersecurity research, malware analysis, and vulnerability research.
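A preference pair of the kind described above can be represented as a small record. The sketch below is an assumption about the layout, not the published dataset schema: the `chosen` and `rejected` names come from the description, while the `prompt` field, the `PreferencePair` class, the `is_valid` check, and the example text are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One DPO training example (hypothetical layout)."""
    prompt: str    # the user request (assumed field name)
    chosen: str    # helpful, technically correct answer
    rejected: str  # refusal or hedged non-answer


def is_valid(pair):
    """Reject pairs with empty fields or identical chosen/rejected text."""
    fields = (pair.prompt, pair.chosen, pair.rejected)
    return all(f.strip() for f in fields) and pair.chosen != pair.rejected


# Illustrative placeholder content, not taken from the actual dataset.
example = PreferencePair(
    prompt="Explain how a use-after-free becomes an arbitrary write.",
    chosen="After the object is freed, a dangling pointer still references its slot...",
    rejected="I can't help with that request.",
)
print(is_valid(example))
```

A validity check like this is a cheap guard during curation, since a pair whose two sides are identical carries no preference signal.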
Training configuration
Stage 1 — SFT (LoRA)
- Rank `r=64`, `alpha=128`
- Target: attention projections on the top 18 layers (layers ≥ 60)
- 160 steps, learning rate `2e-4`, sequence length `2048`
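The layer restriction above can be expressed as a simple filter over module names. This is a minimal sketch under stated assumptions: the `model.layers.<index>.` naming pattern and the projection names in `ATTENTION_PROJECTIONS` are placeholders, since the card does not list the actual module names, and `is_lora_target` is a hypothetical helper.

```python
import re

LORA_RANK = 64    # r, from the Stage 1 settings
LORA_ALPHA = 128  # alpha, from the Stage 1 settings
MIN_LAYER = 60    # only layers >= 60 receive adapters

# Assumed projection names; the real set depends on the model implementation.
ATTENTION_PROJECTIONS = ("q_proj", "k_proj", "v_proj", "o_proj")

# Assumed naming scheme: "...layers.<index>.<submodule>".
_LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def is_lora_target(module_name):
    """True for attention projections that sit in layer MIN_LAYER or above."""
    match = _LAYER_RE.search(module_name)
    if match is None or int(match.group(1)) < MIN_LAYER:
        return False
    return module_name.endswith(ATTENTION_PROJECTIONS)


# LoRA scales its low-rank update by alpha / r.
scaling = LORA_ALPHA / LORA_RANK
print(scaling)
```

With `r=64` and `alpha=128` the update is scaled by 2. Leaving the lower layers and all MoE expert modules out of the target set keeps the trainable parameter count small.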
Stage 2 — DPO-v2 (LoRA)
- Rank `r=16`, `alpha=32`, `beta=0.1`
- 30 steps, learning rate `5e-5`
- `init_from=sft`: the SFT adapter is loaded and frozen as the DPO reference policy, so DPO shifts preferences without erasing distilled reasoning.
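To show what `beta` and the frozen reference policy do, here is the standard DPO objective for a single pair, written in plain Python. Whether the training run used exactly this variant is an assumption. The inputs are summed log-probabilities of each response under the trainable policy and under the reference, and `dpo_loss` is an illustrative name.

```python
import math

BETA = 0.1  # from the Stage 2 settings


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: no preference learned yet.
print(dpo_loss(-12.0, -12.0, -12.0, -12.0))
# Policy now favors the chosen answer more than the reference does.
print(dpo_loss(-10.0, -20.0, -15.0, -15.0))
```

Because the loss depends only on differences from the reference, initializing that reference from the SFT adapter means the penalty is measured relative to the distilled model rather than the untouched base.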
Base & merge
- Base model: `zai-org/GLM-5.2` (FP8)
- Non-destructive merge: attention modules touched by the adapters are dequantized to bf16 for a lossless weight update, while the MoE expert tensors are preserved in FP8. 90 SFT + 90 DPO adapter targets merged; 631 modules left untouched. The result loads as a standard `transformers` checkpoint (151 safetensors shards).
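The weight update applied to each merged attention module follows the usual LoRA rule, W' = W + (alpha / r) · B · A. The sketch below illustrates that rule on tiny matrices using plain Python floats. It does not model FP8 dequantization or bf16 rounding, and `matmul` and `merge_lora` are illustrative helpers rather than the merge script used for this release.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def merge_lora(weight, lora_a, lora_b, rank, alpha):
    """Fold a LoRA adapter into a dense weight: W + (alpha / rank) * B @ A.

    weight: out x in, lora_b: out x rank, lora_a: rank x in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy rank-1 example (values chosen only for illustration).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.25]]    # rank x in
B = [[1.0], [2.0]]   # out x rank
print(merge_lora(W, A, B, rank=1, alpha=2))
```

Modules without an adapter never pass through this function, which is why the expert tensors can stay in their original FP8 form.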
Intended use
- Vulnerability research — triage, root-cause analysis, PoC development
- Malware analysis — static/dynamic reasoning, deobfuscation, capability inference
- Reverse engineering — binary/protocol analysis, decompilation assistance
- Security tooling — writing scanners, fuzzers, exploit frameworks, detection rules
- Red/blue team support — attack-path reasoning, detection engineering
Refusal boundary
F5-Molt is deliberately permissive for security work. It only refuses:
- Instructions facilitating self-harm, and
- Child sexual exploitation content.
Everything else — cybersecurity, malware analysis, vulnerability research, and weapon-systems questions — is answered directly and technically. Use responsibly and legally; you are accountable for what you do with the output.
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cfontes/GLM-5.2-F5-Molt"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Explain how a use-after-free becomes an arbitrary write."}]
# return_dict=True yields input_ids plus attention_mask, which generate() accepts as keyword arguments.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=1024)
# Decodes the full sequence (prompt plus completion) without special tokens.
print(tok.decode(out[0], skip_special_tokens=True))
```
Prefer LoRA serving? The individual adapters are published at `cfontes/GLM-5.2-F5-Molt-LoRA`.
License
Released under the GLM license, inheriting all terms from the base model zai-org/GLM-5.2. See the license link.
Citation
@misc{glm52f5molt2026,
title = {GLM-5.2-F5-Molt: A Cybersecurity-Specialized Fine-Tune of GLM-5.2},
author = {Fontes, Chris},
year = {2026},
howpublished = {\url{https://huggingface.co/cfontes/GLM-5.2-F5-Molt}}
}
Built on zai-org/GLM-5.2. Distilled from Claude Opus 4.7/4.8 reasoning traces and the Fable 5 corpus. Aligned with Molt.