Instructions to use Pythagoras-LM/Pythagoras-Prover-4B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Pythagoras-LM/Pythagoras-Prover-4B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="Pythagoras-LM/Pythagoras-Prover-4B") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForMultimodalLM tokenizer = AutoTokenizer.from_pretrained("Pythagoras-LM/Pythagoras-Prover-4B") model = AutoModelForMultimodalLM.from_pretrained("Pythagoras-LM/Pythagoras-Prover-4B") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use Pythagoras-LM/Pythagoras-Prover-4B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "Pythagoras-LM/Pythagoras-Prover-4B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Pythagoras-LM/Pythagoras-Prover-4B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/Pythagoras-LM/Pythagoras-Prover-4B
- SGLang
How to use Pythagoras-LM/Pythagoras-Prover-4B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "Pythagoras-LM/Pythagoras-Prover-4B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Pythagoras-LM/Pythagoras-Prover-4B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "Pythagoras-LM/Pythagoras-Prover-4B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Pythagoras-LM/Pythagoras-Prover-4B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use Pythagoras-LM/Pythagoras-Prover-4B with Docker Model Runner:
docker model run hf.co/Pythagoras-LM/Pythagoras-Prover-4B

Pythagoras-Prover
1. Introduction
We introduce Pythagoras-Prover, a compute-efficient family of open-source large language models for formal theorem proving in Lean 4. The family comprises two autoregressive provers at 4B and 32B parameters, together with Pythagoras-Prover-Diffusion, the first diffusion-based theorem prover, which iteratively refines Lean proofs at inference time. All three models are artefacts of a single methodological approach: a scalable, Lean-verified synthetic data pipeline. At its centre is Augmented Lean Formalisation (ALF), a structured mutation scheme that expands a verified seed corpus into formal variants without per-instance Lean compilation, then re-uses them as a self-distillation signal during training. This design lets careful data construction stand in for raw scale, closing much of the gap between small open provers and their largest counterparts — without relying on inference-time self-correction.
2. Model Summary
A Lean-Verified Synthetic Data Pipeline
- Natural-language problems from general-math and competition sources are autoformalised into Lean and gated on the type-checker using predominantly sub-30B open models, with an auto-informalisation and alignment step discarding faithful-but-wrong formalisations to yield a verified seed corpus partitioned into easy, medium, and hard tiers.
- A rubric-guided distillation stage re-prompts on each rejected instance to target the specific Lean type-checker error responsible for its failure, lifting autoformalisation success and roughly doubling the verified training set.
Model Training
- LoRA-only supervised fine-tuning of Qwen3-4B and Qwen3-32B under an 8K context, paired with a dynamic proof-reasoning filter and a difficulty-ordered easy→medium→hard curriculum, followed by reinforcement learning with a Lean-compilation reward and a final continued-SFT stage on the ALF corpus.
Augmented Lean Formalisation
- ALF emits one structured variant per category — simplification, generalisation, lemma proposal, proof-step decomposition, and reformulation — for every seed statement, replacing per-instance Lean verification with a cheap statement-alignment check and expanding the seed corpus into roughly 2M formal variants.
- The post-RL prover proves the mutations, and these self-distilled proofs form a corpus that trains both the autoregressive and diffusion provers from a single recipe.
The Smallest Efficient Open-Source Lean Theorem Prover
- We train Pythagoras-Prover-4B, one of the smallest and most compute-efficient open-source Lean theorem provers to date, reaching 86.07% on MiniF2F-Test at Pass@32 and surpassing the prior state of the art at a fraction of its parameter count.
- Pythagoras-Prover-Diffusion adapts a block-diffusion formulation with a tactic-based masking objective aligned to the discrete reasoning steps of Lean — to our knowledge the first demonstration that a diffusion language model can verifiably solve Lean theorems at non-trivial rates.
The resulting models set a new bar for compute-efficient formal proving. Pythagoras-Prover-32B achieves state-of-the-art performance among open-source provers, reaching 93.03% on MiniF2F-Test and solving 93 of 672 problems on PutnamBench, while Pythagoras-Prover-4B outperforms DeepSeek-Prover-V2-671B on MiniF2F-Test despite being roughly 167× smaller — with no self-correction or test-time reinforcement learning. We additionally release MiniF2F-ALF, an ALF-mutated companion benchmark on which every evaluated prover degrades.
3. Benchmark Performance
We evaluate Pythagoras-Prover on three Lean 4 benchmarks — MiniF2F-Test, PutnamBench, and the MiniF2F-ALF benchmark we introduce — under a single unified protocol (Lean 4.9.0-rc1, a 30,000-token generation limit, and a verbatim-statement pass criterion). Across all three, Pythagoras-Prover matches or exceeds open-source provers an order of magnitude larger, and does so without inference-time self-correction or test-time reinforcement learning.
| Method | #Params | Pass@32 | Pass@1024 | Best (N) |
|---|---|---|---|---|
| Goedel-Prover-SFT | 7B | 57.6 | – | 62.7 (3200) |
| STP | 7B | – | – | 67.6 (25600) |
| Kimina-Prover-Preview-72B | 72B | 68.85 | – | 80.74 (8192) |
| DeepSeek-Prover-V2-7B | 7B | 75.6 | – | 82.0 (8192) |
| DeepSeek-Prover-V2-671B | 671B | 82.4 | – | 88.9 (8192) |
| Kimina-Prover-8B-Distill | 8B | 77.86 | – | – |
| Kimina-Prover-70B | 70B | 84.0 | 87.7 | 92.2 (TTRL) |
| Goedel-Prover-V2-8B | 8B | 84.6 | 87.9 | 90.2 (8192) |
| + Self-Correction | 8B | 86.7 | 89.3 | – |
| Goedel-Prover-V2-32B | 32B | 88.1 | 91.8 | 92.2 (8192) |
| + Self-Correction | 32B | 90.4 | 92.6 | – |
| Pythagoras-Prover-4B | 4B | 86.07 | 88.11 | 89.75 (2048) |
| Pythagoras-Prover-32B | 32B | 89.75 | 92.62 | 93.03 (2048) |
| # | Model | num-solved | compute |
|---|---|---|---|
| 1 | Pythagoras-Prover-32B | 93 | Pass@2048 |
| 1 | Pythagoras-Prover-32B | 59 | Pass@64 |
| 1 | Pythagoras-Prover-32B | 48 | Pass@32 |
| 2 | Goedel-Prover-V2-32B (self-correction mode) | 86 | Pass@184 |
| 2 | Goedel-Prover-V2-32B (self-correction mode) | 57 | Pass@32 |
| 2 | Goedel-Prover-V2-32B | 43 | Pass@32 |
| 3 | DeepSeek-Prover-V2-671B | 47 | Pass@1024 |
| 3 | DeepSeek-Prover-V2-671B | 22 | Pass@32 |
| 4 | DSP+ | 23 | Pass@128 |
| 5 | Bourbaki | 14 | Pass@512 |
| 6 | Kimina-Prover-7B-Distill | 10 | Pass@192 |
| 7 | Self-play Theorem Prover | 8 | Pass@3200 |
| 8 | Goedel-Prover-SFT | 7 | Pass@512 |
| 9 | ABEL (closed-source) | 7 | Pass@596 |
| Model | Pass@32 |
|---|---|
| DeepSeek-Prover-V2-671B | 79.71 |
| Goedel-Prover-V2-8B | 82.58 |
| Goedel-Prover-V2-32B | 83.61 |
| Pythagoras-Prover-4B | 83.19 |
| Pythagoras-Prover-32B | 85.04 |
4. Model & Dataset Downloads
We release our Pythagoras-Prover models, the training dataset and the new MiniF2F-ALF benchmark for future future research.
| Model | Download |
|---|---|
| Pythagoras-Prover-32B | Coming Soon |
| Pythagoras-Prover-4B | 🤗Download |
| Pythagoras-Prover-Diffusion-4B | Coming Soon |
| Dataset | Download |
|---|---|
| Pythagoras-Prover-SFT | 🤗Download |
| Pythagoras-Prover-Distill-4B | Coming Soon |
| Pythagoras-Prover-Distill-32B | Coming Soon |
5. Quick Start
For model inference., Huggingface's Transformers can directly be used
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "Pythagoras-LM/Pythagoras-Prover-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
formal_statement = """
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
/-- The volume of a cone is given by the formula $V = \frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume? Show that it is 65.-/
theorem mathd_algebra_478 (b h v : ℝ) (h₀ : 0 < b ∧ 0 < h ∧ 0 < v) (h₁ : v = 1 / 3 * (b * h))
(h₂ : b = 30) (h₃ : h = 13 / 2) : v = 65 := by
sorry
""".strip()
prompt = """
Complete the following Lean 4 code:
```lean4
{}```
Before producing the Lean 4 code to formally prove the given theorem, provide a detailed proof plan outlining the main proof steps and strategies.
The plan should highlight key ideas, intermediate lemmas, and proof structures that will guide the construction of the final formal proof.
""".strip()
chat = [
{"role": "user", "content": prompt.format(formal_statement)},
]
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
inputs = tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
import time
start = time.time()
outputs = model.generate(inputs, max_new_tokens=8192)
print(tokenizer.batch_decode(outputs))
print(time.time() - start)
Cite
@article{leang2026pythagoras,
title={Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation},
author={Leang, Joshua Ong Jun and Zhao, Zheng and Stoian, Mihaela Catalina and Xu, Qiyuan and Li, Haonan and Li, Wenda and Cohen, Shay B. and Giunchiglia, Eleonora},
journal={arXiv preprint arXiv:2606.12594},
year={2026}
}
- Downloads last month
- 49