SmolLM-135M-CGGR-Math

This model is a specialized version of HuggingFaceTB/SmolLM-135M, fine-tuned for mathematical reasoning using Confidence-Gated Gradient Routing (CGGR).

🚀 The CGGR Breakthrough

This model was trained with a strategy that backpropagates only through the "hardest" tokens in each batch (a fixed quota of the top 25%, see Model Details). In a 6-hour training run this gave:

4.08x Higher Throughput: Processing about 4.08x as many samples (58,716 vs. 14,368) in the same wall-clock time as standard training.
66% VRAM Savings: Fitting large-batch training on a single consumer GPU (RTX 3060, 12GB).
Superior Convergence: Achieving a roughly 19% relative accuracy improvement (8.00% to 9.50%) on AIME 2024 compared to standard fine-tuning.
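
To make the idea of "hardest-token" selection concrete, here is a minimal, dependency-free sketch. It is not the CGGR implementation: it assumes that hardness is the per-token cross-entropy and that the quota is applied over one flat list of tokens, and the helper names (`select_hard_tokens`, `gated_loss`) are hypothetical. How CGGR actually scores confidence and routes gradients is defined in the project repository.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one token's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_hard_tokens(token_logits, targets, quota=0.25):
    """Return (mask, losses); mask[i] is True for tokens kept for the update.

    Assumption: "hardness" is the per-token cross-entropy, i.e. the
    negative log of the probability assigned to the target token.
    """
    losses = [-math.log(softmax(logits)[t])
              for logits, t in zip(token_logits, targets)]
    k = max(1, math.ceil(quota * len(losses)))  # fixed quota, at least one token
    hardest = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)[:k]
    mask = [False] * len(losses)
    for i in hardest:
        mask[i] = True
    return mask, losses

def gated_loss(token_logits, targets, quota=0.25):
    """Mean cross-entropy over the selected (hardest) tokens only."""
    mask, losses = select_hard_tokens(token_logits, targets, quota)
    kept = [loss for loss, keep in zip(losses, mask) if keep]
    return sum(kept) / len(kept)

# Toy batch: 4 tokens over a 3-word vocabulary.
token_logits = [
    [4.0, 0.0, 0.0],  # confident, target 0
    [0.0, 3.0, 0.0],  # confident, target 1
    [0.2, 0.1, 0.0],  # uncertain, target 2
    [2.0, 0.0, 0.0],  # fairly confident, target 0
]
targets = [0, 1, 2, 0]

mask, losses = select_hard_tokens(token_logits, targets, quota=0.25)
print(mask)  # [False, False, True, False]
```

With a 25% quota over four tokens, only the single least-confident token contributes to the loss. In a tensor-based training loop the same mask would typically be applied to an unreduced per-token loss before averaging.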

Benchmark Results (6-Hour Training Race)

Metric	Standard (Baseline)	CGGR (Ours)	Relative Gain
Solving Accuracy (AIME 2024)	8.00%	9.50%	+18.75%
Training Throughput (samples in 6 hours)	14,368	58,716	+308%
Final Loss	0.3610	0.0980	-73%
Max Batch Size (12GB VRAM)	18	69	3.8x
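
The Relative Gain column can be re-derived from the two measurement columns. The short sketch below recomputes each entry from the values in the table; the helper name `relative_change` is only illustrative, and the results match the table up to rounding.

```python
def relative_change(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

# (baseline, CGGR) pairs copied from the benchmark table.
rows = {
    "AIME accuracy (%)": (8.00, 9.50),
    "Samples processed": (14368, 58716),
    "Final loss": (0.3610, 0.0980),
    "Max batch size": (18, 69),
}

for name, (baseline, cggr) in rows.items():
    change = relative_change(baseline, cggr)
    ratio = cggr / baseline
    print(f"{name}: {change:+.2f}% ({ratio:.2f}x)")
```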

📈 Performance Visuals

[Figure: Benchmark Dashboard]

Model Details

Architecture: Transformer Decoder (SmolLM-135M)
Training Method: CGGR (Confidence-Gated Gradient Routing)
Selection Strategy: Fixed Quota (Top 25% hardest tokens)
Compute: Trained on a single NVIDIA RTX 3060 (12GB)
Duration: 6 hours total

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
model_name = "MinimaML/SmolLM-135M-CGGR-Math"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The prompt uses a "Question: ... Answer:" layout.
prompt = "Question: If x + y = 10 and x - y = 2, what is the value of x?\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 50 new tokens, then decode the full sequence (prompt + continuation).
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
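
Because `generate` returns the prompt followed by the continuation, the decoded string repeats the question, and a small model may keep writing further "Question:" blocks until it runs out of tokens. The sketch below is one possible way to keep only the first answer. `extract_answer` is a hypothetical helper, the stop marker is an assumption about the prompt layout used above, and the `decoded` string is a made-up illustration rather than real model output.

```python
def extract_answer(decoded, prompt, stop="\n\nQuestion:"):
    """Return only the continuation after the prompt, cut at the first stop marker."""
    # Strip the echoed prompt if the decoded text reproduces it exactly;
    # otherwise fall back to the full decoded string.
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    cut = text.find(stop)
    if cut != -1:
        text = text[:cut]
    return text.strip()

prompt = "Question: If x + y = 10 and x - y = 2, what is the value of x?\n\nAnswer:"

# Made-up decoded text for illustration only (not actual model output).
decoded = prompt + " x = 6\n\nQuestion: What is 2 + 2?\n\nAnswer: 4"

print(extract_answer(decoded, prompt))  # x = 6
```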

Citation

If you use this model or the CGGR technique in your research, please cite:

@software{cggr2026,
  title={CGGR: Confidence-Gated Gradient Routing},
  author={MinimaML},
  year={2026},
  url={https://github.com/MinimaML/CGGR}
}
