Instructions to use ramankrishna10/npc-reason with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ramankrishna10/npc-reason with llama-cpp-python:

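```python
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="ramankrishna10/npc-reason",
    filename="npc-reason-f16.gguf",
)

# Run one chat turn and print the assistant's reply.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```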

Local Apps

llama.cpp

How to use ramankrishna10/npc-reason with llama.cpp:

Install (macOS, Linux)

```
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf ramankrishna10/npc-reason:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf ramankrishna10/npc-reason:Q4_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf ramankrishna10/npc-reason:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf ramankrishna10/npc-reason:Q4_K_M
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ramankrishna10/npc-reason:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf ramankrishna10/npc-reason:Q4_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ramankrishna10/npc-reason:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ramankrishna10/npc-reason:Q4_K_M
```

Use Docker

```
docker model run hf.co/ramankrishna10/npc-reason:Q4_K_M
```


vLLM

How to use ramankrishna10/npc-reason with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ramankrishna10/npc-reason"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ramankrishna10/npc-reason",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
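The same request can be sent from Python. The sketch below mirrors the curl call using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumes the default port from the curl example
MODEL = "ramankrishna10/npc-reason"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build the POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# With the server running:
# print(chat("What is the capital of France?"))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.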

Use Docker

```
docker model run hf.co/ramankrishna10/npc-reason:Q4_K_M
```

Ollama
How to use ramankrishna10/npc-reason with Ollama:
```
ollama run hf.co/ramankrishna10/npc-reason:Q4_K_M
```

Unsloth Studio

How to use ramankrishna10/npc-reason with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ramankrishna10/npc-reason to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ramankrishna10/npc-reason to start chatting
```

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ramankrishna10/npc-reason to start chatting

Docker Model Runner
How to use ramankrishna10/npc-reason with Docker Model Runner:
```
docker model run hf.co/ramankrishna10/npc-reason:Q4_K_M
```

Lemonade

How to use ramankrishna10/npc-reason with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ramankrishna10/npc-reason:Q4_K_M
```

Run and chat with the model

```
lemonade run user.npc-reason-Q4_K_M
```

List all available models

```
lemonade list
```

NPC Reason 1.5B

A math-reasoning model whose every load-bearing arithmetic step emits a mechanically-checkable assertion in the form <<EXPR = RESULT>>. Specialized from DeepSeek-R1-Distill-Qwen-1.5B (MIT). The point is not just a final answer. It is that a pure-code checker can re-execute every step and confirm the chain, so "verifiable-rate" is not the model's opinion. Anyone can run the checker.

Results (frozen held-out eval, n=500, GSM8K + MATH-500, greedy, format prompt)

| metric | base R1-Distill | SFT (V4 distill) | NPC Reason (RL) |
| --- | --- | --- | --- |
| verifiable-rate | 0.0% | 76.8% | 76.2% |
| accuracy | 61.6% | 65.8% | 66.6% |
| verified-and-correct | 0.0% | 58.0% | 59.6% |

The headline metric is verified-and-correct, which requires a chain to pass on both axes: verifiable and correct. All three columns are shown on purpose, not just the best one.
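As a minimal sketch of how the three metrics relate, assume each evaluated problem yields two independent booleans, one per axis. The `EvalRecord` type and `summarize` function are hypothetical names for illustration and are not taken from the released evaluation code.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    verifiable: bool  # the chain passed the mechanical check
    correct: bool     # the final answer equals the gold answer


def summarize(records):
    """Return the three rates from the results table as fractions of n."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    return {
        "verifiable_rate": sum(r.verifiable for r in records) / n,
        "accuracy": sum(r.correct for r in records) / n,
        # The headline requires both axes to hold on the same problem.
        "verified_and_correct": sum(r.verifiable and r.correct for r in records) / n,
    }


# Toy data: one problem in each of the four possible states.
records = [
    EvalRecord(True, True),
    EvalRecord(True, False),
    EvalRecord(False, True),
    EvalRecord(False, False),
]
summary = summarize(records)
```

Because the headline is a conjunction evaluated per problem, it can never exceed either verifiable-rate or accuracy, which matches the table (59.6% is below both 76.2% and 66.6%).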

What actually happened (plain language, no overclaiming)

The base model produces zero mechanically-verifiable chains, even when asked for the format. Only 1 of 500 base outputs even contained a << marker.
The SFT distillation did the heavy lifting: verifiable-rate went from 0% to 76.8%. Training on a corpus of DeepSeek-V4 chains that the frozen verifier confirmed as both verifiable and correct (7,546 kept of 13,245 generated) transferred the grounding. Accuracy also rose (61.6% to 65.8%), so verifiability was not bought by sacrificing correctness.
RLVR/GRPO against the frozen verifier was a stable refinement, not a decisive gain. It moved verified-and-correct by +1.6pp (58.0% to 59.6%) and accuracy by +0.8pp (65.8% to 66.6%), with verifiable-rate flat (-0.6pp). On n=500, the 1.6pp headline gain is roughly 8 problems. The RL model and the SFT model are statistically about even.
The shipped model is the RL checkpoint (marginally best on the headline). The SFT model is statistically equivalent and available as a fallback. Either is a defensible "NPC Reason".
The pre-registered 90% verifiable bar was NOT met (stuck near 77%). It was deliberately not chased into instability. This is the open limitation and the next frontier.
Accuracy includes the greedy no-answer floor. The same greedy decoding is used for base and tuned models, so the comparisons are apples-to-apples.
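The counts quoted above can be re-derived in a few lines. This is a back-of-the-envelope sketch: the helper names are illustrative, and the unpaired binomial standard error is an assumption used only as a rough yardstick. Both models were scored on the same 500 problems, so a paired test would be more precise.

```python
import math

N = 500  # size of the frozen held-out eval set


def pp_to_problems(delta_pp, n=N):
    """Convert a percentage-point difference into a number of problems."""
    return round(delta_pp / 100 * n)


def binomial_se_pp(rate_pct, n=N):
    """Unpaired binomial standard error of a rate, in percentage points."""
    p = rate_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)


keep_rate = 7546 / 13245                      # share of generated chains kept for SFT
vc_gain = pp_to_problems(59.6 - 58.0)         # verified-and-correct, RL vs SFT
acc_gain = pp_to_problems(66.6 - 65.8)        # accuracy, RL vs SFT
verifiable_change = pp_to_problems(76.2 - 76.8)  # verifiable-rate, RL vs SFT
headline_se = binomial_se_pp(59.6)

print(f"kept {keep_rate:.1%} of generated chains")
print(f"verified-and-correct: {vc_gain:+d} problems")
print(f"accuracy: {acc_gain:+d} problems")
print(f"verifiable: {verifiable_change:+d} problems")
print(f"standard error of the headline rate: {headline_se:.1f}pp")
```

Under these assumptions the 1.6pp headline gain is smaller than one standard error of the rate itself (about 2.2pp), which is consistent with calling the two checkpoints statistically about even.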

The verifier (the methodological core)

A pure-code Python/SymPy checker, frozen at sha256 d5d146cf..., is used as BOTH the evaluation metric AND the RL reward (byte-identical in both roles). A chain is VERIFIABLE iff every load-bearing <<EXPR = RESULT>> assertion re-executes correctly AND the final answer composes from the last step. Correctness (final == gold) is a separate, independent axis. The verifier is shipped with the model (verifier/step_verifier.py); users run it on the model's own outputs.
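To make the idea of re-executing assertions concrete, here is a simplified sketch. It is NOT the shipped verifier/step_verifier.py: it uses only the standard library instead of SymPy, supports only +, -, * and /, and covers only the first condition (every assertion re-executes correctly), not the check that the final answer composes from the last step. All names in it are illustrative.

```python
import ast
import operator
import re
from fractions import Fraction

# Matches <<EXPR = RESULT>>; whitespace around "=" is optional.
STEP_RE = re.compile(r"<<([^<>=]+)=([^<>=]+)>>")

_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    """Evaluate a tiny arithmetic AST exactly, rejecting everything else."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant):
        value = node.value
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return Fraction(str(value))  # exact, so 0.1 + 0.2 equals 0.3
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def evaluate(expr):
    return _eval(ast.parse(expr.strip(), mode="eval"))


def check_chain(text):
    """Return True iff the text has assertions and each one re-executes."""
    steps = STEP_RE.findall(text)
    if not steps:
        return False  # nothing to check, so nothing is verified
    for expr, claimed in steps:
        try:
            if evaluate(expr) != evaluate(claimed):
                return False
        except (ValueError, SyntaxError, ZeroDivisionError):
            return False
    return True


chain = "She buys 3 packs of 4: <<3 * 4 = 12>>. Half are eaten: <<12 / 2 = 6>>."
print(check_chain(chain))
```

Parsing into an AST and whitelisting node types, rather than calling `eval`, keeps the check pure: model output is never executed as code, and a chain with no assertions is treated as unverified rather than trivially passing.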

Methods finding worth keeping

GRPO with this hard, frozen, pure-code verifier reward trained STABLY: KL stayed flat (~0.0002), with no length runaway, no mode collapse, and no early-stop trip. This is notable because prior RL attempts in related work were unstable. RLVR with a clean verifier reward is a regime where a small model trains without collapsing. The lift was small here, but the stability is the result worth keeping.
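For readers unfamiliar with GRPO, the sketch below shows the standard group-relative advantage normalization it is built on: several completions are sampled per prompt, each is scored, and each reward is standardized against its own group. The binary 0/1 reward, the group size of four, and the `eps` value are assumptions for illustration; the training configuration used for this model is not described here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt; 1.0 means the verifier accepted it.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If every completion gets the same reward, the group carries no signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(uniform)
```

With a hard pass/fail reward, a group contributes gradient only when its completions disagree, which is one reason such a reward can give small but well-behaved updates.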

Intended use and limits

Use: math problems where checkable, grounded reasoning steps matter (arithmetic and arithmetic-reducible word problems). Prompt for the <<EXPR = RESULT>> format (see USAGE.md).
Math-first. Logic, proofs, and general chain-of-thought are NOT claimed and are future work.
Not a general chat model. The 1.5B reasoning ceiling applies. About 24% of format-prompt chains (100% minus the 76.2% verifiable-rate) are still not fully verifiable; this is the unverified tail.
Simulation/research artifact. Verify outputs with the included checker before relying on them.

Lineage and license

Base: DeepSeek-R1-Distill-Qwen-1.5B (MIT).
Training chains were distilled from DeepSeek-V4 (distillation permitted; output rights assigned to the user). Released under MIT to match the base.
Attribution: Rama Krishna Bachu, Bottensor (Independent Research). ORCID 0009-0000-1298-0681.

Reproducibility

The pre-registration was frozen BEFORE any training and an honest-null clause was in force. Frozen references:

Verifier: VERIFIER.lock sha256 d5d146cf...
Eval set: EVAL.lock sha256 e1573cab...
Pre-registration: PREREG.lock sha256 b5a49437...
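A downloaded artifact can be compared against these references with a short hash check. This is a sketch under assumptions: the hashes above are truncated, so only a prefix comparison is possible, and which file each lock entry covers (for example verifier/step_verifier.py for VERIFIER.lock) is a guess that should be confirmed against the repository. The function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_prefix(path, expected_prefix):
    """Check a file's digest against a (possibly truncated) hex prefix."""
    return sha256_of(path).startswith(expected_prefix.lower())


# Self-contained demo on a temporary file with known contents.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"abc")
    demo_ok = matches_prefix(demo, "ba7816bf")  # SHA-256("abc") starts with ba7816bf

print(demo_ok)

# Hypothetical use against the repository (path is an assumption):
# matches_prefix("verifier/step_verifier.py", "d5d146cf")
```

A prefix match is weaker evidence than a full-digest match, so the complete hash in the lock file should be used where available.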

GGUF quantizations are provided with a per-quant decision-fidelity check against the bf16 model (see gguf_fidelity.md). Pick the recommended quant rather than choosing on file size alone.

Safetensors

Model size: 2B params
Tensor type: BF16
