Instructions to use ramankrishna10/npc-reason with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use ramankrishna10/npc-reason with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ramankrishna10/npc-reason",
    filename="npc-reason-f16.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ramankrishna10/npc-reason with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf ramankrishna10/npc-reason:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf ramankrishna10/npc-reason:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf ramankrishna10/npc-reason:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf ramankrishna10/npc-reason:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ramankrishna10/npc-reason:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf ramankrishna10/npc-reason:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ramankrishna10/npc-reason:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ramankrishna10/npc-reason:Q4_K_M
Use Docker
docker model run hf.co/ramankrishna10/npc-reason:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use ramankrishna10/npc-reason with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ramankrishna10/npc-reason"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ramankrishna10/npc-reason",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/ramankrishna10/npc-reason:Q4_K_M
- Ollama
How to use ramankrishna10/npc-reason with Ollama:
ollama run hf.co/ramankrishna10/npc-reason:Q4_K_M
- Unsloth Studio
How to use ramankrishna10/npc-reason with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ramankrishna10/npc-reason to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ramankrishna10/npc-reason to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ramankrishna10/npc-reason to start chatting
- Atomic Chat
- Docker Model Runner
How to use ramankrishna10/npc-reason with Docker Model Runner:
docker model run hf.co/ramankrishna10/npc-reason:Q4_K_M
- Lemonade
How to use ramankrishna10/npc-reason with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ramankrishna10/npc-reason:Q4_K_M
Run and chat with the model
lemonade run user.npc-reason-Q4_K_M
List all available models
lemonade list
NPC Reason 1.5B
A math-reasoning model whose every load-bearing arithmetic step emits a mechanically-checkable
assertion in the form <<EXPR = RESULT>>. Specialized from DeepSeek-R1-Distill-Qwen-1.5B (MIT).
The point is not just a final answer. It is that a pure-code checker can re-execute every step and
confirm the chain, so "verifiable-rate" is not the model's opinion. Anyone can run the checker.
Results (frozen held-out eval, n=500, GSM8K + MATH-500, greedy, format prompt)
| metric | base R1-Distill | SFT (V4 distill) | NPC Reason (RL) |
|---|---|---|---|
| verifiable-rate | 0.0% | 76.8% | 76.2% |
| accuracy | 61.6% | 65.8% | 66.6% |
| verified-and-correct | 0.0% | 58.0% | 59.6% |
verified-and-correct (both axes) is the headline. The full arc is shown on purpose, not just the best column.
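The three rows of the table can be read as counts over per-problem outcomes on two independent axes. The sketch below illustrates that bookkeeping only; the `Outcome` class, the `summarize` helper, and the four toy records are hypothetical and are not the evaluation harness or real eval outputs.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    verifiable: bool  # the checker accepted the whole chain
    correct: bool     # final answer equals the gold answer


def summarize(outcomes):
    """Return the three table metrics as fractions of all problems."""
    n = len(outcomes)
    return {
        "verifiable-rate": sum(o.verifiable for o in outcomes) / n,
        "accuracy": sum(o.correct for o in outcomes) / n,
        # Headline metric: a problem counts only if it passes on BOTH axes.
        "verified-and-correct": sum(o.verifiable and o.correct for o in outcomes) / n,
    }


# Toy data for illustration only (one problem per combination of the two axes).
toy = [
    Outcome(True, True),
    Outcome(True, False),
    Outcome(False, True),
    Outcome(False, False),
]
rates = summarize(toy)
print(rates)
```

Because the headline metric requires both conditions on the same problem, it can never exceed the smaller of the other two rates.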
What actually happened (plain language, no overclaiming)
- The base model produces zero mechanically-verifiable chains, even when asked for the format. Only 1 of 500 base outputs even contained a << marker.
- The SFT distillation did the heavy lifting: 0 to 76.8% verifiable. Training on a corpus of DeepSeek-V4 chains that the frozen verifier confirmed (verifiable AND correct, 7,546 kept of 13,245 generated) transferred the grounding. Accuracy also rose (61.6 to 65.8), so verifiability was not bought by sacrificing correctness.
- RLVR/GRPO against the frozen verifier was a stable refinement, not a decisive gain. It moved verified-and-correct +1.6pp (58.0 to 59.6) and accuracy +0.8pp, with verifiable flat (-0.6pp). On n=500 that is roughly 8 problems. The RL model and the SFT model are statistically about even.
- The shipped model is the RL checkpoint (marginally best on the headline). The SFT model is statistically equivalent and available as a fallback. Either is a defensible "NPC Reason".
- The pre-registered 90% verifiable bar was NOT met (stuck near 77%). It was deliberately not chased into instability. This is the open limitation and the next frontier.
- Accuracy includes the greedy no-answer floor. The same greedy decoding is used for base and tuned models, so the comparisons are apples-to-apples.
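The "roughly 8 problems" and "statistically about even" statements above can be checked with back-of-the-envelope arithmetic. The sketch below uses only the table values and a simple binomial standard error; treating the two runs as independent samples is an assumption made here for simplicity (both models were scored on the same 500 problems, so a paired test on per-problem outcomes would be the more precise tool).

```python
import math

N = 500                      # size of the frozen held-out eval
sft_vc, rl_vc = 0.580, 0.596  # verified-and-correct from the results table

delta = rl_vc - sft_vc        # +1.6 percentage points
problems = round(delta * N)   # how many problems that difference represents


def standard_error(p, n):
    """Binomial standard error of a proportion p measured on n items."""
    return math.sqrt(p * (1 - p) / n)


se_sft = standard_error(sft_vc, N)
se_rl = standard_error(rl_vc, N)
# Assumption: independent samples, so the variances add.
se_diff = math.sqrt(se_sft**2 + se_rl**2)

print(f"difference: {delta * 100:.1f}pp = {problems} problems")
print(f"standard error of one rate: about {se_rl * 100:.1f}pp")
print(f"standard error of the difference: about {se_diff * 100:.1f}pp")
```

Under these assumptions the 1.6pp gap is smaller than the standard error of the difference, which is consistent with describing the RL and SFT checkpoints as about even.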
The verifier (the methodological core)
A pure-code Python/SymPy checker, frozen at sha256 d5d146cf..., used as BOTH the evaluation
metric AND the RL reward (byte-identical both times). A chain is VERIFIABLE iff every load-bearing
<<EXPR=RESULT>> assertion re-executes correctly AND the final answer composes from the last step.
Correctness (final == gold) is a separate, independent axis. The verifier is shipped with the model
(verifier/step_verifier.py); users run it on the model's own outputs.
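To make the VERIFIABLE definition concrete, here is a minimal sketch of the idea, not the shipped checker. It makes several simplifying assumptions: it uses the standard library (`ast` and `fractions`) instead of SymPy, supports only `+ - * /`, treats every marker as load-bearing, and interprets "composes from the last step" as "equals the last step's result". The names `check_chain` and `evaluate` are hypothetical; verifier/step_verifier.py remains the authoritative implementation.

```python
import ast
import operator
import re
from fractions import Fraction

# Matches <<EXPR = RESULT>>; whitespace around "=" is optional.
STEP_RE = re.compile(r"<<([^<>=]+)=([^<>=]+)>>")

_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    """Evaluate a restricted arithmetic AST exactly, using Fractions."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant):
        value = node.value
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return Fraction(str(value))
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def evaluate(text):
    """Parse and evaluate one arithmetic expression given as a string."""
    return _eval(ast.parse(text.strip(), mode="eval"))


def check_chain(chain, final_answer):
    """True if every step re-executes and the final answer equals the last result."""
    steps = STEP_RE.findall(chain)
    if not steps:
        return False  # no markers at all: nothing to verify
    try:
        last = None
        for expr, claimed in steps:
            last = evaluate(claimed)
            if evaluate(expr) != last:
                return False
        return evaluate(final_answer) == last
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False


good = "Half of 48 is <<48 / 2 = 24>>. The total is <<48 + 24 = 72>>."
bad = "Half of 48 is <<48 / 2 = 25>>. The total is <<48 + 25 = 73>>."
print(check_chain(good, "72"), check_chain(bad, "73"))
```

Note that this check says nothing about correctness against a gold answer; as described above, that is a separate axis.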
Methods finding worth keeping
GRPO with this hard, frozen, pure-code verifier reward trained STABLY: KL stayed flat (~0.0002), no length runaway, no mode collapse, no early-stop trip. This is notable because prior RL attempts in related work were unstable. RLVR with a clean verifier reward is a regime where a small model trains without collapsing. The lift was small here, but the stability is the keepable result.
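For readers unfamiliar with GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt. The sketch below shows that group-relative normalization with a binary verifier reward. It is a generic illustration under stated assumptions: the function name, the `eps` value, the use of population standard deviation, and the 0/1 reward are choices made here, and the actual reward shaping and hyperparameters of this training run are not given in this card.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four samples: 1.0 if the verifier accepted the chain.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
print(mixed)
print(uniform)
```

When every sample in a group receives the same reward, all advantages are zero and that group contributes no gradient signal, which is one reason a hard pass/fail reward can yield small updates once most samples already pass or already fail.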
Intended use and limits
- Use: math problems where checkable, grounded reasoning steps matter (arithmetic and arithmetic-reducible word problems). Prompt for the <<EXPR = RESULT>> format (see USAGE.md).
- Math-first. Logic, proofs, and general chain-of-thought are NOT claimed and are future work.
- Not a general chat model. The 1.5B reasoning ceiling applies. Roughly 24% of format-prompt chains (100% minus the 76.2% verifiable-rate of the shipped checkpoint) are still not fully verifiable; this is the unverified tail.
- Simulation/research artifact. Verify outputs with the included checker before relying on them.
Lineage and license
- Base: DeepSeek-R1-Distill-Qwen-1.5B (MIT).
- Training chains distilled from DeepSeek V4 (distillation permitted; output rights assigned to the user). Released under MIT to match the base.
- Attribution: Rama Krishna Bachu, Bottensor (Independent Research). ORCID 0009-0000-1298-0681.
Reproducibility
The pre-registration was frozen BEFORE any training and an honest-null clause was in force. Frozen references:
- Verifier: VERIFIER.lock sha256 d5d146cf...
- Eval set: EVAL.lock sha256 e1573cab...
- Pre-registration: PREREG.lock sha256 b5a49437...
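One way to check a local copy against a frozen reference is to hash the file and compare the digest with the published prefix. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the card shows only truncated digests (so only a prefix comparison is possible from this page), and which exact file each `.lock` entry covers is not stated here.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_prefix(path, expected_prefix):
    """True if the file's digest starts with the given hex prefix."""
    return sha256_of(path).startswith(expected_prefix.lower())


# Example call (assumes the lock covers verifier/step_verifier.py, which
# this card does not confirm):
# matches_prefix("verifier/step_verifier.py", "d5d146cf")
```

A prefix match is weaker evidence than a full-digest match; compare against the complete hash in the lock file when it is available.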
GGUF quantizations are provided with a per-quant decision-fidelity check vs the bf16 model
(see gguf_fidelity.md). Pick the recommended quant; do not choose on file size alone.
Model tree for ramankrishna10/npc-reason
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B