KRULL-Nano Simple
KRULL stands for Knowledge Running Under Lightweight Language.
KRULL-Nano is a lightweight decoder-only Small Language Model (SLM) architecture designed for embedded devices, edge AI, offline inference, and privacy-preserving applications.
Unlike cloud-oriented LLMs optimized for massive datacenters, KRULL-Nano is designed from the ground up for:
- low latency
- low memory usage
- deterministic inference
- offline execution
- edge sovereignty
- efficient deployment on constrained hardware
Architecture
KRULL-Nano uses a compact decoder-only transformer architecture with multiple optimizations for edge execution.
Core Components
Decoder-Only Transformer
- autoregressive causal language modeling
- GPT-style token prediction
- streaming-friendly generation
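The autoregressive loop behind these three points can be sketched in a few lines. This is an illustration only, not the repository's scripts/generate.py: `toy_next_token_logits` is a hypothetical stand-in for the model, and greedy decoding is an assumption made to keep the example deterministic.

```python
from typing import Callable, Iterator, List


def toy_next_token_logits(tokens: List[int], vocab_size: int = 5) -> List[float]:
    """Hypothetical stand-in for a model: favors (last token + 1) mod vocab_size."""
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


def generate(
    prompt: List[int],
    max_new_tokens: int,
    logits_fn: Callable[[List[int]], List[float]] = toy_next_token_logits,
) -> Iterator[int]:
    """Greedy autoregressive decoding that yields one token at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Causal: the prediction depends only on tokens produced so far.
        logits = logits_fn(tokens)
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
        # Yielding here is what makes generation streaming-friendly.
        yield next_token


print(list(generate([0], 4)))
```

Because `generate` is a generator, a caller can display each token as soon as it is produced instead of waiting for the full sequence.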
RMSNorm
KRULL replaces LayerNorm with RMSNorm to:
- reduce computational overhead
- improve low-precision stability
- reduce memory bandwidth usage
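As a rough sketch of why RMSNorm is cheaper: unlike LayerNorm, it does not subtract the mean and has no bias term, so each vector needs a single reduction (the mean of squares). The pure-Python function below illustrates the computation; the name `rms_norm` and the `eps` default are assumptions for this example and are not taken from krull/model.py.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Divide x by its root mean square, then apply a learned per-channel gain."""
    mean_sq = sum(v * v for v in x) / len(x)   # the only reduction required
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)   # eps guards against division by zero
    return [v * inv_rms * w for v, w in zip(x, weight)]


out = rms_norm([3.0, 4.0], [1.0, 1.0])
print(out)
```

With a unit gain, the output has a mean square of approximately 1 regardless of the input scale.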
Multi-Query Attention (MQA)
Instead of full multi-head attention, where every head has its own keys and values, KRULL uses:
- multiple query heads
- shared key/value heads
Benefits:
- reduced KV cache size
- faster inference
- lower RAM usage
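The KV cache saving can be estimated with simple arithmetic. The sketch below assumes the standard MQA layout with a single shared key/value head, and the layer count, head count, head dimension, and sequence length are hypothetical values chosen for illustration, not the contents of configs/krull_nano.json.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2,  # assumption: 16-bit cache entries
) -> int:
    """Approximate KV cache size for one sequence."""
    # The leading 2 accounts for storing both keys and values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical small model: 4 layers, 8 query heads of dimension 32, 1024 tokens.
mha = kv_cache_bytes(n_layers=4, n_kv_heads=8, head_dim=32, seq_len=1024)
mqa = kv_cache_bytes(n_layers=4, n_kv_heads=1, head_dim=32, seq_len=1024)
print(mha, mqa, mha // mqa)
```

Under these assumptions the cache shrinks by a factor equal to the number of query heads, since the cache scales with the number of key/value heads rather than query heads.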
Gated Feed Forward Network
KRULL uses a gated FFN inspired by modern efficient transformer architectures.
Benefits:
- improved parameter efficiency
- lower compute cost
- better expressivity per parameter
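A gated FFN multiplies an "up" projection element-wise by an activated "gate" projection before projecting back down. The sketch below assumes a SiLU gate (the SwiGLU-style variant); the source does not state which activation KRULL uses, and the function and weight names here are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def silu(z: float) -> float:
    """SiLU activation: z * sigmoid(z). Assumed here, not confirmed by the source."""
    return z / (1.0 + math.exp(-z))


def gated_ffn(x: List[float], w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> List[float]:
    """down( silu(gate(x)) * up(x) ), with no bias terms."""
    gate = [silu(z) for z in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(w_down, hidden)


# Toy shapes: model width 2, hidden width 3.
x = [1.0, -1.0]
w_gate = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
w_down = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
print(gated_ffn(x, w_gate, w_up, w_down))
```

The gate lets each hidden unit scale its own contribution per token, which is the mechanism usually credited for the improved expressivity per parameter.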
What is included
- Decoder-only GPT-style model
- RMSNorm
- Multi-query attention
- Gated feed-forward block
- Simple character tokenizer
- CPU training script
- Text generation script
- ONNX export script
- Windows-friendly imports
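To give a sense of what a simple character tokenizer involves, here is a minimal sketch. It is not the code in krull/tokenizer.py: the class name `CharTokenizer`, its methods, and the JSON layout are all hypothetical.

```python
import json
from typing import List


class CharTokenizer:
    """Hypothetical character-level tokenizer: one token ID per distinct character."""

    def __init__(self, vocab: List[str]):
        self.itos = list(vocab)                               # ID -> character
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}  # character -> ID

    @classmethod
    def train(cls, text: str) -> "CharTokenizer":
        # "Training" is just collecting the sorted set of characters in the corpus.
        return cls(sorted(set(text)))

    def encode(self, text: str) -> List[int]:
        # Raises KeyError for characters that were not in the training text.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: List[int]) -> str:
        return "".join(self.itos[i] for i in ids)

    def to_json(self) -> str:
        return json.dumps({"vocab": self.itos})


tok = CharTokenizer.train("KRULL is small")
ids = tok.encode("KRULL")
print(ids, tok.decode(ids))
```

A character vocabulary keeps the embedding table very small, at the cost of longer token sequences than a subword tokenizer would produce.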
Project structure
krull_nano_simple/
├── krull/
│   ├── __init__.py
│   ├── model.py
│   └── tokenizer.py
├── scripts/
│   ├── train_tokenizer.py
│   ├── train_lm.py
│   ├── generate.py
│   └── export_onnx.py
├── configs/
│   └── krull_nano.json
├── data/
│   └── tiny_corpus.txt
├── artifacts/
├── requirements.txt
├── LICENSE
└── README.md
Setup on Windows
Open PowerShell or CMD:
cd C:\workspace\krull_nano_simple
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
Setup on Linux/macOS
cd krull_nano_simple
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
1. Train tokenizer
python scripts/train_tokenizer.py --input data/tiny_corpus.txt --out artifacts/tokenizer.json
2. Train model
python scripts/train_lm.py --config configs/krull_nano.json --tokenizer artifacts/tokenizer.json --data data/tiny_corpus.txt --out artifacts/krull_nano.pt --epochs 10 --device cpu
3. Generate text
python scripts/generate.py --model artifacts/krull_nano.pt --tokenizer artifacts/tokenizer.json --prompt "KRULL is" --device cpu
4. Export ONNX
python scripts/export_onnx.py --model artifacts/krull_nano.pt --out artifacts/krull_nano.onnx
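The four steps can also be chained from a single Python helper. This is a convenience sketch, not a script shipped with the repo: the function names `pipeline_commands` and `run_all` are hypothetical, and the arguments simply mirror the commands above. As written, the block only prints the commands (a dry run).

```python
import shlex
import subprocess
import sys
from typing import List

ARTIFACTS = "artifacts"


def pipeline_commands(
    python: str = sys.executable, device: str = "cpu", epochs: int = 10
) -> List[List[str]]:
    """Build the tokenizer, training, generation, and export commands in order."""
    tok = f"{ARTIFACTS}/tokenizer.json"
    model = f"{ARTIFACTS}/krull_nano.pt"
    corpus = "data/tiny_corpus.txt"
    return [
        [python, "scripts/train_tokenizer.py", "--input", corpus, "--out", tok],
        [python, "scripts/train_lm.py", "--config", "configs/krull_nano.json",
         "--tokenizer", tok, "--data", corpus, "--out", model,
         "--epochs", str(epochs), "--device", device],
        [python, "scripts/generate.py", "--model", model, "--tokenizer", tok,
         "--prompt", "KRULL is", "--device", device],
        [python, "scripts/export_onnx.py", "--model", model,
         "--out", f"{ARTIFACTS}/krull_nano.onnx"],
    ]


def run_all() -> None:
    """Run every step from the project root, stopping at the first failure."""
    for cmd in pipeline_commands():
        subprocess.run(cmd, check=True)


# Dry run: show the commands without executing them.
for cmd in pipeline_commands():
    print(shlex.join(cmd))
```

To actually execute the steps, call `run_all()` from the project root with the virtual environment activated.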
Notes
This repo is for learning and experimentation. The default dataset is tiny, so the generated text will not be intelligent. Replace data/tiny_corpus.txt with a larger corpus to train a better model.
License
MIT