Instructions for using UniversalComputingResearch/Atom3.4m with libraries and local apps.

Libraries

How to use UniversalComputingResearch/Atom3.4m with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="UniversalComputingResearch/Atom3.4m", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("UniversalComputingResearch/Atom3.4m", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use UniversalComputingResearch/Atom3.4m with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "UniversalComputingResearch/Atom3.4m"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UniversalComputingResearch/Atom3.4m",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
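
The same request can be issued from Python instead of curl. The following is a minimal sketch using only the standard library; `build_completion_request` and `complete` are hypothetical helper names, and the host and port defaults assume the `vllm serve` command above (pass `port=30000` for the SGLang commands). The default `max_tokens` is set lower than in the curl example because the README lists a 512-token context length, so the prompt and the completion together have to fit in that window.

```python
import json
import urllib.request

MODEL_ID = "UniversalComputingResearch/Atom3.4m"


def build_completion_request(prompt, max_tokens=128, temperature=0.5,
                             host="localhost", port=8000):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint.

    Hypothetical helper: host and port are assumptions taken from the
    curl example, not values fixed by the model.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the first completion's text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` returns the generated continuation as a string. Building the request is kept separate from sending it so the payload can be inspected without a live server.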

Use Docker

docker model run hf.co/UniversalComputingResearch/Atom3.4m

SGLang

How to use UniversalComputingResearch/Atom3.4m with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "UniversalComputingResearch/Atom3.4m" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UniversalComputingResearch/Atom3.4m",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "UniversalComputingResearch/Atom3.4m" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UniversalComputingResearch/Atom3.4m",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use UniversalComputingResearch/Atom3.4m with Docker Model Runner:
```
docker model run hf.co/UniversalComputingResearch/Atom3.4m
```

Atom3.4m / README.md

Maksymilian

Upload folder using huggingface_hub

bdb11fe verified 3 days ago


---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
- openbmb/Ultra-FineWeb
- HuggingFaceTB/finemath
- HuggingFaceTB/smollm-corpus
- openbmb/UltraData-Math
language:
- en
library_name: transformers
tags:
- causal-lm
- decoder-only
- grouped-query-attention
- rope
- swiglu
- custom-tokenizer
- curriculum-learning
- xsa
pipeline_tag: text-generation
---

![bg](bg.png)
# Atom 3.4m

Atom is a 3.4M parameter causal language model developed by Universal Computing Research. It was pretrained from scratch as a compact research model for studying language-model architecture, data curricula, and small-model benchmarking.

## Model details

- Architecture: causal decoder-only language model
- Parameters: 3,412,800
- Layers: 7
- Hidden size: 192
- Attention: 3 query heads and 1 key-value head (grouped-query attention)
- Head dimension: 64
- Feed-forward size: 480
- Context length: 512 tokens
- Positional encoding: rotary position embeddings (RoPE)
- RoPE theta: 5000.0
- Normalization: RMSNorm
- Activation: gated SiLU feed-forward network
- Vocabulary size: 4,096 tokens
- Tokenizer: custom byte-level BPE, exposed as `GPT2TokenizerFast`
- Training tokens: approximately 5 billion
- License: Apache-2.0

The model uses tied input and output embeddings. Its custom attention implementation combines grouped-query attention with XSA.
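
As a sanity check, the listed dimensions can be turned into a parameter count. The sketch below assumes a conventional layout that the card does not spell out: bias-free linear projections, three feed-forward matrices (gate, up, down), two RMSNorm weight vectors per layer, and one final norm. `AtomConfig` and `count_parameters` are hypothetical names for illustration and are not taken from the model's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomConfig:
    # Values copied from the model details list.
    vocab_size: int = 4096
    hidden_size: int = 192
    num_layers: int = 7
    num_query_heads: int = 3
    num_kv_heads: int = 1
    head_dim: int = 64
    ffn_size: int = 480


def count_parameters(cfg: AtomConfig) -> int:
    # Tied embeddings: the token embedding matrix is reused as the output head.
    embedding = cfg.vocab_size * cfg.hidden_size

    # Attention projections (assumed bias-free). With grouped-query attention,
    # keys and values are projected for only num_kv_heads heads.
    q_proj = cfg.hidden_size * cfg.num_query_heads * cfg.head_dim
    kv_proj = 2 * cfg.hidden_size * cfg.num_kv_heads * cfg.head_dim
    o_proj = cfg.num_query_heads * cfg.head_dim * cfg.hidden_size
    attention = q_proj + kv_proj + o_proj

    # Gated SiLU feed-forward: gate, up, and down projections (assumed bias-free).
    feed_forward = 3 * cfg.hidden_size * cfg.ffn_size

    # Assumed: two RMSNorm weight vectors per layer, plus one final norm.
    norms_per_layer = 2 * cfg.hidden_size
    final_norm = cfg.hidden_size

    per_layer = attention + feed_forward + norms_per_layer
    return embedding + cfg.num_layers * per_layer + final_norm


print(count_parameters(AtomConfig()))  # 3412800
```

Under these assumptions the total matches the listed 3,412,800 parameters, with the tied embedding matrix (786,432 weights) accounting for roughly 23% of the model. The match is consistent with the assumed layout but does not confirm it.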

## Tokenizer

Atom uses a custom byte-level BPE tokenizer trained specifically for this pretraining corpus. The tokenizer has a vocabulary of 4,096 tokens and includes dedicated padding, beginning-of-sequence, end-of-sequence, unknown, and end-of-text tokens.

## Training data and curriculum

Atom was trained on a curriculum combining general web text, educational material, synthetic textbook-style content, and mathematical data. The mixture changed gradually during training: general web data was emphasized earlier, while educational, synthetic, and mathematical material received more weight later.

Approximate proportions over the complete training run were:

| Dataset | Subset / split used | Approximate proportion |
|---|---|---:|
| [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) | All available `CC-MAIN-*` configurations under `data/`, `train` split | 39% |
| [openbmb/Ultra-FineWeb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb) | English v1.4 (`ultrafineweb_en_v1_4`; `en` split) | 31% |
| [HuggingFaceTB/finemath](https://huggingface.co/datasets/HuggingFaceTB/finemath) | `finemath-3plus`, `train` split | 12% |
| [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | `cosmopedia-v2`, `train` split | 12% |
| [openbmb/UltraData-Math](https://huggingface.co/datasets/openbmb/UltraData-Math) | `UltraData-Math-L2-preview`, `train` split | 6% |

These percentages describe the approximate aggregate sampling mixture rather than exact document counts. Refer to the individual dataset cards for their source information, licenses, and usage conditions.
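
To give a sense of scale, the aggregate proportions can be converted into rough per-dataset token budgets. This sketch assumes the percentages apply to tokens and uses the approximate 5 billion training tokens listed in the model details, so the results are order-of-magnitude estimates rather than reported figures. `approximate_token_budget` is a hypothetical helper name.

```python
TOTAL_TOKENS = 5_000_000_000  # "approximately 5 billion" per the model details

# Aggregate sampling proportions from the table above, in percent.
mixture_percent = {
    "HuggingFaceFW/fineweb-edu": 39,
    "openbmb/Ultra-FineWeb": 31,
    "HuggingFaceTB/finemath": 12,
    "HuggingFaceTB/smollm-corpus": 12,
    "openbmb/UltraData-Math": 6,
}

# The listed proportions account for the whole mixture.
assert sum(mixture_percent.values()) == 100


def approximate_token_budget(total_tokens, percentages):
    """Split a total token count according to integer percentages."""
    return {name: total_tokens * pct // 100 for name, pct in percentages.items()}


budget = approximate_token_budget(TOTAL_TOKENS, mixture_percent)
for name, tokens in budget.items():
    print(f"{name}: ~{tokens / 1e9:.2f}B tokens")
```

Because the mixture shifted over the course of training, these totals say nothing about how much of each dataset was seen at any particular stage.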

## Intended use

This is a small base language model intended for research and benchmarking. It may be useful for experiments involving compact architectures, pretraining curricula, tokenization, evaluation pipelines, and resource-constrained inference.

Atom is a base model and has not been instruction-tuned or aligned for assistant-style interaction.

## Evaluation

Atom was evaluated with EleutherAI's `lm-evaluation-harness` and ArithMark-2.0.

### lm-evaluation-harness

| Task | Metric | Score |
|---|---|---:|
| ARC-Easy | `acc_norm` | 33.08% |
| ARC-Challenge | `acc_norm` | 21.76% |
| HellaSwag | `acc_norm` | 27.65% |
| PIQA | `acc_norm` | 55.71% |

### ArithMark-2.0

| Benchmark | Metric | Score |
|---|---|---:|
| ArithMark-2.0 | `acc` | 27.36% |

Average score: 34.54%
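
The card does not state which scores the average covers. A quick check with the rounded values from the tables, offered as a sketch rather than the authors' calculation, compares the two obvious candidates:

```python
from statistics import mean

# Scores as printed in the tables above (already rounded to two decimals).
harness_scores = {
    "ARC-Easy": 33.08,
    "ARC-Challenge": 21.76,
    "HellaSwag": 27.65,
    "PIQA": 55.71,
}
arithmark_score = 27.36

# Candidate 1: mean of the four lm-evaluation-harness tasks only.
harness_mean = mean(harness_scores.values())

# Candidate 2: mean of all five scores, including ArithMark-2.0.
overall_mean = mean([*harness_scores.values(), arithmark_score])

print(f"harness only: {harness_mean:.2f}%")    # 34.55%
print(f"with ArithMark: {overall_mean:.2f}%")  # 33.11%
```

The four-task mean of the rounded scores is about 34.55%, within rounding distance of the reported 34.54%, which suggests the average covers the `lm-evaluation-harness` tasks and was computed from unrounded scores. This is an inference from the printed numbers, not something the card confirms.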


## Limitations

Atom is a very small model and should not be expected to produce reliable factual, safety-critical, or instruction-following outputs. Its short context window and limited capacity constrain coherence, knowledge recall, reasoning, and long-form generation.

The model may reproduce errors, biases, or undesirable patterns present in its training data. It has not undergone dedicated safety training and should not be used for high-stakes decisions.