# ezellm-lite-tokenizer
A 24,600-vocab byte-level BPE tokenizer trained on a 142 GB code-heavy corpus, with a Qwen2.5-Coder-style special-token layout (FIM + repo metadata + reserved slots). Designed for small-to-mid scale code language models where embedding-table size matters.
This is v2 of the tokenizer.
## Quick start

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TerminatorPower/ezellm-lite-tokenizer")
ids = tok.encode("def hello(name):\n    print(f'Hello, {name}!')\n")
print(len(ids), tok.decode(ids))
```
## Vocabulary layout

Total vocab: 24,600. The first 24,576 IDs are learned BPE merges; the final 24 IDs are reserved for control tokens.

| ID range | Tokens | Purpose |
|---|---|---|
| 0 – 24,575 | Learned BPE pieces | Text/code |
| 24,576 | `<\|endoftext\|>` | EOS + PAD |
| 24,577 – 24,580 | `<\|fim_prefix\|>`, `<\|fim_middle\|>`, `<\|fim_suffix\|>`, `<\|fim_pad\|>` | Fill-in-the-Middle training |
| 24,581 – 24,583 | `<\|file_sep\|>`, `<\|repo_name\|>`, `<\|filename\|>` | Repo-level packing |
| 24,584 – 24,599 | `<\|reserved_0\|>` … `<\|reserved_15\|>` | Reserved for downstream use |
Only `<|endoftext|>` is registered as `eos_token` / `pad_token` in `special_tokens_map`. The FIM and repo markers are added tokens but are not flagged "special", so `tok.decode(ids, skip_special_tokens=True)` will only strip `<|endoftext|>`. Register the others explicitly if you want them stripped on decode.
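One way to do that with the stock `transformers` API (the token strings are exactly those in the table above):

```python
# Promote the FIM / repo markers to "special" so that
# skip_special_tokens=True also strips them on decode.
tok.add_special_tokens({
    "additional_special_tokens": [
        "<|fim_prefix|>", "<|fim_middle|>", "<|fim_suffix|>", "<|fim_pad|>",
        "<|file_sep|>", "<|repo_name|>", "<|filename|>",
    ]
})
```

Since these strings already exist as added tokens, this should only flag them as special rather than grow the vocabulary.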
## Training data

- Size: ~142 GB of text and source code
- Mix (heavily code-leaning): Python, JavaScript/TypeScript, Java, C/C++, web (HTML/CSS), Markdown, math/scientific Python, technical prose
- Algorithm: byte-level BPE (tiktoken-compatible; `tiktoken.bpe` and `tiktoken.json` are bundled alongside the standard `tokenizer.json`)
The tokenizer files in this repo can be loaded both via 🤗 transformers (`tokenizer.json`) and via tiktoken directly (`tiktoken.bpe`).
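A minimal sketch of the tiktoken path, assuming `tiktoken.json` carries `pattern` and `special_tokens` keys (per the Files table below); the key names are an assumption, so check the file's actual schema:

```python
import base64
import json

import tiktoken

# tiktoken.bpe uses the standard "<base64 token> <rank>" line format,
# the same layout tiktoken.load.load_tiktoken_bpe parses.
with open("tiktoken.bpe", "rb") as f:
    ranks = {
        base64.b64decode(tok64): int(rank)
        for tok64, rank in (line.split() for line in f.read().splitlines() if line)
    }

with open("tiktoken.json") as f:
    meta = json.load(f)

enc = tiktoken.Encoding(
    name="ezellm-lite",
    pat_str=meta["pattern"],                # pre-tokenization regex
    mergeable_ranks=ranks,
    special_tokens=meta["special_tokens"],  # e.g. {"<|endoftext|>": 24576, ...}
)
print(enc.encode("def hello(): pass"))
```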
## Benchmarks

Compared against size-matched, code-trained tokenizers in the 24K–50K vocab range. Measured on a held-out multi-domain corpus (~618 KB across 10 categories: C, C++, Java, JavaScript, Markdown, math/Python, prose, general Python, HTML/CSS, raw outputs).
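For reference, a minimal sketch of how such a chars/token measurement can be reproduced; the corpus directory and file layout here are illustrative, not the actual benchmark harness:

```python
from pathlib import Path

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TerminatorPower/ezellm-lite-tokenizer")

total_chars = total_tokens = 0
for shard in Path("bench_corpus").glob("**/*.txt"):  # hypothetical corpus dir
    text = shard.read_text(encoding="utf-8")
    total_chars += len(text)
    total_tokens += len(tok.encode(text))

print(f"chars/token:       {total_chars / total_tokens:.3f}")
print(f"tokens / 1k chars: {1000 * total_tokens / total_chars:.1f}")
```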
### Aggregate compression
Higher chars/token = better compression = shorter context for the same input.
| Tokenizer | Vocab | chars/token | tokens / 1k chars |
|---|---|---|---|
| StarCoder2 | 49,152 | 3.238 | 308.9 |
| ezellm-lite | 24,600 | 3.081 | 324.6 |
| CodeGen-mono | 50,295 | 3.017 | 331.5 |
| DeepSeek-Coder | 32,022 | 2.836 | 352.6 |
| GPT-2 | 50,257 | 2.487 | 402.1 |
ezellm-lite is the second-best tokenizer in this group: within ~5% of StarCoder2 despite having half the vocabulary, ahead of CodeGen-mono and DeepSeek-Coder (both of which have 30–100% more vocab slots), and ~24% more compressive than GPT-2.
### Compression by category: characters per token
| Category | ezellm-lite (24.6K) | StarCoder2 (49K) | CodeGen-mono (50K) | DeepSeek-Coder (32K) | GPT-2 (50K) |
|---|---|---|---|---|---|
| c | 2.839 | 2.900 | 2.630 | 2.534 | 2.470 |
| cpp | 3.157 | 3.303 | 2.914 | 2.793 | 2.289 |
| java | 3.996 | 4.517 | 3.606 | 3.605 | 2.329 |
| javascript | 3.142 | 3.423 | 2.988 | 2.898 | 2.357 |
| markdown_docs | 3.117 | 3.272 | 3.125 | 2.906 | 2.986 |
| math_python | 2.630 | 2.695 | 2.482 | 2.387 | 2.136 |
| prose | 3.661 | 3.731 | 4.356 | 3.855 | 4.356 |
| python_general | 3.680 | 3.747 | 3.249 | 3.169 | 2.586 |
| web_html_css | 2.673 | 2.897 | 2.831 | 2.543 | 2.289 |
**Reading the table.** ezellm-lite either wins or comes within a few percent of the leader on every code category, beating CodeGen-mono and DeepSeek-Coder consistently on Python, JavaScript, Java, and C++. StarCoder2 edges it out on most categories (expected: 2× the vocab, trained on The Stack v2). The one place ezellm-lite clearly trails is prose, where the prose-heavy GPT-2 vocabulary still wins; that is a deliberate trade for a code-focused tokenizer.
### Efficiency per vocabulary slot

A 24K-vocab tokenizer is at a structural disadvantage on raw chars/token. A fairer cross-vocab metric is chars/token × log₂(vocab), roughly "input bits carried per token."
| Tokenizer | Vocab | chars/tok | chars/tok × log₂(V) |
|---|---|---|---|
| StarCoder2 | 49,152 | 3.238 | 50.46 |
| CodeGen-mono | 50,295 | 3.017 | 47.12 |
| ezellm-lite | 24,600 | 3.081 | 44.94 |
| DeepSeek-Coder | 32,022 | 2.836 | 42.45 |
| GPT-2 | 50,257 | 2.487 | 38.84 |
Even on this slot-adjusted metric ezellm-lite stays within ~5% of CodeGen-mono, which it beats outright on raw compression while using less than half the embedding parameters, and it sits clearly above DeepSeek-Coder and GPT-2 on both counts.
At `d_model=1024`, the embedding/output table sizes are: ezellm-lite ~25M, DeepSeek-Coder ~33M, and StarCoder2 / CodeGen-mono / GPT-2 ~50M parameters each. For a small code LM (≤2B params), those tens of millions of embedding and softmax parameters are a real share of the budget.
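Both the slot-adjusted metric and the embedding sizes are straightforward arithmetic on the published vocab sizes and chars/token figures:

```python
import math

# (vocab size, chars/token) from the aggregate-compression table above
tokenizers = {
    "StarCoder2":     (49_152, 3.238),
    "CodeGen-mono":   (50_295, 3.017),
    "ezellm-lite":    (24_600, 3.081),
    "DeepSeek-Coder": (32_022, 2.836),
    "GPT-2":          (50_257, 2.487),
}

d_model = 1024
for name, (vocab, cpt) in tokenizers.items():
    bits_per_tok = cpt * math.log2(vocab)   # chars/token x log2(V)
    embed_params = vocab * d_model          # one embedding (or output) table
    print(f"{name:15s} {bits_per_tok:6.2f}  {embed_params / 1e6:5.1f}M params")
```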
### Encoding speed

All five tokenizers are sub-millisecond on the 60–90 KB sample shards in this benchmark; encoding speed is not the bottleneck for any of them.
## Files

| File | Purpose |
|---|---|
| `tokenizer.json` | 🤗 Tokenizers / transformers-loadable model |
| `tokenizer_config.json` | Special-token metadata for transformers |
| `tiktoken.bpe` | tiktoken-format merge table |
| `tiktoken.json` | tiktoken metadata (pattern, special tokens) |
## Intended use

- Pretraining / fine-tuning small-to-mid code LMs (≤2B params) where vocabulary size dominates parameter count
- FIM-style training out of the box (FIM specials are pre-allocated)
- Repo-aware packing using `<|repo_name|>`, `<|filename|>`, `<|file_sep|>` (see the sketch below)
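A minimal sketch of what both can look like. The exact convention (ordering, newlines, whether `<|filename|>` is used alongside `<|file_sep|>`) is set by your training pipeline; this just mirrors the Qwen-style PSM layout the token set is modeled on, with made-up repo/file names:

```python
# FIM (PSM order): the model is trained to emit the middle after
# seeing both the prefix and the suffix.
prefix = "def add(a, b):\n    "
suffix = "\n"
middle = "return a + b"

fim_sample = (
    f"<|fim_prefix|>{prefix}"
    f"<|fim_suffix|>{suffix}"
    f"<|fim_middle|>{middle}<|endoftext|>"
)

# Repo-level packing: repo header, then files joined by <|file_sep|>.
repo_sample = (
    "<|repo_name|>acme/utils\n"
    "<|file_sep|>math_utils.py\n"
    "def add(a, b):\n    return a + b\n"
    "<|endoftext|>"
)

ids = tok.encode(fim_sample)  # each marker resolves to a single ID (24,577+)
```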
## Limitations

- Not optimized for non-English natural language; the corpus is English + code.
- Compression on punctuation-dense languages (C, HTML/CSS) is noticeably lower than on Python or prose (more tokens per character); budget context length accordingly.
- The 16 reserved slots are unused: they never appear in trained text and must be registered explicitly if you want to repurpose them (e.g. as chat-template tags).
## License
Apache-2.0.