Instructions to use LiquidAI/LFM2.5-8B-A1B-Base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LiquidAI/LFM2.5-8B-A1B-Base with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="LiquidAI/LFM2.5-8B-A1B-Base") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-8B-A1B-Base") model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-8B-A1B-Base") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LiquidAI/LFM2.5-8B-A1B-Base with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "LiquidAI/LFM2.5-8B-A1B-Base" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LiquidAI/LFM2.5-8B-A1B-Base", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/LiquidAI/LFM2.5-8B-A1B-Base
- SGLang
How to use LiquidAI/LFM2.5-8B-A1B-Base with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "LiquidAI/LFM2.5-8B-A1B-Base" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LiquidAI/LFM2.5-8B-A1B-Base", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "LiquidAI/LFM2.5-8B-A1B-Base" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LiquidAI/LFM2.5-8B-A1B-Base", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use LiquidAI/LFM2.5-8B-A1B-Base with Docker Model Runner:
docker model run hf.co/LiquidAI/LFM2.5-8B-A1B-Base
LFM2.5-8B-A1B-Base
LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.
- On-device personal assistant: Designed to power real-life applications, chaining tool calls, and following complex instructions on all devices.
- Compressed performance: Competitive with much larger dense and MoE models on instruction following and agentic tasks.
- Unmatched throughput: Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.
Find more information about LFM2.5-8B-A1B in our blog post.
*AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on Artificial Analysis.
🗒️ Model Details
| Model | Parameters | Description |
|---|---|---|
| LFM2.5-8B-A1B-Base | 8.3B total / 1.5B active | Pre-trained base model for fine-tuning |
| LFM2.5-8B-A1B | 8.3B total / 1.5B active | Reasoning-tuned general-purpose model |
LFM2.5-8B-A1B is a general-purpose text-only model with the following features:
- Total parameters: 8.3B
- Active parameters: 1.5B
- Number of layers: 24 (18 double-gated LIV conv + 6 GQA)
- Training budget: 38 trillion tokens
- Context length: 131,072
- Vocabulary size: 128,000
- Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, Spanish
- Generation parameters: We recommend the following parameters:
temperature: 0.2top_p: 80repetition_penalty: 1.05
| Model | Description |
|---|---|
| LFM2.5-8B-A1B | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers, vLLM, and SGLang. |
| LFM2.5-8B-A1B-GGUF | Quantized format for llama.cpp and compatible tools. Optimized for edge inference and local deployment. |
| LFM2.5-8B-A1B-ONNX | ONNX Runtime format for cross-platform deployment. |
| LFM2.5-8B-A1B-MLX | MLX format for Apple Silicon. Optimized for fast inference on Mac devices. |
We recommend using LFM2.5-8B-A1B for agentic workflows, tool use, structured outputs, multilingual assistants, and on-device personal-assistant applications. It is not the best fit for heavy programming or knowledge-intensive question answering without retrieval.
🏃 Inference
LFM2.5-8B-A1B is supported by many inference frameworks. See the Inference documentation for the full list.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| Transformers | Simple inference with direct access to model internals. | Link | ![]() |
| vLLM | High-throughput production deployments with GPU. | Link | ![]() |
| llama.cpp | Cross-platform inference with CPU offloading. | Link | ![]() |
| MLX | Apple's machine learning framework optimized for Apple Silicon. | Link | — |
| LM Studio | Desktop application for running LLMs locally. | Link | — |
Quick start with Transformers (compatible with transformers>=5.0.0):
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
model_id = "LiquidAI/LFM2.5-8B-A1B-Base"
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
dtype="bfloat16",
# attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
prompt = "What is C. elegans?"
input_ids = tokenizer.apply_chat_template(
[{"role": "user", "content": prompt}],
add_generation_prompt=True,
return_tensors="pt",
tokenize=True,
).to(model.device)
output = model.generate(
input_ids,
do_sample=True,
temperature=0.2,
top_k=80,
repetition_penalty=1.05,
max_new_tokens=8192,
streamer=streamer,
)
🔧 Fine-Tuning
We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion. | Link | ![]() |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for translation. | Link | ![]() |
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | ![]() |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link | ![]() |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link | ![]() |
| GRPO (Unsloth) | GRPO with LoRA using Unsloth. | Link | ![]() |
| GRPO (TRL) | GRPO with LoRA using TRL. | Link | ![]() |
📬 Contact
- Got questions or want to connect? Join our Discord community.
- If you are interested in custom solutions with edge deployment, please contact our sales team.
Citation
@article{liquidAI20268BA1B,
author = {Liquid AI},
title = {LFM2.5-8B-A1B: Personal Assistant On Your Laptop},
journal = {Liquid AI Blog},
year = {2026},
note = {www.liquid.ai/blog/lfm2-5-8b-a1b},
}
@article{liquidai2025lfm2,
title = {LFM2 Technical Report},
author = {Liquid AI},
journal = {arXiv preprint arXiv:2511.23404},
year = {2025}
}
- Downloads last month
- 267

