LFM2-350M-a16w4: Optimized for SiMa.ai Modalix

Overview

This repository contains the LFM2-350M-a16w4 model, optimized and compiled for the SiMa.ai Modalix platform.

Model Architecture: LFM2 (350M parameters)
Quantization: Hybrid
- Prompt Processing: A16W8 (16-bit activations, 8-bit weights)
- Token Generation: A16W4 (16-bit activations, 4-bit weights)
Maximum context length: 2048
Source Model: LiquidAI/LFM2-350M

Performance

Model	Precision	Device	Token Length	Response Rate (tokens/sec)	Time To First Token (sec)
LFM2-350M	A16W8/A16W4	Modalix	128	221.45 tokens/sec	0.02 sec
LFM2-350M	A16W8/A16W4	Modalix	256	218.37 tokens/sec	0.02 sec
LFM2-350M	A16W8/A16W4	Modalix	512	212.65 tokens/sec	0.04 sec
LFM2-350M	A16W8/A16W4	Modalix	1024	201.29 tokens/sec	0.09 sec

Prerequisites

To run this model, you need:

SiMa.ai Modalix Device
SiMa.ai CLI: Installed on your Modalix device.
SiMa.ai Neat Runtime: Install or update the Neat Library on Modalix. The LLiMa runtime is installed as part of the Neat runtime.
Hugging Face CLI: Optional, for downloading the model on a host before copying it to Modalix.

Installation & Deployment

Follow these steps to deploy the model to your Modalix device.

1. Install or Update Neat Runtime

Note: This is a one-time setup. If the Neat Library is already installed on your Modalix device, you can skip this step and continue with model download.

Follow the SiMa.ai Neat getting started guide to install or update the Neat Library on your Modalix device.

The llima CLI is available on Modalix after the Neat runtime is installed. It manages precompiled GenAI models under /media/nvme/llima/models by default. Set LLIMA_MODELS_PATH to use a different model directory.

2. Download the Model

Download the compiled model assets from this repository directly to your device.

# Download the model to a local directory
llima pull LFM2-350M-a16w4

Alternatively, you can download the compiled model to a Host and copy it to the Modalix device:

hf download simaai/LFM2-350M-a16w4 --local-dir LFM2-350M-a16w4
scp -r LFM2-350M-a16w4 sima@<modalix-ip>:/media/nvme/llima/models/

Replace <modalix-ip> with the IP address of your Modalix device.

Expected Directory Structure:

/media/nvme/llima/
└── models/
    └── LFM2-350M-a16w4/   # The compiled model

Usage

Validate with LLiMa CLI

Run the model directly on Modalix:

llima run LFM2-350M-a16w4

For all runtime options, run:

llima run -h

GenAI Demo Application

The GenAI demo application is separate from LLiMa installation. Use the GenAI Multimodal Assistant page to install and run the demo app. Once installed, the demo app can use precompiled models such as this one.

API Usage

To serve this model with OpenAI- or Ollama-compatible APIs and send requests to it, use the GenAI server workflow in Serve GenAI Models.

For direct LLM calls without setting up a server, see Run an LLM.

Limitations

Quantization: This model is quantized (A16W4/A16W8) for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur.

Troubleshooting

sima-cli not found: Ensure that sima-cli is installed on your Modalix device.
llima not found: Install or update the Neat Library. See Getting Started.
Model can't be run: Verify the model directory is exactly inside /media/nvme/llima/models/ and not nested (e.g., /media/nvme/llima/models/LFM2-350M-a16w4/LFM2-350M-a16w4).
Permission Denied: Ensure you have read/write permissions for the /media/nvme directory.

Resources

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for simaai/LFM2-350M-a16w4

Base model

LiquidAI/LFM2-350M

Finetuned

(60)

this model

Collection including simaai/LFM2-350M-a16w4

Large Language Models

Collection

Precompiled language models for on-device text generation. • 16 items • Updated Apr 26 • 1