LFM2-350M-a16w4: Optimized for SiMa.ai Modalix
Overview
This repository contains the LFM2-350M-a16w4 model, optimized and compiled for the SiMa.ai Modalix platform.
- Model Architecture: LFM2 (350M parameters)
- Quantization: Hybrid
- Prompt Processing: A16W8 (16-bit activations, 8-bit weights)
- Token Generation: A16W4 (16-bit activations, 4-bit weights)
- Maximum context length: 2048
- Source Model: LiquidAI/LFM2-350M
Performance
| Model | Precision | Device | Token Length | Response Rate (tokens/sec) | Time To First Token (sec) |
|---|---|---|---|---|---|
| LFM2-350M | A16W8/A16W4 | Modalix | 128 | 221.45 tokens/sec | 0.02 sec |
| LFM2-350M | A16W8/A16W4 | Modalix | 256 | 218.37 tokens/sec | 0.02 sec |
| LFM2-350M | A16W8/A16W4 | Modalix | 512 | 212.65 tokens/sec | 0.04 sec |
| LFM2-350M | A16W8/A16W4 | Modalix | 1024 | 201.29 tokens/sec | 0.09 sec |
Prerequisites
To run this model, you need:
- SiMa.ai Modalix Device
- SiMa.ai CLI: Installed on your Modalix device.
- SiMa.ai Neat Runtime: Install or update the Neat Library on Modalix. The LLiMa runtime is installed as part of the Neat runtime.
- Hugging Face CLI: Optional, for downloading the model on a host before copying it to Modalix.
Installation & Deployment
Follow these steps to deploy the model to your Modalix device.
1. Install or Update Neat Runtime
Note: This is a one-time setup. If the Neat Library is already installed on your Modalix device, you can skip this step and continue with model download.
Follow the SiMa.ai Neat getting started guide to install or update the Neat Library on your Modalix device.
The llima CLI is available on Modalix after the Neat runtime is installed. It manages precompiled GenAI models under /media/nvme/llima/models by default. Set LLIMA_MODELS_PATH to use a different model directory.
2. Download the Model
Download the compiled model assets from this repository directly to your device.
# Download the model to a local directory
llima pull LFM2-350M-a16w4
Alternatively, you can download the compiled model to a Host and copy it to the Modalix device:
hf download simaai/LFM2-350M-a16w4 --local-dir LFM2-350M-a16w4
scp -r LFM2-350M-a16w4 sima@<modalix-ip>:/media/nvme/llima/models/
Replace <modalix-ip> with the IP address of your Modalix device.
Expected Directory Structure:
/media/nvme/llima/
โโโ models/
โโโ LFM2-350M-a16w4/ # The compiled model
Usage
Validate with LLiMa CLI
Run the model directly on Modalix:
llima run LFM2-350M-a16w4
For all runtime options, run:
llima run -h
GenAI Demo Application
The GenAI demo application is separate from LLiMa installation. Use the GenAI Multimodal Assistant page to install and run the demo app. Once installed, the demo app can use precompiled models such as this one.
API Usage
To serve this model with OpenAI- or Ollama-compatible APIs and send requests to it, use the GenAI server workflow in Serve GenAI Models.
For direct LLM calls without setting up a server, see Run an LLM.
Limitations
- Quantization: This model is quantized (A16W4/A16W8) for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur.
Troubleshooting
sima-clinot found: Ensure thatsima-cliis installed on your Modalix device.llimanot found: Install or update the Neat Library. See Getting Started.- Model can't be run: Verify the model directory is exactly inside
/media/nvme/llima/models/and not nested (e.g.,/media/nvme/llima/models/LFM2-350M-a16w4/LFM2-350M-a16w4). - Permission Denied: Ensure you have read/write permissions for the
/media/nvmedirectory.
Resources
Model tree for simaai/LFM2-350M-a16w4
Base model
LiquidAI/LFM2-350M