LFM2-350M-a16w4: Optimized for SiMa.ai Modalix

Overview

This repository contains the LFM2-350M-a16w4 model, optimized and compiled for the SiMa.ai Modalix platform.

  • Model Architecture: LFM2 (350M parameters)
  • Quantization: Hybrid
    • Prompt Processing: A16W8 (16-bit activations, 8-bit weights)
    • Token Generation: A16W4 (16-bit activations, 4-bit weights)
  • Maximum context length: 2048
  • Source Model: LiquidAI/LFM2-350M

Performance

Model Precision Device Token Length Response Rate (tokens/sec) Time To First Token (sec)
LFM2-350M A16W8/A16W4 Modalix 128 221.45 tokens/sec 0.02 sec
LFM2-350M A16W8/A16W4 Modalix 256 218.37 tokens/sec 0.02 sec
LFM2-350M A16W8/A16W4 Modalix 512 212.65 tokens/sec 0.04 sec
LFM2-350M A16W8/A16W4 Modalix 1024 201.29 tokens/sec 0.09 sec

Prerequisites

To run this model, you need:

  1. SiMa.ai Modalix Device
  2. SiMa.ai CLI: Installed on your Modalix device.
  3. SiMa.ai Neat Runtime: Install or update the Neat Library on Modalix. The LLiMa runtime is installed as part of the Neat runtime.
  4. Hugging Face CLI: Optional, for downloading the model on a host before copying it to Modalix.

Installation & Deployment

Follow these steps to deploy the model to your Modalix device.

1. Install or Update Neat Runtime

Note: This is a one-time setup. If the Neat Library is already installed on your Modalix device, you can skip this step and continue with model download.

Follow the SiMa.ai Neat getting started guide to install or update the Neat Library on your Modalix device.

The llima CLI is available on Modalix after the Neat runtime is installed. It manages precompiled GenAI models under /media/nvme/llima/models by default. Set LLIMA_MODELS_PATH to use a different model directory.

2. Download the Model

Download the compiled model assets from this repository directly to your device.

# Download the model to a local directory
llima pull LFM2-350M-a16w4

Alternatively, you can download the compiled model to a Host and copy it to the Modalix device:

hf download simaai/LFM2-350M-a16w4 --local-dir LFM2-350M-a16w4
scp -r LFM2-350M-a16w4 sima@<modalix-ip>:/media/nvme/llima/models/

Replace <modalix-ip> with the IP address of your Modalix device.

Expected Directory Structure:

/media/nvme/llima/
โ””โ”€โ”€ models/
    โ””โ”€โ”€ LFM2-350M-a16w4/   # The compiled model

Usage

Validate with LLiMa CLI

Run the model directly on Modalix:

llima run LFM2-350M-a16w4

For all runtime options, run:

llima run -h

GenAI Demo Application

The GenAI demo application is separate from LLiMa installation. Use the GenAI Multimodal Assistant page to install and run the demo app. Once installed, the demo app can use precompiled models such as this one.

API Usage

To serve this model with OpenAI- or Ollama-compatible APIs and send requests to it, use the GenAI server workflow in Serve GenAI Models.

For direct LLM calls without setting up a server, see Run an LLM.

Limitations

  • Quantization: This model is quantized (A16W4/A16W8) for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur.

Troubleshooting

  • sima-cli not found: Ensure that sima-cli is installed on your Modalix device.
  • llima not found: Install or update the Neat Library. See Getting Started.
  • Model can't be run: Verify the model directory is exactly inside /media/nvme/llima/models/ and not nested (e.g., /media/nvme/llima/models/LFM2-350M-a16w4/LFM2-350M-a16w4).
  • Permission Denied: Ensure you have read/write permissions for the /media/nvme directory.

Resources

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for simaai/LFM2-350M-a16w4

Finetuned
(60)
this model

Collection including simaai/LFM2-350M-a16w4