---
license: apache-2.0
language:
- en
base_model:
- apple/MobileCLIP2-S4
- apple/MobileCLIP2-S2
pipeline_tag: image-text-to-text
tags:
- MobileCLIP
- MobileCLIP2
- CLIP
- Classification
---

# MobileCLIP2

The following MobileCLIP2 variants have been converted to run on the Axera NPU using w8a16 quantization (8-bit weights, 16-bit activations). They are compatible with Pulsar2 version 4.2.
- MobileCLIP2-S2
- MobileCLIP2-S4

To learn how to convert a MobileCLIP2 model into an axmodel that runs on an Axera NPU board, see the [axera.ml-mobileclip repository](https://github.com/AXERA-TECH/axera.ml-mobileclip).

## Supported Platform
- AX650

## On-board inference time
- MobileCLIP2-S2

| Stage | Time |
|------|------|
| image encoder | 19.146 ms |
| text encoder | 5.675 ms |

- MobileCLIP2-S4

| Stage | Time |
|------|------|
| image encoder | 65.328 ms |
| text encoder | 12.663 ms |

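To put the per-stage timings in context, the following minimal sketch adds them up and converts the image encoder latency into a rough throughput figure. It assumes the two encoders run one after the other on a single stream, that each text prompt costs one text encoder pass, and that pre- and post-processing time is not included; none of this is stated in the measurements above. The helper name `summarize` is hypothetical.

```python
# Per-stage latencies in milliseconds, copied from the tables above.
LATENCY_MS = {
    "MobileCLIP2-S2": {"image encoder": 19.146, "text encoder": 5.675},
    "MobileCLIP2-S4": {"image encoder": 65.328, "text encoder": 12.663},
}


def summarize(stages):
    """Return (ms for one image pass plus one text pass, image passes per second).

    Assumption: the stages run sequentially and nothing else adds latency.
    """
    total_ms = stages["image encoder"] + stages["text encoder"]
    images_per_s = 1000.0 / stages["image encoder"]
    return total_ms, images_per_s


for name, stages in LATENCY_MS.items():
    total_ms, images_per_s = summarize(stages)
    print(
        f"{name}: {total_ms:.3f} ms for one image pass plus one text pass, "
        f"about {images_per_s:.1f} image encoder passes per second"
    )
```

If the set of text descriptions is fixed, their embeddings can be computed once and reused, so the steady-state cost per image is then dominated by the image encoder.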
## How to use

Download all files from this repository to the device, then run the following command. In it, `-ie` and `-te` point to the image and text encoder models, `-i` is the input image, and `-t` lists the candidate text descriptions:
```bash
python3 run_axmodel.py -ie ./mobileclip2_s4_image_encoder.axmodel -te ./mobileclip2_s4_text_encoder.axmodel -i ./zebra.jpg -t "a zebra" "a dog" "two zebras"
```

Example model input and output:
1. The input image:

![](https://huggingface.co/AXERA-TECH/MobileCLIP/resolve/main/zebra.jpg)

2. The candidate text descriptions to classify against:

["a zebra", "a dog", "two zebras"]

3. The model output, one confidence score per description:

Label probs: [[6.095444e-02 5.628616e-14 9.390456e-01]]

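The scores are listed in the same order as the text descriptions, so reading the result means pairing each score with its description and picking the largest. The sketch below does this with the values from the example above. It assumes the scores are softmax probabilities over the descriptions (the usual CLIP convention, and consistent with the values summing to about 1) and that the outer list holds one row per input image; `rank_labels` is a hypothetical helper, not part of `run_axmodel.py`.

```python
# Inputs copied from the example above.
labels = ["a zebra", "a dog", "two zebras"]
label_probs = [[6.095444e-02, 5.628616e-14, 9.390456e-01]]  # one row per image (assumed)


def rank_labels(labels, probs_row):
    """Pair each label with its score and sort from most to least likely."""
    return sorted(zip(labels, probs_row), key=lambda pair: pair[1], reverse=True)


ranking = rank_labels(labels, label_probs[0])
best_label, best_prob = ranking[0]

# The row should sum to roughly 1 if the scores are softmax probabilities.
print(f"total probability: {sum(label_probs[0]):.6f}")
for label, prob in ranking:
    print(f"{label!r}: {prob:.4%}")
print(f"prediction: {best_label!r}")
```

For this image the top-ranked description is "two zebras", with "a zebra" a distant second and "a dog" effectively ruled out.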