Paper: MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (arXiv:2311.17049)
This repository contains Apple's official Core ML export of MobileCLIP-S0 (apple/coreml-mobileclip),
re-packaged as .mlpackage.tar.gz archives so they can be downloaded and compiled directly on device.
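As a rough desktop-side illustration of the unpacking step, the sketch below extracts one of the .mlpackage.tar.gz archives with Python's standard tarfile module. The helper name unpack_mlpackage and the destination directory are hypothetical, and the sketch assumes each archive holds a single top-level .mlpackage directory. An on-device app would do the equivalent in its own language before handing the package to Core ML for compilation.

```python
import tarfile
from pathlib import Path


def unpack_mlpackage(archive_path, dest_dir):
    """Extract a .mlpackage.tar.gz archive and return the .mlpackage paths found.

    Hypothetical helper: the assumption that the archive contains a top-level
    .mlpackage directory is not confirmed by the source.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as tar:
        # Refuse members that would be written outside the destination.
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")

        # Use the stricter "data" extraction filter where this Python has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)

    # Top-level entries whose name ends in .mlpackage.
    return sorted(p for p in dest.iterdir() if p.suffix == ".mlpackage")
```

Calling unpack_mlpackage("MobileCLIPImageEncoder.mlpackage.tar.gz", "models") would, under the assumption above, return the path of the extracted image encoder package.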
- MobileCLIPImageEncoder.mlpackage.tar.gz: image encoder (256×256 RGB → 512-dim embedding, output final_emb_1)
- MobileCLIPTextEncoder.mlpackage.tar.gz: text encoder (1×77 int32 CLIP BPE tokens, zero-padded → 512-dim embedding)

Weights are unchanged from Apple's release and remain under the Apple Sample Code License.
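To make the input and output shapes above concrete, here is a minimal sketch of two steps around the encoders: zero-padding a token sequence to the text encoder's 1×77 shape, and comparing two embeddings. The function names are hypothetical. Scoring by cosine similarity is an assumption based on common CLIP usage and is not stated in this repository. Raising on over-long input is also a choice made here, since the truncation behaviour is not specified.

```python
import math

CONTEXT_LENGTH = 77  # from the text encoder's 1×77 input shape


def pad_tokens(token_ids, context_length=CONTEXT_LENGTH):
    """Zero-pad a list of CLIP BPE token ids to shape 1 × context_length."""
    if len(token_ids) > context_length:
        # Assumption: reject rather than truncate; the source does not say which.
        raise ValueError(f"got {len(token_ids)} tokens, max is {context_length}")
    padded = list(token_ids) + [0] * (context_length - len(token_ids))
    return [padded]  # outer list is the batch dimension of size 1


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder ids for illustration only; real ids come from a CLIP BPE tokenizer.
batch = pad_tokens([49406, 320, 1125, 49407])
assert len(batch) == 1 and len(batch[0]) == CONTEXT_LENGTH
```

The padded values would still need converting to int32 before being passed to the model. The 512-dim image and text embeddings can then be passed to cosine_similarity as plain lists of floats.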
Reference: MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024).