view article Article SmolVLM - small yet mighty Vision Language Model +3 andito, merve, mfarre, eliebak, pcuenq • Nov 26, 2024 • 418
view article Article SmolVLM2: Bringing Video Understanding to Every Device +5 orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 340
view article Article PaliGemma – Google's Cutting-Edge Open Vision Language Model +1 merve, andsteing, pcuenq • May 14, 2024 • 287
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based Qwen2.5 • 6 items • Updated Mar 2 • 168
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch +5 ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 258
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published Apr 8, 2025 • 186