Future Optical Flow Prediction Improves Robot Control & Video Generation Paper • 2601.10781 • Published Jan 15 • 19
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding Paper • 2512.05774 • Published Dec 5, 2025 • 7
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Paper • 2507.12806 • Published Jul 17, 2025 • 21
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14, 2025 • 99
Salesforce/xgen-mm-vid-phi3-mini-r-v1.5-32tokens-8frames Image-Text-to-Text • 4B • Updated Feb 3, 2025 • 78 • 3
Salesforce/xgen-mm-vid-phi3-mini-r-v1.5-128tokens-8frames Image-Text-to-Text • 4B • Updated Feb 3, 2025 • 101 • 11
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs Paper • 2410.16267 • Published Oct 21, 2024 • 18
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22, 2024 • 35
Salesforce/xgen-mm-phi3-mini-instruct-singleimg-r-v1.5 Image-Text-to-Text • 4B • Updated Feb 3, 2025 • 90 • 15
Salesforce/xgen-mm-phi3-mini-instruct-dpo-r-v1.5 Image-Text-to-Text • 4B • Updated Feb 3, 2025 • 53 • 19
Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5 Image-Text-to-Text • Updated Feb 3, 2025 • 772 • 59
XGen-MM-1 models and datasets Collection A collection of all XGen-MM (Foundation LMM) models! • 15 items • Updated Mar 2 • 40
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1, 2024 • 89