Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification Paper • 2606.18249 • Published 11 days ago • 14
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection Paper • 2412.17800 • Published Dec 23, 2024
CoMP: Continual Multimodal Pre-training for Vision Foundation Models Paper • 2503.18931 • Published Mar 24, 2025 • 31
Inst-IT Models Collection A series of LMMs finetuned with the Inst-IT Dataset, skilled in fine-grained image/video understanding at the instance-level. • 2 items • Updated Mar 17, 2025
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Paper • 2412.03565 • Published Dec 4, 2024 • 10
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding Paper • 2312.00081 • Published Nov 30, 2023 • 2
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation Paper • 2311.14671 • Published Nov 24, 2023