Inst-IT

university

https://inst-it.github.io/

AI & ML interests

Large Multimodal Models

Recent Activity

wjpoom authored a paper 5 days ago

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

menglc authored a paper 5 months ago

Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection

menglc authored a paper 5 months ago

CoMP: Continual Multimodal Pre-training for Vision Foundation Models

authored a paper 5 days ago

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

Paper • 2606.18249 • Published 11 days ago • 14

authored 3 papers 5 months ago

Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection

Paper • 2412.17800 • Published Dec 23, 2024

CoMP: Continual Multimodal Pre-training for Vision Foundation Models

Paper • 2503.18931 • Published Mar 24, 2025 • 31

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 163

updated 2 datasets over 1 year ago

Inst-IT/Inst-It-Bench

Viewer • Updated Mar 3, 2025 • 4.07k • 93 • 1

Inst-IT/Inst-It-Dataset

Viewer • Updated Mar 1, 2025 • 72.5k • 122 • 10

updated a Space over 1 year ago

README

Boosting Multimodal Understanding at Instance-Level

published a Space over 1 year ago

README

Boosting Multimodal Understanding at Instance-Level

updated a collection over 1 year ago

Inst-IT Models

A series of LMMs fine-tuned on the Inst-IT Dataset, skilled in fine-grained image/video understanding at the instance level. • 2 items • Updated Mar 17, 2025

updated 2 models over 1 year ago

Inst-IT/LLaVA-Next-Inst-It-Qwen2-7B

Video-Text-to-Text • 8B • Updated Feb 21, 2025 • 11 • 3

Inst-IT/LLaVA-Next-Inst-It-Vicuna-7B

Video-Text-to-Text • 7B • Updated Feb 20, 2025 • 13 • 2

in Inst-IT/LLaVA-Next-Inst-It-Qwen2-7B over 1 year ago

Improve model card, add link to paper

#1 opened over 1 year ago by

authored 2 papers over 1 year ago

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning

Paper • 2412.03565 • Published Dec 4, 2024 • 10

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

Paper • 2312.00081 • Published Nov 30, 2023 • 2

authored a paper almost 2 years ago

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

Paper • 2311.14671 • Published Nov 24, 2023