- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
  Paper • 2311.05332 • Published • 11
- SoundCam: A Dataset for Finding Humans Using Room Acoustics
  Paper • 2311.03517 • Published • 14
Chaolei Tan
Chaolei
AI & ML interests
Computer Vision, Multimodal Learning, Video Understanding
Recent Activity
liked a model 19 days ago
facebook/sam3
liked a dataset 2 months ago
Video-Reason/VBVR-Dataset
liked a Space 2 months ago
Qwen/Qwen3-VL-Demo
Organizations
None yet