VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization
Paper • 2606.02564 • Published • 20
ARC focuses primarily on computer vision, speech, and natural language processing, including speech and video generation, enhancement, retrieval, and understanding, as well as AutoML. Guided by research developments and industry trends, ARC consistently pursues exploration, innovation, and technological breakthroughs.
Pixal3D: Pixel-Aligned 3D Generation from Images
OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video