Involution: Inverting the Inherence of Convolution for Visual Recognition Paper • 2103.06255 • Published Mar 10, 2021
Avalanche: an End-to-End Library for Continual Learning Paper • 2104.00405 • Published Apr 1, 2021 • 2
MC-LLaVA: Multi-Concept Personalized Vision-Language Model Paper • 2411.11706 • Published Nov 18, 2024 • 1
[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster Paper • 2412.01818 • Published Dec 2, 2024 • 2
ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance Paper • 2412.06163 • Published Dec 9, 2024
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding Paper • 2504.01407 • Published Apr 2, 2025 • 1
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models Paper • 2509.06040 • Published Sep 7, 2025 • 1
UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying Paper • 2508.03142 • Published Aug 5, 2025 • 1
On the Faithfulness of Visual Thinking: Measurement and Enhancement Paper • 2510.23482 • Published Oct 27, 2025
FastInit: Fast Noise Initialization for Temporally Consistent Video Generation Paper • 2506.16119 • Published Jun 19, 2025
Background-aware Classification Activation Map for Weakly Supervised Object Localization Paper • 2112.14379 • Published Dec 29, 2021
TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning Paper • 2511.05489 • Published Nov 7, 2025 • 3
MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation Paper • 2511.18262 • Published Nov 23, 2025 • 1
CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning Paper • 2512.17312 • Published Dec 19, 2025 • 2
Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM Paper • 1911.05603 • Published Nov 13, 2019
ThinkGen: Generalized Thinking for Visual Generation Paper • 2512.23568 • Published Dec 29, 2025 • 1
ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better Paper • 2511.17106 • Published Nov 21, 2025
Video-KTR: Reinforcing Video Reasoning via Key Token Attribution Paper • 2601.19686 • Published Jan 27, 2026