Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion Paper • 2606.15236 • Published 19 days ago • 22
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published May 12 • 194
From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published May 27 • 75
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation Paper • 2603.16669 • Published Mar 17 • 70
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions Paper • 2603.15612 • Published Mar 16 • 153
ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors Paper • 2603.04338 • Published Mar 4 • 24
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published Dec 22, 2025 • 68
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography Paper • 2504.07083 • Published Apr 9, 2025 • 22
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency Paper • 2503.20785 • Published Mar 26, 2025 • 22
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Paper • 2503.21755 • Published Mar 27, 2025 • 33
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models Paper • 2501.08453 • Published Jan 14, 2025 • 1
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models Paper • 2503.18886 • Published Mar 24, 2025 • 24
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding Paper • 2503.07413 • Published Mar 10, 2025 • 2
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16, 2025 • 35