SkVM: Compiling Skills for Efficient Execution Everywhere Paper • 2604.03088 • Published 28 days ago • 10
GigaWorld-Policy: An Efficient Action-Centered World-Action Model Paper • 2603.17240 • Published Mar 18 • 26
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 7 days ago • 115
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 7 days ago • 66
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning Paper • 2604.24300 • Published 7 days ago • 64
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published 10 days ago • 221
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications Paper • 2503.07137 • Published Mar 10, 2025 • 2
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model Paper • 2604.19747 • Published 13 days ago • 38
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 12 days ago • 239
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs Paper • 2502.11880 • Published Feb 17, 2025 • 18
PyTorch Distributed: Experiences on Accelerating Data Parallel Training Paper • 2006.15704 • Published Jun 28, 2020 • 8
Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs Paper • 2508.04660 • Published Aug 6, 2025 • 3
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation Paper • 2604.19636 • Published 13 days ago • 87
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping Paper • 2604.08364 • Published 25 days ago • 100