Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation
Paper • 2605.26111 • Published • 7
Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models