Interleaving Reasoning for Better Text-to-Image Generation Paper • 2509.06945 • Published Sep 8, 2025 • 16
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping Paper • 2510.08457 • Published Oct 9, 2025 • 14
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions Paper • 2308.09936 • Published Aug 19, 2023 • 1
Matryoshka Query Transformer for Large Vision-Language Models Paper • 2405.19315 • Published May 29, 2024 • 1
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models Paper • 2410.08182 • Published Oct 10, 2024