-
Continuous Latent Diffusion Language Model
Paper • 2605.06548 • Published • 85 -
Scaling Latent Reasoning via Looped Language Models
Paper • 2510.25741 • Published • 233 -
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 158 -
Pretraining Language Models to Ponder in Continuous Space
Paper • 2505.20674 • Published • 3
Collections
Collections including paper arxiv:2606.07082
-
Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering
Paper • 2605.29648 • Published • 10 -
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
Paper • 2605.29548 • Published • 12 -
Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation
Paper • 2605.29861 • Published • 16 -
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation
Paper • 2605.31264 • Published • 123
-
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
Paper • 2506.19697 • Published • 45 -
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
Paper • 2509.23873 • Published • 68 -
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Paper • 2510.00515 • Published • 41 -
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper • 2509.22944 • Published • 80
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning
Paper • 2605.28742 • Published • 4 -
Reinforcement Learning from Rich Feedback with Distributional DAgger
Paper • 2606.05152 • Published • 3 -
Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development
Paper • 2606.07207 • Published • 4 -
Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses
Paper • 2606.08348 • Published • 14
-
Why Fine-Tuning Encourages Hallucinations and How to Fix It
Paper • 2604.15574 • Published • 25 -
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Paper • 2604.24763 • Published • 71 -
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
Paper • 2604.24819 • Published • 91 -
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
Paper • 2604.26752 • Published • 112
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 735 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 41 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 98 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 89