STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? Paper • 2605.06527 • Published 11 days ago • 39
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 7 days ago • 40
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models Paper • 2605.14906 • Published 4 days ago • 68
MMProLong Collection A 7B LVLM with 128K context window and 512K generalization through long-context continued pre-training • 1 item • Updated 2 days ago
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration? Paper • 2510.24505 • Published Oct 28, 2025 • 4
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context Paper • 2605.13831 • Published 5 days ago • 82
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions Paper • 2505.00675 • Published May 1, 2025 • 3
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph Paper • 2311.09174 • Published Nov 15, 2023
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation Paper • 2402.10646 • Published Feb 16, 2024
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers Paper • 2509.03059 • Published Sep 3, 2025 • 25
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents Paper • 2510.07172 • Published Oct 8, 2025 • 28
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents Paper • 2512.20092 • Published Dec 23, 2025 • 9