A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents Paper • 2602.08964 • Published Feb 9 • 1
Substance Beats Style: Why Beginning Students Fail to Code with LLMs Paper • 2410.19792 • Published Oct 15, 2024
AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans Paper • 2509.21891 • Published Sep 26, 2025
QE4PE: Word-level Quality Estimation for Human Post-Editing Paper • 2503.03044 • Published Mar 4, 2025 • 6
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models Paper • 2502.01584 • Published Feb 3, 2025 • 9
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models Paper • 2502.01639 • Published Feb 3, 2025 • 26
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge Paper • 2412.00176 • Published Nov 29, 2024 • 9
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Paper • 2408.00113 • Published Jul 31, 2024 • 8
Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses Paper • 2408.00584 • Published Aug 1, 2024 • 6
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18, 2024 • 35