- CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
  Paper • 2602.24286 • Published • 98
- FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
  Paper • 2505.22758 • Published • 1
- Liger Kernel: Efficient Triton Kernels for LLM Training
  Paper • 2410.10989 • Published • 3
- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 38
Mattias Dürrmeier
mattduerrmeier
AI & ML interests
LLM inference, faster and more efficient kernels, local inference
Recent Activity
- new activity about 18 hours ago: deepseek-ai/DeepSeek-V4-Flash · Questions on MoE Hash Routing
- liked a model 4 days ago: deepseek-ai/DeepSeek-V4-Pro
- updated a collection 14 days ago: systems
Organizations
None yet