AI & ML interests

Independent research lab. Agent infrastructure, evaluation, and the systems under LLMs. Correctness where the code executes. Receipts, not punditry.

Recent Activity

dipankarsarkar  updated a collection 3 days ago
Quality of Meaning
dipankarsarkar  published a Space 3 days ago
skelfresearch/mpl-qom
dipankarsarkar  updated a Space 3 days ago
skelfresearch/mpl-qom

dipankarsarkar posted an update 3 days ago
Your issue tracker is in the wrong place.

It lives on a server. Your code lives in git. So every time an agent picks up work it makes an API call, burns a token, fights a rate limit, and still cannot see what the other agent just did.

Move the issues into the repo. Append-only event log in git refs. Branches when you branch, merges when you merge, CRDT so two agents never conflict. No server, no database.
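
A minimal sketch of the idea, assuming a grow-only set of immutable events as the CRDT and a last-writer-wins replay. This is not the tool's actual data model: the `Event` fields, the logical clock, and the issue IDs are all hypothetical, and the git-ref storage layer is left out entirely.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Event:
    # Field order defines the deterministic replay order: (clock, actor, ...).
    clock: int   # hypothetical Lamport-style logical clock
    actor: str   # hypothetical agent identifier, breaks clock ties
    issue: str
    field: str
    value: str

def merge(log_a, log_b):
    """Set union of two logs: commutative, associative, and idempotent."""
    return frozenset(log_a) | frozenset(log_b)

def materialize(log):
    """Replay events in total order; the last writer per (issue, field) wins."""
    state = {}
    for ev in sorted(log):
        state.setdefault(ev.issue, {})[ev.field] = ev.value
    return state

# Two agents append to their own copy of the log without coordinating.
agent_a = {
    Event(1, "a", "ISSUE-1", "title", "Fix tail handling"),
    Event(2, "a", "ISSUE-1", "status", "in-progress"),
}
agent_b = {
    Event(1, "b", "ISSUE-1", "assignee", "agent-b"),
    Event(3, "b", "ISSUE-1", "status", "done"),
}

# Merge order does not matter: both sides converge on the same state.
merged_ab = materialize(merge(agent_a, agent_b))
merged_ba = materialize(merge(agent_b, agent_a))
```

Because events are only ever added and the replay order is a pure function of the event contents, any two replicas that have seen the same events derive the same issue state, which is the property that lets the log branch and merge alongside the code.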

The coordination signal that PR-level telemetry misses lives before the pull request. The paper, and a live demo running the real tool:

Before the Pull Request: Mining Multi-Agent Coordination (2606.19616)
https://huggingface.co/spaces/neullabs/grite

If your agents share a repo, where does their shared state actually live right now?
dipankarsarkar authored 13 papers 4 days ago
dipankarsarkar posted an update 5 days ago
LLM-generated GPU kernels pass the standard correctness test and are still wrong.

The industry oracle is one line: torch.allclose at one shape, one dtype, one seed. Every modern kernel benchmark uses it. It is blind to whole bug classes.

So I built the receipts:
- a 26-op corpus of correct and LLM-buggy kernels
- a differential fuzzer against an fp64 reference that catches what allclose misses
- a live demo you can click
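
To illustrate the gap between a single-point check and a differential fuzz, here is a toy sketch in pure Python. It is not the corpus or fuzzer from the paper: `buggy_sum`, `BLOCK`, and the input ranges are hypothetical, and the "kernel" is a plain function standing in for generated GPU code. The tolerance check mirrors the form torch.allclose uses, |a - b| <= atol + rtol * |b|.

```python
import math
import random

BLOCK = 4  # hypothetical tile width of the kernel under test

def buggy_sum(xs):
    """Toy stand-in for a generated kernel: sums full tiles, drops the tail."""
    full = len(xs) - len(xs) % BLOCK
    return sum(xs[:full])

def reference_sum(xs):
    """fp64 reference; Python floats are IEEE-754 doubles."""
    return math.fsum(xs)

def close(a, b, rtol=1e-5, atol=1e-8):
    # Same tolerance form as torch.allclose, applied to two scalars.
    return abs(a - b) <= atol + rtol * abs(b)

def single_point_oracle(kernel, seed=0, n=8):
    """One shape, one seed: the check the post calls the industry oracle."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return close(kernel(xs), reference_sum(xs))

def differential_fuzz(kernel, trials=200, seed=0):
    """Vary shape and magnitude; return the input lengths that disagree."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        n = rng.randint(1, 64)
        xs = [rng.uniform(-1e3, 1e3) for _ in range(n)]
        if not close(kernel(xs), reference_sum(xs)):
            failures.append(n)
    return failures

passes_oracle = single_point_oracle(buggy_sum)  # n=8 is a multiple of BLOCK
failing_lengths = differential_fuzz(buggy_sum)  # lengths with a dropped tail
```

The single-point check passes because the one shape it tries happens to be tile-aligned, while the fuzz reports failures only at lengths that are not multiples of `BLOCK`. A real harness would also vary dtype and compare against a trusted fp64 implementation of each op.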

The Correctness Illusion in LLM-Generated GPU Kernels (2606.20128)
dipankarsarkar/gpuemu-corpus
dipankarsarkar/the-correctness-illusion

What is your team's actual correctness oracle for generated kernels?