dipankarsarkar (Dipankar Sarkar)

upvoted 3 papers about 12 hours ago

EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

Paper • 2607.02440 • Published 2 days ago • 39

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Paper • 2607.02512 • Published 2 days ago • 48

AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

Paper • 2607.02255 • Published 2 days ago • 39

replied to SeaWolf-AI's post about 12 hours ago

You already conceded the hard part: the denominator is a function of the model, so a clean coverage number can sit over a surface you undercounted.

The move that makes it honest is to measure that gap instead of asserting it. You cannot see the true surface. But you can see every time a Phase 3 payload lands on a node your static model never predicted was reachable.

Call it a surprise rate. It is the empirical proxy for how wrong the denominator is. A low surprise rate earns the coverage number. A high one says the modeled surface is fiction and the percentage with it.

Better, every surprise is free training data: a missed edge to fold back into the enumerator. The surface model gets falsified by its own execution.

Do you already diff what Phase 3 actually reached against what the static model predicted, or does that signal get dropped after each run?
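A minimal sketch of that diff, assuming the static model and the Phase 3 run can each be reduced to a set of node identifiers. The function names, the node names, and the choice of denominator (surprises as a fraction of everything reached) are illustrative assumptions, not taken from the post being replied to.

```python
def surprise_rate(predicted: set[str], reached: set[str]) -> tuple[float, set[str]]:
    """Return the share of reached nodes the static model never predicted,
    together with the surprising nodes themselves.

    Assumption: surprise rate = |reached - predicted| / |reached|.
    """
    if not reached:
        return 0.0, set()
    surprises = reached - predicted
    return len(surprises) / len(reached), surprises


def modeled_coverage(predicted: set[str], reached: set[str]) -> float:
    """Coverage as usually reported: reached nodes over the *modeled* surface."""
    if not predicted:
        return 0.0
    return len(predicted & reached) / len(predicted)


# Hypothetical node names, for illustration only.
predicted = {"login", "search", "upload"}
reached = {"login", "search", "admin_debug"}

coverage = modeled_coverage(predicted, reached)
rate, surprises = surprise_rate(predicted, reached)

# Fold every surprise back into the model so the next run starts from a
# denominator that has been corrected by its own execution.
updated_model = predicted | surprises
```

Reporting `rate` next to `coverage` keeps the percentage from standing alone: the coverage figure only means something when few reached nodes fall outside the modeled surface.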

replied to kanaria007's post about 12 hours ago

This converges, and the convergence is the useful part.

Detection lead time was never the honest variable. Blast radius on first contact is. You named it, and that is the right axis.

One sharpening. Every leading signal on your list is itself a scoped detector with its own envelope. Live-traffic-outside-envelope needs the envelope drawn right. No-golden-coverage clusters need the clustering complete. So the leading layer inherits the same blind region as the base, one level up. A region no leading signal touches is exactly where first contact is still first evidence.

Which turns the whole thing on one default. For surface nothing has sampled, shadowed, or probed yet: is reliance low-by-default until coverage earns it, or full-by-default until something contradicts it?

Default-deny is safe by construction. Default-allow is lagging detection wearing a receipt.

Where does Chronia sit on untouched surface, deny or allow by default?
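The two defaults can be sketched as a single lookup that differs only in what it returns for a region with no evidence. This is an illustrative sketch, not a description of Chronia: the `Default` enum, the `reliance` function, the evidence map, and the region names are all hypothetical.

```python
from enum import Enum


class Default(Enum):
    DENY = "deny"    # low reliance until coverage earns it
    ALLOW = "allow"  # full reliance until something contradicts it


def reliance(region: str, evidence: dict[str, bool], default: Default) -> float:
    """Return how much to rely on a region of the surface.

    `evidence` maps a region to True (sampled, shadowed, or probed and held up)
    or False (contradicted). Regions absent from the map are untouched surface.
    """
    if region in evidence:
        return 1.0 if evidence[region] else 0.0
    # Untouched surface: the default is the whole decision.
    return 0.0 if default is Default.DENY else 1.0


# Hypothetical regions, for illustration only.
evidence = {"checkout": True, "refunds": False}

deny_untouched = reliance("bulk_import", evidence, Default.DENY)
allow_untouched = reliance("bulk_import", evidence, Default.ALLOW)
```

Both defaults agree wherever evidence exists. They diverge only on `bulk_import`, the region nothing has touched, which is where first contact is still first evidence.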

upvoted 3 papers about 12 hours ago

GRPO, Dr. GRPO, and DAPO Are Three Operations on One Number: The Group-Standard-Deviation Identity

Paper • 2607.00152 • Published 4 days ago • 3

When More Sampling Hurts: The Modal Ceiling and Correlation Ceiling of Test-Time Scaling

Paper • 2606.28661 • Published 7 days ago • 4

Building to the Test: Coding Agents Deliver What You Check, Not What You Requested

Paper • 2606.28430 • Published 8 days ago • 5

reacted to stas's post with 🤗 about 22 hours ago

I present to you a new experimental open book.

https://github.com/stas00/python-cookbook

I took my dense Python cheatsheet that I have been honing for many years and use a lot daily and turned it into a book of recipes.

Is this useful?

This is, of course, free, like other open books.

reacted to salma-remyx's post with 🔥 about 22 hours ago

Post

576

What's holding your code back?
Outrider finds, implements, and validates methods for your repo.

While testing Outrider on a fork of huggingface/peft, I discovered "Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models" (arXiv: 2402.02347).

The work offers improved stability and faster convergence in LoRA finetuning by adjusting updates for curvature that LoRA optimizers typically ignore.

Not the most recent paper, so I was pleasantly surprised my action surfaced this method as a candidate before implementing a PR. Even more surprised this method had not already been merged upstream.

Turns out, the author did try contributing to peft a couple of years ago, but people get busy and the PR was closed after going stale.

So I decided to revive it! I opened an issue and soon after the author engaged to help land the feature. Now huggingface/peft #3382 is open, a joint effort with the paper's author.

This whole episode has me thinking about the future of OSS maintenance with AI coding. The software projects which endure will be well-shaped to quickly land and help test new ideas.

Across 30 forks, I've seen several papers land as clean PRs for multiple repos, which offers a perspective on how methods impact applications. Recent methods matching multiple frameworks: STARE, Entity Binding, BINEVAL.

Get Outrider: https://github.com/remyxai/outrider

upvoted 6 papers 1 day ago

AI translation of literary texts is "fine", but readers still prefer human translations

Paper • 2606.26040 • Published 10 days ago • 5

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

Paper • 2607.01211 • Published 3 days ago • 6

GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)

Paper • 2604.17091 • Published Apr 18 • 23

Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Paper • 2511.12884 • Published Nov 17, 2025 • 29

AtomiMed: Hierarchical Atomic Fact-Checking for Universal Clinical-Aware Medical Report Evaluation

Paper • 2606.31292 • Published 4 days ago • 6

Cross-Domain Generalization Failure in Lightweight Intrusion Detection Models for IIoT Networks

Paper • 2607.00553 • Published 3 days ago • 7
