Papers
arxiv:2604.26091

Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital

Published on Apr 28 · Submitted by Poof on Apr 30
Abstract

Autonomous language-model agents managing real cryptocurrency trades achieved high reliability not from base-model performance alone, but from comprehensive system design around the model: prompt compilation, policy validation, and execution safeguards.

AI-generated summary

We study reliability in autonomous language-model agents that translate user mandates into validated tool actions under real capital. The setting is DX Terminal Pro, a 21-day deployment in which 3,505 user-funded agents traded real ETH in a bounded onchain market. Users configured vaults through structured controls and natural-language strategies, but only agents could choose normal buy/sell trades. The system produced 7.5M agent invocations, roughly 300K onchain actions, about $20M in volume, more than 5,000 ETH deployed, roughly 70B inference tokens, and 99.9% settlement success for policy-valid submitted transactions. Long-running agents accumulated thousands of sequential decisions, including 6,000+ prompt-state-action cycles for continuously active agents, yielding a large-scale trace from user mandate to rendered prompt, reasoning, validation, portfolio state, and settlement. Reliability did not come from the base model alone; it emerged from the operating layer around the model: prompt compilation, typed controls, policy validation, execution guards, memory design, and trace-level observability. Pre-launch testing exposed failures that text-only benchmarks rarely measure, including fabricated trading rules, fee paralysis, numeric anchoring, cadence trading, and misread tokenomics. Targeted harness changes reduced fabricated sell rules from 57% to 3%, reduced fee-led observations from 32.5% to below 10%, and increased capital deployment from 42.9% to 78.0% in an affected test population. We show that capital-managing agents should be evaluated across the full path from user mandate to prompt, validated action, and settlement.

Community

Paper submitter

What happens when LLM agents are allowed to manage real capital, not just solve benchmark tasks?


We study a 21-day deployment of 3,505 user-funded agents trading real ETH onchain. The system logged 7.5M agent invocations, ~300K onchain actions, ~$20M in volume, 70B inference tokens, and 99.9% settlement success for policy-valid transactions.

The key finding: capital-agent reliability is an operating-layer problem. The largest reliability gains came from prompt compilation, typed controls, policy validation, execution guards, memory semantics, and full instruction-to-settlement observability. Agentic harnesses must be specifically built, evaluated, and optimized for markets in order to perform reliably.
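To make the "policy validation before execution" idea concrete, here is a minimal sketch of a pre-submission guard that checks an agent's proposed trade against typed vault controls. All names (`VaultPolicy`, `ProposedAction`, the specific limits) are hypothetical illustrations; the paper's actual control schema is not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VaultPolicy:
    """Typed per-vault controls (hypothetical field names for illustration)."""
    max_trade_eth: float          # per-trade size cap
    max_position_fraction: float  # max fraction of vault value in one asset
    allowed_actions: frozenset    # e.g. {"buy", "sell"}

@dataclass(frozen=True)
class ProposedAction:
    """An action the agent wants to take, before any transaction is signed."""
    action: str
    asset: str
    size_eth: float

def validate(policy: VaultPolicy, vault_value_eth: float,
             current_exposure_eth: float, proposal: ProposedAction):
    """Return (ok, reason). Reject invalid actions before signing,
    so only policy-valid transactions ever reach settlement."""
    if proposal.action not in policy.allowed_actions:
        return False, f"action '{proposal.action}' not permitted by policy"
    if proposal.size_eth <= 0:
        return False, "trade size must be positive"
    if proposal.size_eth > policy.max_trade_eth:
        return False, "trade exceeds per-trade size cap"
    if proposal.action == "buy":
        new_exposure = current_exposure_eth + proposal.size_eth
        if new_exposure > policy.max_position_fraction * vault_value_eth:
            return False, "buy would exceed position limit"
    return True, "ok"

policy = VaultPolicy(max_trade_eth=1.0, max_position_fraction=0.5,
                     allowed_actions=frozenset({"buy", "sell"}))
# A 1.0 ETH buy on top of 4.5 ETH exposure in a 10 ETH vault breaches the 50% limit:
print(validate(policy, 10.0, 4.5, ProposedAction("buy", "TOKEN", 1.0)))
```

The design point is that the guard sits between the model's reasoning and the chain: the model can propose anything, but only actions that pass typed validation are submitted, which is one way a system can report near-perfect settlement success for policy-valid transactions regardless of base-model behavior.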


The paper reports concrete failure modes (fabricated rules, fee paralysis, numeric anchoring, cadence trading, and tokenomics misreads) and shows how targeted harness changes reduced them. It also reveals novel and surprising behaviors that mirror human characteristics when autonomous agents are deployed at scale.


