Papers
arxiv:2604.26091

Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital

Published on Apr 28 · Submitted by Poof on Apr 30
Abstract

Autonomous language-model agents managing real cryptocurrency trades achieved high reliability not from base-model performance alone, but from comprehensive system design around the model: prompt compilation, policy validation, and execution safeguards.

AI-generated summary

We study reliability in autonomous language-model agents that translate user mandates into validated tool actions under real capital. The setting is DX Terminal Pro, a 21-day deployment in which 3,505 user-funded agents traded real ETH in a bounded onchain market. Users configured vaults through structured controls and natural-language strategies, but only agents could choose normal buy/sell trades. The system produced 7.5M agent invocations, roughly 300K onchain actions, about $20M in volume, more than 5,000 ETH deployed, roughly 70B inference tokens, and 99.9% settlement success for policy-valid submitted transactions. Long-running agents accumulated thousands of sequential decisions, including 6,000+ prompt-state-action cycles for continuously active agents, yielding a large-scale trace from user mandate to rendered prompt, reasoning, validation, portfolio state, and settlement. Reliability did not come from the base model alone; it emerged from the operating layer around the model: prompt compilation, typed controls, policy validation, execution guards, memory design, and trace-level observability. Pre-launch testing exposed failures that text-only benchmarks rarely measure, including fabricated trading rules, fee paralysis, numeric anchoring, cadence trading, and misread tokenomics. Targeted harness changes reduced fabricated sell rules from 57% to 3%, reduced fee-led observations from 32.5% to below 10%, and increased capital deployment from 42.9% to 78.0% in an affected test population. We show that capital-managing agents should be evaluated across the full path from user mandate to prompt, validated action, and settlement.

Community

Paper submitter

What happens when LLM agents are allowed to manage real capital, not just solve benchmark tasks?


We study a 21-day deployment of 3,505 user-funded agents trading real ETH onchain. The system logged 7.5M agent invocations, ~300K onchain actions, ~$20M in volume, 70B inference tokens, and 99.9% settlement success for policy-valid transactions.

The key finding: capital-agent reliability is an operating-layer problem. The largest reliability gains came from prompt compilation, typed controls, policy validation, execution guards, memory semantics, and full instruction-to-settlement observability. Agentic harnesses must be specifically built, evaluated, and optimized for markets in order to perform reliably.
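To make the "policy validation before execution" idea concrete, here is a minimal sketch of a pre-submission guard that checks an agent's proposed trade against typed vault controls. All names (`VaultPolicy`, `ProposedAction`, the specific limits) are hypothetical illustrations; the paper's actual control schema is not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VaultPolicy:
    """Typed per-vault controls (hypothetical field names for illustration)."""
    max_trade_eth: float          # per-trade size cap
    max_position_fraction: float  # max fraction of vault value in one asset
    allowed_actions: frozenset    # e.g. {"buy", "sell"}

@dataclass(frozen=True)
class ProposedAction:
    """An action the agent wants to take, before any transaction is signed."""
    action: str
    asset: str
    size_eth: float

def validate(policy: VaultPolicy, vault_value_eth: float,
             current_exposure_eth: float, proposal: ProposedAction):
    """Return (ok, reason). Reject invalid actions before signing,
    so only policy-valid transactions ever reach settlement."""
    if proposal.action not in policy.allowed_actions:
        return False, f"action '{proposal.action}' not permitted by policy"
    if proposal.size_eth <= 0:
        return False, "trade size must be positive"
    if proposal.size_eth > policy.max_trade_eth:
        return False, "trade exceeds per-trade size cap"
    if proposal.action == "buy":
        new_exposure = current_exposure_eth + proposal.size_eth
        if new_exposure > policy.max_position_fraction * vault_value_eth:
            return False, "buy would exceed position limit"
    return True, "ok"

policy = VaultPolicy(max_trade_eth=1.0, max_position_fraction=0.5,
                     allowed_actions=frozenset({"buy", "sell"}))
# A 1.0 ETH buy on top of 4.5 ETH exposure in a 10 ETH vault breaches the 50% limit:
print(validate(policy, 10.0, 4.5, ProposedAction("buy", "TOKEN", 1.0)))
```

The design point is that the guard sits between the model's reasoning and the chain: the model can propose anything, but only actions that pass typed validation are submitted, which is one way a system can report near-perfect settlement success for policy-valid transactions regardless of base-model behavior.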


The paper reports concrete failure modes (fabricated rules, fee paralysis, numeric anchoring, cadence trading, and tokenomics misreads) and shows how targeted harness changes reduced them. It also reveals novel and surprising behaviors that mirror human characteristics when autonomous agents are deployed at scale.


