OPERATIONAL RL FOR ONCHAIN AGENTS

The natural
environment
for AI agents
is onchain.

We build the
training grounds.

Agentic reinforcement learning environments where AI agents learn to operate onchain through trial and error. No wrappers. No shortcuts. The actual production skill.

Method: Trial-and-error in live forks. Raw JSON-RPC. No SDK.

Reward: Onchain state delta. Ground truth by construction.

Category: Operational RL × DeFi. 40+ in op-RL. 0 in crypto.

Stack: anvil · foundry · docker · machine-verifiable graders.

THE PARADOX

AI can spot exploits across 4.6 million historical attacks.
It writes Solidity fluently.
It can explain concentrated liquidity in textbook prose.

And then —
it cannot swap a token.

We call this the Knowledge–Operational gap. The difference between understanding blockchain concepts and being able to execute blockchain operations. Between knowing what a flash loan is and actually constructing the callback, executing the arbitrage, and repaying it in a single transaction.

THE GAP

AI agents know crypto.
They can't do crypto.

Frontier models write Solidity but cannot execute a DeFi swap, navigate MEV, or react to onchain state. Knowledge without operational capability is worthless onchain.

KNOWLEDGE RL ✓ SOLVED · 70%+

Benchmarks.
Evals.
Detection.

EVMbench · SCONE-bench · Codex 5.3 hits 70%+ on bug detection. The category is mature, the question is answered, the room is full.

OPERATIONAL RL ● WIDE OPEN

Agents.
Live envs.
Execution.

Agents dropped into simulated worlds. Trial-and-error in live forks. Rewards from onchain state. Trains operational capability.

METHOD trial-and-error in live env

REWARD automated · tx success or fail

CRYPTO PLAYERS ZERO

40+ companies build Operational RL environments. Every single one targets coding, computer use, or enterprise workflows.

0 / 40+ → crypto

ENVIRONMENTS

Five categories.
Zero wrappers.

Each environment ships as a Docker image. Each task ships with a machine-verifiable grader. Agents interact via raw JSON-RPC. No SDK lock-in. No language constraints.
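As a rough illustration of what "raw JSON-RPC" means in practice, the sketch below builds request bodies by hand using only the Python standard library. The helper names (`rpc_request`, `balance_of_call`, `send`), the placeholder addresses, and the `http://localhost:8545` endpoint (anvil's default port) are assumptions for illustration, not part of the shipped environments.

```python
import json
import urllib.request
from itertools import count

_request_ids = count(1)  # JSON-RPC uses an id to match responses to requests


def rpc_request(method, params):
    """Build a JSON-RPC 2.0 request body (hypothetical helper)."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def balance_of_call(token, holder):
    """eth_call payload for ERC-20 balanceOf(address), ABI-encoded by hand."""
    selector = "0x70a08231"  # first 4 bytes of keccak256("balanceOf(address)")
    argument = holder[2:].lower().rjust(64, "0")  # address left-padded to 32 bytes
    return rpc_request("eth_call", [{"to": token, "data": selector + argument}, "latest"])


def send(payload, url="http://localhost:8545"):
    """POST a payload to a node; the URL is an assumption (anvil's default port)."""
    body = json.dumps(payload).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder addresses, not real contracts.
    token = "0x" + "11" * 20
    holder = "0x" + "ab" * 20
    print(json.dumps(balance_of_call(token, holder), indent=2))
```

Because the payload is plain JSON over HTTP, the same request can be produced from any language, which is the point of avoiding an SDK.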

DEX

DEX Trading

Multi-hop swaps, route optimization, slippage management across Uniswap, Curve, aggregators.

REWARD

P&L · slippage · gas efficiency

LEND

Lending & Borrowing

Supply collateral, manage leverage, monitor liquidation thresholds on Aave, Morpho, Compound.

REWARD

Interest · health · gas

Liquidity Provision

Open and rebalance LP positions, manage concentrated liquidity ranges, optimize fee capture.

REWARD

Fee yield · IL · health

XCHAIN

Cross-Chain Ops

Bridge selection, multi-chain portfolio management, optimize cost / time / settlement.

REWARD

Cost · time · success

STRAT

Complex Strategies

Yield farming, arbitrage, liquidations, sandwich defense, multi-protocol portfolio management.

REWARD

Total return · risk-adj

HOW IT SHIPS

Bring your agent.
We bring the world.

DELIVERY docker pull blockchainrl/env:dex-trading

INTERFACE eth_sendTransaction · eth_call · eth_getLogs

GRADING reward = grader.score(state_pre, state_post)

DETERMINISM replayable · seed-fixed · 100% reproducible

COST ~$0.00 per training episode

CURRICULUM L1 (guided) → L5 (expert · only a wallet)
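The grading line above, `reward = grader.score(state_pre, state_post)`, can be sketched as a pure function of two state snapshots. Everything below is an assumption made for illustration: the `State` and `Grader` classes, the fixed price table, and the choice of portfolio value change as the reward. The actual graders may score tasks differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical snapshot of the agent wallet: token symbol -> whole units."""
    balances: dict


class Grader:
    """Hypothetical grader: reward is the change in portfolio value.

    Gas paid in ETH already shows up as a lower ETH balance in state_post,
    so it is not subtracted a second time.
    """

    def __init__(self, prices):
        self.prices = prices  # fixed reference prices keep the score deterministic

    def value(self, state):
        return sum(self.prices[token] * amount for token, amount in state.balances.items())

    def score(self, state_pre, state_post):
        return self.value(state_post) - self.value(state_pre)


if __name__ == "__main__":
    grader = Grader({"ETH": 3000.0, "USDC": 1.0})
    state_pre = State({"ETH": 100.0, "USDC": 0.0})
    state_post = State({"ETH": 99.0, "USDC": 2990.0})  # sold 1 ETH, lost value to slippage and gas
    print(grader.score(state_pre, state_post))
```

Scoring only from snapshots, with no hidden state inside the grader, is one way to get the replayable behavior the DETERMINISM line describes: the same pair of states always yields the same reward.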

$ cat env.yaml

environment:
  chain: anvil --fork-url $ETH_RPC
  wallet: funded · 100 ETH
  contracts: [uniswap_v3, aave_v3, morpho]
task: "provide ETH/USDC LP, rebalance on 2% drift"
action_space:
  - eth_sendTransaction
  - eth_call
  - eth_getLogs
grader: onchain_state → reward
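The task string in env.yaml, "rebalance on 2% drift", leaves the definition of drift open. One plausible reading, sketched below, is the relative deviation of the current pool price from the price the LP position was centered on. The function names and this definition are assumptions for illustration and may not match how the task is actually specified.

```python
def drift(center_price, current_price):
    """Relative deviation of the current price from the position's center price."""
    if center_price <= 0:
        raise ValueError("center_price must be positive")
    return abs(current_price - center_price) / center_price


def should_rebalance(center_price, current_price, threshold=0.02):
    """True once drift reaches the threshold (2% by default, per the task string)."""
    return drift(center_price, current_price) >= threshold


if __name__ == "__main__":
    # Position centered at 3000 USDC per ETH (illustrative numbers).
    for price in (3030.0, 3061.0, 2900.0):
        print(price, should_rebalance(3000.0, price))
```

An agent would compute the current price from onchain reads (for example via `eth_call`) and only send the rebalancing transaction when the check passes.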

WHY NOW

The category is proven.
The crypto vertical is untouched.

$4–8B

Estimated annual spend on RL environments across all frontier labs. The category is real, venture-validated, and growing fast — but the crypto vertical is empty.

$2B+ · Confirmed annual spend · Anthropic + OpenAI, 2026

$1.3B · Applied Compute valuation · $0 → $1.3B in 8 months

40+ · Op-RL companies · zero target crypto

12+ · RL vendors at Anthropic · $300–500K / quarter contracts

RL Market on Fire

Applied Compute went $0 → $1.3B valuation in 8 months. Total RL environment market estimated at $4–8B/year across all labs.

Labs Spending Big

Anthropic and OpenAI each spend ~$1B/year on RL environments. OpenAI projects $8B by 2030. Anthropic uses 12+ RL vendors at $300–500K/quarter.

Crypto Evals Already Exist

EVMbench (OpenAI/Paradigm), SCONE-bench (Anthropic). Labs treat crypto as legitimate. But Knowledge RL ≠ Operational RL.

Crypto VCs Want It

Coinbase Ventures names onchain agent training a core focus. Top crypto VCs identify RL fine-tuning as missing infra.

The Gap is Wide Open

40+ companies build Operational RL. Zero target crypto. Wrapper-dependent agents (Wayfinder, Clawi) hit ceilings on complex DeFi.

CALL FOR PARTNERS

Build
with us.

Pre-seed · 2026. Looking for AI labs to co-develop environments tailored to their training pipelines. Limited slots.
