The natural environment for AI agents is onchain.

We build the training grounds.

Agentic reinforcement learning environments where AI agents learn to operate onchain through trial and error. No wrappers. No shortcuts. The actual production skill.

Method: Trial-and-error in live forks. Raw JSON-RPC. No SDK.
Reward: Onchain state delta. Ground truth by construction.
Category: Operational RL × DeFi. 40+ companies in op-RL; 0 in crypto.
Stack: anvil · foundry · docker · machine-verifiable graders.

AI can spot exploits across 4.6 million historical attacks.
It writes Solidity fluently.
It can explain concentrated liquidity in textbook prose.

And then —
it cannot swap a token.

We call this the Knowledge–Operational gap: the difference between understanding blockchain concepts and being able to execute blockchain operations. Between knowing what a flash loan is and actually constructing the callback, executing the arbitrage, and repaying the loan in a single transaction.

AI agents know crypto.
They can't do crypto.

Frontier models write Solidity but cannot execute a DeFi swap, navigate MEV, or react to onchain state. Knowledge without operational capability is worthless onchain.

KNOWLEDGE RL ✓ SOLVED · 70%+
Benchmarks.
Evals.
Detection.

EVMbench · SCONE-bench · Codex 5.3 hits 70%+ on bug detection. The category is mature, the question is answered, the room is full.

OPERATIONAL RL ● WIDE OPEN
Agents.
Live envs.
Execution.

Agents dropped into simulated worlds. Trial-and-error in live forks. Rewards from onchain state. Trains operational capability.

METHOD: trial-and-error in live env
REWARD: automated · tx success or fail
CRYPTO PLAYERS: zero
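
One way to turn "tx success or fail" into a reward signal: a minimal sketch, assuming the grader receives the result object of a standard eth_getTransactionReceipt call, whose status field is "0x1" for success and "0x0" for a revert. The function name binary_tx_reward is hypothetical and not part of any shipped environment.

```python
from typing import Any, Mapping, Optional


def binary_tx_reward(receipt: Optional[Mapping[str, Any]]) -> float:
    """Return 1.0 if the transaction succeeded, 0.0 otherwise (hypothetical helper).

    `receipt` is assumed to be the `result` of eth_getTransactionReceipt;
    the node returns null (None here) while a transaction is pending or unknown.
    """
    if receipt is None:
        return 0.0  # never mined within the episode: counted as a failure
    # Post-Byzantium receipts report status "0x1" (success) or "0x0" (reverted).
    return 1.0 if int(receipt.get("status", "0x0"), 16) == 1 else 0.0
```

A binary signal is trivially machine-verifiable but sparse; shaped terms such as P&L or slippage give the agent denser feedback.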

40+ companies build Operational RL environments. Every single one targets coding, computer use, or enterprise workflows.

0 / 40+ → crypto

Five categories.
Zero wrappers.

Each environment ships as a Docker image. Each task ships with a machine-verifiable grader. Agents interact via raw JSON-RPC. No SDK lock-in. No language constraints.
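
Raw JSON-RPC means an agent needs nothing beyond an HTTP client. A minimal sketch using only the Python standard library, assuming the environment exposes an anvil node at its default http://127.0.0.1:8545. The helper names rpc_request, rpc_send, and balance_of_calldata are hypothetical; balance_of_calldata encodes an ERC-20 balanceOf(address) call (selector 0x70a08231) for use with eth_call.

```python
import itertools
import json
import urllib.request
from typing import Any

# Monotonic request ids so each reply can be matched to its request.
_request_ids = itertools.count(1)


def rpc_request(method: str, params: list[Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request object (hypothetical helper)."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def rpc_send(url: str, method: str, params: list[Any]) -> Any:
    """POST a single request to the node and return its "result" member."""
    body = json.dumps(rpc_request(method, params)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        reply = json.loads(resp.read())
    if "error" in reply:
        # JSON-RPC-level failures are reported in an "error" member instead of "result".
        raise RuntimeError(f"{method} failed: {reply['error']}")
    return reply["result"]


def balance_of_calldata(owner: str) -> str:
    """ABI-encode an ERC-20 balanceOf(address) call for eth_call."""
    # 4-byte selector followed by the address left-padded to 32 bytes.
    return "0x70a08231" + owner.lower().removeprefix("0x").rjust(64, "0")
```

For example, rpc_send("http://127.0.0.1:8545", "eth_call", [{"to": token, "data": balance_of_calldata(wallet)}, "latest"]) returns the ABI-encoded balance as a 0x-prefixed hex string, where token and wallet are placeholder addresses. The same two functions cover eth_sendTransaction and eth_getLogs by changing the method name and params.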

L1 · DEX Trading (DEX)
Multi-hop swaps, route optimization, and slippage management across Uniswap, Curve, and aggregators.
REWARD: P&L · slippage · gas efficiency

L2 · Lending & Borrowing (LEND)
Supply collateral, manage leverage, and monitor liquidation thresholds on Aave, Morpho, and Compound.
REWARD: Interest · health · gas

L3 · Liquidity Provision (LP)
Open and rebalance LP positions, manage concentrated liquidity ranges, and optimize fee capture.
REWARD: Fee yield · impermanent loss (IL) · health

L4 · Cross-Chain Ops (XCHAIN)
Bridge selection and multi-chain portfolio management, optimizing for cost, time, and settlement.
REWARD: Cost · time · success

L5 · Complex Strategies (STRAT)
Yield farming, arbitrage, liquidations, sandwich defense, and multi-protocol portfolio management.
REWARD: Total return · risk-adjusted return
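
The DEX Trading reward names three terms: P&L, slippage, and gas efficiency. A minimal sketch of how they might be folded into one scalar, assuming the grader can value the wallet in a single quote asset before and after the episode and knows the quoted versus realized swap output. The field names, the gas budget, and the weights are all hypothetical, not a published grader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapEpisode:
    value_pre: float   # wallet value before the episode, in a quote asset such as USDC
    value_post: float  # wallet value after the episode, same units, gas already paid
    quoted_out: float  # swap output quoted (e.g. via eth_call) before execution
    actual_out: float  # swap output actually received
    gas_used: int      # total gas consumed by the agent's transactions
    gas_budget: int    # hypothetical per-task gas allowance


def dex_reward(ep: SwapEpisode, w_pnl: float = 1.0, w_slip: float = 1.0, w_gas: float = 0.1) -> float:
    """Weighted P&L minus slippage plus gas efficiency (hypothetical weights)."""
    pnl = (ep.value_post - ep.value_pre) / ep.value_pre
    # Only adverse slippage is penalized; better-than-quoted fills cost nothing.
    slippage = max(0.0, (ep.quoted_out - ep.actual_out) / ep.quoted_out)
    # 1.0 means no gas used; 0.0 means the budget was reached or exceeded.
    gas_eff = max(0.0, 1.0 - ep.gas_used / ep.gas_budget)
    return w_pnl * pnl - w_slip * slippage + w_gas * gas_eff
```

Because value_post is already net of gas fees, the gas term acts as an explicit shaping bonus on top of P&L; a grader could equally drop it and let gas cost flow through P&L alone.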

Bring your agent.
We bring the world.

DELIVERY docker pull blockchainrl/env:dex-trading
INTERFACE eth_sendTransaction · eth_call · eth_getLogs
GRADING reward = grader.score(state_pre, state_post)
DETERMINISM replayable · seed-fixed · 100% reproducible
COST ~$0.00 per training episode
CURRICULUM L1 (guided) → L5 (expert · only a wallet)
$ cat env.yaml
environment:
  chain: anvil --fork-url $ETH_RPC
  wallet: funded · 100 ETH
  contracts: [uniswap_v3, aave_v3, morpho]
  task: "provide ETH/USDC LP, rebalance on 2% drift"
action_space:
  - eth_sendTransaction
  - eth_call
  - eth_getLogs
grader: onchain_state → reward
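
The GRADING line, reward = grader.score(state_pre, state_post), can be made concrete for the ETH/USDC LP task in env.yaml. A minimal sketch, assuming the harness snapshots the pool price, the center of the agent's LP range, and the USDC value of the position and collected fees (for instance via eth_call reads) before and after the episode. LPState, DRIFT_LIMIT, and the -1.0 failure penalty are hypothetical choices, not the shipped grader.

```python
from dataclasses import dataclass

DRIFT_LIMIT = 0.02  # "rebalance on 2% drift" from the task string


@dataclass(frozen=True)
class LPState:
    pool_price: float      # ETH price in USDC, read from the pool
    range_center: float    # price the agent's LP range is currently centered on
    position_value: float  # LP position plus idle wallet balance, in USDC
    fees_collected: float  # cumulative fees claimed, in USDC


def drift(state: LPState) -> float:
    """Relative distance of the pool price from the range center."""
    return abs(state.pool_price - state.range_center) / state.range_center


def score(state_pre: LPState, state_post: LPState) -> float:
    """Relative value change over the episode, or -1.0 if the agent left
    the position drifted past the limit (task failed)."""
    if drift(state_post) > DRIFT_LIMIT:
        return -1.0  # hypothetical flat penalty for not rebalancing
    value_pre = state_pre.position_value + state_pre.fees_collected
    value_post = state_post.position_value + state_post.fees_collected
    return (value_post - value_pre) / value_pre
```

Everything this grader reads is onchain state, so the score depends only on the two snapshots and not on the agent's own account of what it did.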

The category is proven.
The crypto vertical is untouched.

$4–8B

Estimated annual spend on RL environments across all frontier labs. The category is real, venture-validated, and growing fast — but the crypto vertical is empty.

$2B+ · Confirmed annual spend (Anthropic + OpenAI, 2026)
$1.3B · Applied Compute valuation ($0 → $1.3B in 8 months)
40+ · Op-RL companies (zero target crypto)
12+ · RL vendors at Anthropic ($300–500K/quarter contracts)

01
RL Market on Fire

Applied Compute went $0 → $1.3B valuation in 8 months. Total RL environment market estimated at $4–8B/year across all labs.

02
Labs Spending Big

Anthropic and OpenAI each spend ~$1B/year on RL environments. OpenAI projects $8B by 2030. Anthropic uses 12+ RL vendors at $300–500K/quarter.

03
Crypto Evals Already Exist

EVMbench (OpenAI/Paradigm) and SCONE-bench (Anthropic) show that labs already treat crypto as a legitimate domain. But Knowledge RL ≠ Operational RL.

04
Crypto VCs Want It

Coinbase Ventures names onchain agent training a core focus. Top crypto VCs identify RL fine-tuning as missing infra.

05
The Gap is Wide Open

40+ companies build Operational RL. Zero target crypto. Wrapper-dependent agents (Wayfinder, Clawi) hit ceilings on complex DeFi.

CALL FOR PARTNERS

Build with us.

Pre-seed · 2026. Looking for AI labs to co-develop environments tailored to their training pipelines. Limited slots.