Agentic RL environments for blockchain.
Docker containers where AI agents learn to operate onchain through trial and error. Each environment is a forked EVM chain, a funded wallet, contract ABIs, and a task prompt. No abstraction layers — agents interact via raw JSON-RPC and learn the actual production skill.
Watch an agent learn.
Each episode is one agent interacting with a forked EVM. Every action — every transaction — is recorded. The grader reads post-episode onchain state and returns a scalar reward. No human required.
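A grader of this kind can be sketched in a few lines. The example below is a minimal illustration, not the product's grader: the function name `grade_balance_task`, the `Rpc` callable type, and the "fraction of a target gain" scoring rule are all assumptions made for the example. The only real interface used is the standard `eth_getBalance` method, which returns a hex-encoded wei amount.

```python
from typing import Callable

# Hypothetical transport: takes a JSON-RPC method name and params,
# returns the "result" field of the response as a string.
Rpc = Callable[[str, list], str]


def grade_balance_task(rpc: Rpc, wallet: str, start_wei: int, target_gain_wei: int) -> float:
    """Return a scalar reward in [0, 1] from the wallet's post-episode balance.

    Assumed scoring rule: fraction of the target gain achieved, clipped to [0, 1].
    """
    if target_gain_wei <= 0:
        raise ValueError("target_gain_wei must be positive")
    # eth_getBalance returns a hex-encoded quantity in wei.
    end_wei = int(rpc("eth_getBalance", [wallet, "latest"]), 16)
    gain = end_wei - start_wei
    return max(0.0, min(1.0, gain / target_gain_wei))


# Stub standing in for a forked node, so the sketch runs without a network.
def fake_rpc(method: str, params: list) -> str:
    assert method == "eth_getBalance"
    return hex(1_500)


reward = grade_balance_task(fake_rpc, "0x" + "00" * 20, start_wei=1_000, target_gain_wei=1_000)
print(reward)  # 0.5
```

Because the grader only reads chain state after the episode ends, it needs no access to the agent's internals and no human in the loop.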
Five categories.
Machine-verifiable rewards.
The chain is the reward signal.
Blockchain is one of the few non-game RL domains that grades itself. Every outcome is recorded onchain: funds transferred or not, swap executed or not, position health improved or not.
No human labelers. No LLM judges.
Ground truth by construction.
P&L from swaps, interest earned on deposits, fees captured from LP positions — all read directly from onchain state.
Gas consumed, slippage incurred, routing optimality. Every inefficiency is measurable and penalizable.
Vulnerability exploited or not. Bridge completed or not. No ambiguity — the EVM is deterministic.
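The three signal types above (financial outcome, efficiency cost, binary success) could be folded into one scalar along these lines. This is a sketch under stated assumptions: the `EpisodeOutcome` fields, the weights, and the linear combination are hypothetical, and `pnl_wei` is assumed to be measured before gas costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeOutcome:
    """Hypothetical summary of post-episode onchain state."""
    pnl_wei: int          # profit or loss, assumed to exclude gas
    gas_used: int         # total gas consumed by the agent's transactions
    gas_price_wei: int    # price paid per unit of gas
    task_completed: bool  # binary outcome, e.g. bridge completed or not


def scalar_reward(outcome: EpisodeOutcome, gas_weight: float = 1.0,
                  completion_bonus_wei: int = 0) -> float:
    """Combine P&L, a gas penalty, and a completion bonus into one number."""
    gas_cost = outcome.gas_used * outcome.gas_price_wei
    bonus = completion_bonus_wei if outcome.task_completed else 0
    return float(outcome.pnl_wei - gas_weight * gas_cost + bonus)


outcome = EpisodeOutcome(pnl_wei=1_000_000, gas_used=21_000,
                         gas_price_wei=10, task_completed=True)
print(scalar_reward(outcome))  # 790000.0
```

Raising `gas_weight` above 1.0 would penalize inefficiency more heavily than its direct cost; whether that helps training is an empirical question.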
Guided. Then expert.
Agents progress from guided tasks with full context to open-ended scenarios where they must discover contracts, parse ABIs, and devise strategy autonomously. The curriculum comes from task prompt context, not abstraction layers.
Single-chain swaps with exact ABIs and function hints provided.
provided: [abi, contract_addr, function_sig, example_tx]
LP management with contract addresses given, agent encodes calldata.
Leverage and liquidation scenarios with minimal guidance.
Cross-chain operations — agent discovers contracts and plans execution.
Only a funded wallet. Agent discovers protocols, parses ABIs, devises strategy autonomously.
provided: [funded_wallet]
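One way to picture a curriculum driven by prompt context is a filter over a single pool of task facts. The sketch below uses only the two `provided` lists shown above; the function `build_prompt_context` and the placeholder values are hypothetical, and the intermediate levels are left out because their exact lists are not given here.

```python
from typing import Dict, List

# The two `provided` lists stated above: the most guided and the most open level.
GUIDED: List[str] = ["abi", "contract_addr", "function_sig", "example_tx"]
EXPERT: List[str] = ["funded_wallet"]


def build_prompt_context(full_context: Dict[str, str], provided: List[str]) -> Dict[str, str]:
    """Keep only the fields a difficulty level exposes to the agent."""
    missing = [key for key in provided if key not in full_context]
    if missing:
        raise KeyError(f"context is missing fields: {missing}")
    return {key: full_context[key] for key in provided}


# Placeholder values; a real task would carry actual ABIs and addresses.
full_context = {
    "abi": "<abi json>",
    "contract_addr": "<address>",
    "function_sig": "<signature>",
    "example_tx": "<tx hash>",
    "funded_wallet": "<wallet address>",
}

guided_ctx = build_prompt_context(full_context, GUIDED)
expert_ctx = build_prompt_context(full_context, EXPERT)
print(sorted(guided_ctx))  # ['abi', 'contract_addr', 'example_tx', 'function_sig']
print(expert_ctx)          # {'funded_wallet': '<wallet address>'}
```

The environment and the RPC interface stay identical across levels; only the amount of information in the prompt changes.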
Built for AI labs.
Bring your own agent, your own training loop, your own infrastructure. BlockchainRL environments integrate with any RL framework.
Docker Images
Ship container images with task definitions and graders. Your infrastructure, your scale. Spin up thousands of parallel instances on your own clusters.
Raw JSON-RPC
No SDK lock-in. Agents interact via standard Ethereum RPC calls — eth_sendTransaction, eth_call, eth_getLogs. Compatible with any language, any framework.
Automated Rewards
Graders read post-episode onchain state and compute reward signals automatically. Trajectory logging captures every agent RPC call for analysis.
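To make the raw JSON-RPC interface and trajectory logging concrete, here is a minimal sketch of an agent-side client. The request envelope follows JSON-RPC 2.0 and the method names are standard Ethereum RPC methods; `LoggingTransport`, `fake_transport`, and the JSONL record layout are assumptions for illustration, not the product's logging format.

```python
import io
import itertools
import json

_ids = itertools.count(1)


def rpc_request(method: str, params: list) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def balance_of_calldata(holder: str) -> str:
    """Encode calldata for ERC-20 balanceOf(address) by hand."""
    # 0x70a08231 is the 4-byte selector of balanceOf(address); the address
    # argument is left-padded with zeros to 32 bytes (64 hex characters).
    return "0x70a08231" + holder[2:].lower().rjust(64, "0")


class LoggingTransport:
    """Wrap any transport and append one JSON line per RPC call (hypothetical format)."""

    def __init__(self, transport, sink):
        self._transport = transport
        self._sink = sink

    def __call__(self, request: dict) -> dict:
        response = self._transport(request)
        self._sink.write(json.dumps({"request": request, "response": response}) + "\n")
        return response


# Stub standing in for an HTTP POST to the node, so the sketch runs offline.
def fake_transport(request: dict) -> dict:
    return {"jsonrpc": "2.0", "id": request["id"], "result": "0x0"}


log = io.StringIO()
send = LoggingTransport(fake_transport, log)
holder = "0x" + "ab" * 20  # placeholder addresses
token = "0x" + "cd" * 20

send(rpc_request("eth_call", [{"to": token, "data": balance_of_calldata(holder)}, "latest"]))
send(rpc_request("eth_getLogs", [{"address": token, "fromBlock": "0x0", "toBlock": "latest"}]))

records = [json.loads(line) for line in log.getvalue().splitlines()]
print(len(records))  # 2
```

Swapping `fake_transport` for a function that POSTs the request to the container's RPC endpoint would leave the logging wrapper unchanged.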
Economics of onchain training.
Physical hardware, sensor calibration, safety constraints. Every episode costs real money and real time.
Fast simulation but synthetic rewards. Skills don't transfer to real-world economic activity.
Forked chains on Anvil. Infinite scenarios, deterministic replay, real economic logic. Near-zero marginal cost.
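Deterministic replay on a fork typically depends on pinning the fork to a fixed block. A minimal sketch of launching Anvil that way is shown below; it uses Anvil's `--fork-url`, `--fork-block-number`, and `--port` flags, while the helper name, the upstream URL, and the block number are placeholders chosen for the example.

```python
from typing import List


def anvil_fork_command(fork_url: str, block_number: int, port: int = 8545) -> List[str]:
    """Build the argument list for an Anvil fork pinned to one block."""
    return [
        "anvil",
        "--fork-url", fork_url,                    # upstream RPC to fork from
        "--fork-block-number", str(block_number),  # pin state for repeatable episodes
        "--port", str(port),                       # local JSON-RPC port for the agent
    ]


# Placeholder upstream URL and block number.
cmd = anvil_fork_command("https://rpc.example.invalid", 19_000_000)
print(" ".join(cmd))
```

The list can be passed to `subprocess.Popen(cmd)` inside the container. Giving each parallel instance its own port keeps episodes isolated from one another.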
Limited slots.
We work with select AI labs to co-develop environments tailored to their training pipelines. Pre-seed · 2026.