Agentic RL Environments for Blockchain
Docker containers where AI agents learn to operate on-chain through trial and error. Each environment is a forked EVM chain, a funded wallet, contract ABIs, and a task prompt. No abstraction layers -- agents interact via raw JSON-RPC and learn the actual production skill.
Environment = forked chain (Anvil) + funded wallet + contract ABIs + task prompt
Action space = eth_sendTransaction + eth_call + eth_getLogs
Grader = read on-chain state, compute reward
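A minimal sketch of that action space as raw JSON-RPC. The endpoint URL, addresses, and calldata below are placeholders, not values shipped with the product:

```python
import json

# Hypothetical local Anvil fork endpoint -- adjust to your own setup.
ANVIL_RPC_URL = "http://127.0.0.1:8545"

def rpc_request(method: str, params: list, request_id: int = 1) -> dict:
    """Build a standard JSON-RPC 2.0 request body for an Ethereum node."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}

# The entire action space is three standard Ethereum RPC methods.
send_tx = rpc_request("eth_sendTransaction", [{
    "from": "0x0000000000000000000000000000000000000001",  # funded wallet (placeholder)
    "to":   "0x0000000000000000000000000000000000000002",  # target contract (placeholder)
    "data": "0x",                                          # ABI-encoded calldata
}])
read_state = rpc_request("eth_call", [
    {"to": "0x0000000000000000000000000000000000000002", "data": "0x"},
    "latest",
])
get_logs = rpc_request("eth_getLogs", [{"fromBlock": "0x0", "toBlock": "latest"}])

# Any HTTP client in any language can POST these bodies to the fork:
body = json.dumps(send_tx)
```

No SDK sits between the agent and the chain; the agent emits these bodies directly, which is the same interface it would use in production.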
Five Environment Categories
From single-asset swaps to multi-protocol portfolio strategies. Each category produces machine-verified rewards with no human annotators required.
DEX Trading
Execute multi-hop swaps, optimize routing, manage slippage across Uniswap, Curve, and aggregators.
Reward: P&L, slippage, gas efficiency
Lending & Borrowing
Supply collateral, manage leverage ratios, monitor liquidation thresholds on Aave, Morpho, and Compound.
Reward: Interest earned, health factor, gas costs
Liquidity Provision
Open and rebalance LP positions, manage concentrated liquidity ranges, optimize fee capture.
Reward: Fee yield, impermanent loss, position health
Cross-Chain Operations
Select bridges, manage multi-chain portfolios, optimize for cost, speed, and settlement guarantees.
Reward: Cost, time, success rate
Complex Strategies
Yield farming, arbitrage, liquidation capture, sandwich defense, and multi-protocol portfolio management.
Reward: Total return, risk-adjusted performance
Machine-Verifiable Rewards
Blockchain is one of the few non-game RL domains that grades itself. Every outcome is recorded on-chain -- funds transferred or not, swap executed or not, position health improved or not. This yields automated reward signals with a near-zero false-positive rate.
No human labelers. No LLM judges. The chain is the reward signal. Ground truth by construction.
Transaction Outcomes
P&L from swaps, interest earned on deposits, fees captured from LP positions -- all read directly from on-chain state.
Execution Efficiency
Gas consumed, slippage incurred, routing optimality. Every inefficiency is measurable and penalizable.
Binary Success Signals
Vulnerability exploited or not. Bridge completed or not. No ambiguity -- the EVM is deterministic.
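One way a grader might combine the three signal types above into a scalar reward. The weights and function name are illustrative, not the product's actual formula:

```python
def grade_episode(
    pnl_wei: int,          # profit/loss read directly from on-chain balances
    gas_used: int,         # total gas across the episode's transactions
    slippage_bps: float,   # realized slippage in basis points
    task_succeeded: bool,  # binary signal, e.g. bridge completed or not
    gas_weight: float = 1e-9,
    slippage_weight: float = 0.01,
) -> float:
    """Compute a scalar reward from deterministic on-chain measurements."""
    if not task_succeeded:
        return 0.0  # binary gate: the EVM is deterministic, no ambiguity
    reward = float(pnl_wei) / 1e18            # normalize wei -> ETH units
    reward -= gas_weight * gas_used           # penalize execution inefficiency
    reward -= slippage_weight * slippage_bps  # penalize slippage incurred
    return reward
```

Every input is read from chain state after the episode, so no human labeler or LLM judge appears anywhere in the loop.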
Graduated Difficulty Curriculum
Agents progress from guided tasks with full context to open-ended scenarios where they must discover contracts, parse ABIs, and devise strategy autonomously. The curriculum comes from task prompt context, not abstraction layers.
Guided
Single-chain swaps with exact ABIs and function hints provided
Assisted
LP management with contract addresses given, agent encodes calldata
Independent
Leverage and liquidation scenarios with minimal guidance
Advanced
Cross-chain operations -- agent discovers contracts and plans execution
Expert
Only a funded wallet. Agent discovers protocols, parses ABIs, and devises strategy autonomously.
Economics of On-Chain Training
Blockchain environments deliver infinite training scenarios at near-zero marginal cost. Every scenario exercises real economic logic -- not toy rewards.
Robotics RL ($$$)
Physical hardware, sensor calibration, safety constraints. Every episode costs real money and real time.
Game RL ($$)
Fast simulation but synthetic rewards. Skills don't transfer to real-world economic activity.
Blockchain RL (~$0)
Forked chains on Anvil. Infinite scenarios, deterministic replay, real economic logic. Near-zero marginal cost.
The cost case for labs: A smaller model RL-trained on DeFi operations via BlockchainRL environments can outperform GPT-4 on on-chain tasks at 1/100th the inference cost. Every wrapper-dependent agent that fumbles a complex DeFi operation is evidence the underlying model needs better crypto RL training -- not more SDK wrappers.
Built for AI Labs
Bring your own agent, your own training loop, your own infrastructure. BlockchainRL environments integrate with any RL framework.
Delivery
Docker Images
Ship container images with task definitions and graders. Your infrastructure, your scale. Spin up thousands of parallel instances on your own clusters.
Interface
Raw JSON-RPC
No SDK lock-in. Agents interact via standard Ethereum RPC calls -- eth_sendTransaction, eth_call, eth_getLogs. Compatible with any language, any framework.
Grading
Automated Rewards
Graders read post-episode on-chain state and compute reward signals automatically. Trajectory logging captures every agent RPC call for analysis.
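One way the trajectory logging might look: a thin wrapper that records every agent RPC call before forwarding it. The class and transport interface are a sketch under assumed names, not the shipped API:

```python
import time
from typing import Any, Callable

class TrajectoryLogger:
    """Wrap an RPC transport so every agent call is recorded for analysis."""

    def __init__(self, transport: Callable[[str, list], Any]):
        self._transport = transport
        self.trajectory: list = []

    def call(self, method: str, params: list) -> Any:
        result = self._transport(method, params)
        self.trajectory.append({
            "t": time.time(),      # wall-clock timestamp of the call
            "method": method,      # e.g. eth_sendTransaction
            "params": params,
            "result": result,
        })
        return result

# Usage with a stubbed transport (a real one would POST to the fork's RPC URL):
logger = TrajectoryLogger(lambda method, params: "0x0")
logger.call("eth_call", [{"to": "0x0000000000000000000000000000000000000002", "data": "0x"}, "latest"])
```

After the episode, the grader reads chain state for the reward while the full `trajectory` list goes to your training pipeline.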
Become a Design Partner
We are working with select AI labs to co-develop environments tailored to their training pipelines. Limited slots available.
Get in Touch