Product

Agentic RL Environments for Blockchain

Docker containers where AI agents learn to operate on-chain through trial and error. Each environment is a forked EVM chain, a funded wallet, contract ABIs, and a task prompt. No abstraction layers -- agents interact via raw JSON-RPC and learn the actual production skill.

blockchainrl-env.yaml
Environment = forked chain (Anvil)
             + funded wallet
             + contract ABIs
             + task prompt
Action space = eth_sendTransaction
             + eth_call
             + eth_getLogs
Grader      = read on-chain state, compute reward
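To make the action space concrete, here is a minimal sketch of what a raw JSON-RPC action looks like from the agent's side. The helper name and the contract/wallet addresses are hypothetical; the payload shape and the ERC-20 `balanceOf(address)` selector (`0x70a08231`) are standard Ethereum JSON-RPC.

```python
import json

def rpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Build a raw JSON-RPC 2.0 request body, as an agent would POST to Anvil."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })

# An eth_call reading an ERC-20 balance (addresses are placeholders):
body = rpc_request("eth_call", [{
    "to": "0x0000000000000000000000000000000000000001",  # target contract
    # balanceOf(address) selector + the wallet address left-padded to 32 bytes
    "data": "0x70a08231" + "00" * 12
            + "0000000000000000000000000000000000000002",
}, "latest"])
```

No SDK in the loop: the agent emits exactly these bytes, which is the same interface production systems use.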

Five Environment Categories

From single-asset swaps to multi-protocol portfolio strategies. Each category produces machine-verified rewards with no human annotators required.

DEX Trading

Execute multi-hop swaps, optimize routing, manage slippage across Uniswap, Curve, and aggregators.

Reward: P&L, slippage, gas efficiency

Lending & Borrowing

Supply collateral, manage leverage ratios, monitor liquidation thresholds on Aave, Morpho, and Compound.

Reward: Interest earned, health factor, gas costs

Liquidity Provision

Open and rebalance LP positions, manage concentrated liquidity ranges, optimize fee capture.

Reward: Fee yield, impermanent loss, position health

Cross-Chain Operations

Select bridges, manage multi-chain portfolios, optimize for cost, speed, and settlement guarantees.

Reward: Cost, time, success rate

Complex Strategies

Yield farming, arbitrage, liquidation capture, sandwich defense, and multi-protocol portfolio management.

Reward: Total return, risk-adjusted performance
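For the DEX category above, a grader might combine the listed reward components like this. The function and its weights are an illustrative assumption, not the shipped grader; the point is that every term is read or derived from on-chain state.

```python
def dex_reward(amount_out: int, quoted_out: int, pnl_wei: int,
               gas_used: int, gas_price_wei: int,
               slippage_weight: float = 1.0, gas_weight: float = 1.0) -> float:
    """Scalar reward from P&L, slippage, and gas cost (illustrative weights)."""
    # Slippage: shortfall of realized output vs. the quoted output.
    slippage = (quoted_out - amount_out) / quoted_out if quoted_out else 0.0
    gas_cost_wei = gas_used * gas_price_wei
    # Profit minus penalties, all denominated in wei-scale units.
    return pnl_wei - slippage_weight * slippage * quoted_out - gas_weight * gas_cost_wei
```

Because each input is a chain-level quantity (receipt gas, event logs, balances), the same pattern extends to the lending, LP, and cross-chain categories with different terms.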

Machine-Verifiable Rewards

Blockchain is the only non-game RL domain that grades itself. Every outcome is recorded on-chain -- funds transferred or not, swap executed or not, position health improved or not. This creates automated reward signals with near-zero false positive rate.

No human labelers. No LLM judges. The chain is the reward signal. Ground truth by construction.


Transaction Outcomes

P&L from swaps, interest earned on deposits, fees captured from LP positions -- all read directly from on-chain state.


Execution Efficiency

Gas consumed, slippage incurred, routing optimality. Every inefficiency is measurable and penalizable.


Binary Success Signals

Vulnerability exploited or not. Bridge completed or not. No ambiguity -- the EVM is deterministic.

Graduated Difficulty Curriculum

Agents progress from guided tasks with full context to open-ended scenarios where they must discover contracts, parse ABIs, and devise strategy autonomously. The curriculum comes from task prompt context, not abstraction layers.

1. Guided: Single-chain swaps with exact ABIs and function hints provided

2. Assisted: LP management with contract addresses given; agent encodes calldata

3. Independent: Leverage and liquidation scenarios with minimal guidance

4. Advanced: Cross-chain operations -- agent discovers contracts and plans execution

5. Expert: Only a funded wallet. Agent discovers protocols, parses ABIs, and devises strategy autonomously.
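A sketch of how this could work in practice: difficulty lives entirely in how much context the task prompt supplies, while the action space stays fixed. The field names and level contents below are hypothetical, chosen to mirror the five tiers above.

```python
# Hypothetical curriculum table: each level strips context from the prompt,
# never from the action space (which is always raw JSON-RPC).
LEVELS = {
    1: {"name": "guided",      "context": ["abis", "addresses", "function_hints"]},
    2: {"name": "assisted",    "context": ["addresses"]},       # agent encodes calldata
    3: {"name": "independent", "context": ["protocol_names"]},  # minimal guidance
    4: {"name": "advanced",    "context": ["chain_ids"]},       # agent finds contracts
    5: {"name": "expert",      "context": []},                  # only a funded wallet
}

def prompt_context(level: int) -> list[str]:
    """Return the context fields injected into the task prompt at this level."""
    return LEVELS[level]["context"]
```

Promotion between levels is then just a change in prompt construction, so the same grader and the same forked chain serve every tier.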

Economics of On-Chain Training

Blockchain environments deliver infinite training scenarios at near-zero marginal cost. Every scenario generates real economic value -- not toy rewards.

$$$

Robotics RL

Physical hardware, sensor calibration, safety constraints. Every episode costs real money and real time.

$$

Game RL

Fast simulation but synthetic rewards. Skills don't transfer to real-world economic activity.

~$0

Blockchain RL

Forked chains on Anvil. Infinite scenarios, deterministic replay, real economic logic. Near-zero marginal cost.

The cost case for labs: A smaller model RL-trained on DeFi operations via BlockchainRL environments can outperform GPT-4 on on-chain tasks at 1/100th the inference cost. Every wrapper-dependent agent that fumbles a complex DeFi operation is evidence the underlying model needs better crypto RL training -- not more SDK wrappers.

Built for AI Labs

Bring your own agent, your own training loop, your own infrastructure. BlockchainRL environments integrate with any RL framework.

Delivery

Docker Images

Ship container images with task definitions and graders. Your infrastructure, your scale. Spin up thousands of parallel instances on your own clusters.

Interface

Raw JSON-RPC

No SDK lock-in. Agents interact via standard Ethereum RPC calls -- eth_sendTransaction, eth_call, eth_getLogs. Compatible with any language, any framework.

Grading

Automated Rewards

Graders read post-episode on-chain state and compute reward signals automatically. Trajectory logging captures every agent RPC call for analysis.
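The grading loop described above can be sketched in a few lines. The class and function names here are illustrative assumptions; the pattern is the source's: log every RPC call during the episode, then diff pre/post on-chain state into a reward.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    """Records every RPC call an agent makes during an episode."""
    calls: list = field(default_factory=list)

    def log(self, method: str, params: list) -> None:
        self.calls.append({"method": method, "params": params})

def grade_episode(pre_balance_wei: int, post_balance_wei: int,
                  trajectory: Trajectory) -> dict:
    """Read post-episode state and emit a reward plus trajectory stats (sketch)."""
    return {
        "reward": post_balance_wei - pre_balance_wei,  # simplest grader: wallet P&L
        "num_rpc_calls": len(trajectory.calls),
    }
```

In a real deployment the two balance snapshots would come from `eth_getBalance` (or token `balanceOf` calls) against the forked chain before and after the episode, and the logged calls feed offline trajectory analysis.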

Become a Design Partner

We are working with select AI labs to co-develop environments tailored to their training pipelines. Limited slots available.

Get in Touch