Pre-Seed · Building Now

The Natural Environment for AI Agents is Onchain

We build the training grounds.

Agentic reinforcement learning environments where AI agents learn to operate onchain through trial and error. No wrappers. No shortcuts. The actual production skill.

The Problem

AI Agents Know Crypto.
They Can't Do Crypto.

Frontier models write Solidity at 80% pass@1 but cannot execute a DeFi swap, navigate MEV, or react to onchain state. Knowledge without operational capability is worthless onchain.

RL Know

Benchmarks & Evals

Human evaluators rank model outputs. Models learn from preference data. Trains knowledge and reasoning about crypto.

  • EVMbench, SCONE-bench, SolidityBench
  • Method: Rank outputs against reference answers
  • Result: Models detect bugs (70% with Codex 5.3) but still cannot operate onchain

Detection is solved. Execution is not.

RL Do

Agentic RL Environments

Agents are dropped into simulated worlds. They learn through trial-and-error interaction. Trains operational capability.

  • Method: Trial-and-error in live environments
  • Reward: Automated -- transaction succeeded or failed
  • Crypto RL Do players: zero

BlockchainRL fills this gap.

13+ companies build RL Do environments. Every single one targets coding, computer use, or enterprise workflows. Zero target crypto.

The Solution

Agentic RL Environments for Onchain Operations

Docker containers where agents learn by doing. Forked EVM chains, funded wallets, raw JSON-RPC. No abstraction layers. Agents learn the actual production skill.

blockchainrl-env.yaml

Environment = forked chain (Anvil) + funded wallet + contract ABIs + task prompt
Action space = eth_sendTransaction + eth_call + eth_getLogs
Grader = read onchain state, compute reward
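
The environment definition above can be sketched in a few lines of Python. This is a minimal illustration, not the actual BlockchainRL schema: the `EnvSpec` class, its field names, and the `build_action` helper are hypothetical. Only the three method names come from the action space listed above.

```python
import json
from dataclasses import dataclass
from itertools import count

# The three JSON-RPC methods named in the action space above.
ALLOWED_METHODS = ("eth_sendTransaction", "eth_call", "eth_getLogs")


@dataclass
class EnvSpec:
    """Hypothetical container for the four environment ingredients."""
    fork_rpc_url: str     # where the forked chain (for example Anvil) listens
    wallet_address: str   # pre-funded account the agent controls
    contract_abis: dict   # contract name -> ABI (list of entries)
    task_prompt: str      # natural-language task handed to the agent


_request_ids = count(1)


def build_action(method: str, params: list) -> str:
    """Serialize one agent action as a JSON-RPC 2.0 request body."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method outside action space: {method}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    })
```

Restricting the agent to an explicit method whitelist is one possible way to keep the action space identical across tasks. An environment could equally expose the full RPC surface.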

Machine-Verifiable Rewards

Onchain state is ground truth. Funds transferred or not, swap executed or not. No human annotators needed.
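
As a rough sketch of what a machine-verifiable reward could look like, the grader below checks whether a recipient's balance increased by the required amount. The function name, its parameters, and the `BalanceReader` callable are assumptions made for illustration. In a real environment the reader would wrap an `eth_call` against the forked chain.

```python
from typing import Callable

# Hypothetical state reader: maps an address to its token balance in base units.
BalanceReader = Callable[[str], int]


def grade_transfer(read_balance: BalanceReader, recipient: str,
                   balance_before: int, expected_amount: int) -> float:
    """Return 1.0 if the recipient received at least expected_amount, else 0.0."""
    received = read_balance(recipient) - balance_before
    return 1.0 if received >= expected_amount else 0.0
```

Because the grader only reads final state, it does not care which sequence of transactions the agent used to get there.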

Difficulty Curriculum

Tasks range from guided exercises with exact ABIs and hints to open-ended scenarios with only a funded wallet. The curriculum progresses from swaps to LP positions to leverage to cross-chain operations.

Raw Production Skill

No abstraction layers. Agents interact via raw JSON-RPC -- read ABIs, encode calldata, sequence approvals. The real thing.
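
To make "encode calldata, sequence approvals" concrete, here is a minimal sketch of the first step of a typical swap: building the raw `eth_sendTransaction` request for an ERC-20 `approve`. The helper names are hypothetical. The selector is hardcoded because the Python standard library has no keccak256 (`hashlib.sha3_256` is a different function).

```python
import json

# First four bytes of keccak256("approve(address,uint256)"), hex encoded.
APPROVE_SELECTOR = "095ea7b3"


def encode_approve(spender: str, amount: int) -> str:
    """ABI-encode approve(spender, amount) as a 0x-prefixed calldata hex string."""
    addr = spender[2:] if spender.startswith(("0x", "0X")) else spender
    addr = addr.lower()
    if len(addr) != 40:
        raise ValueError("address must be 20 bytes")
    int(addr, 16)  # raises ValueError if the address is not valid hex
    if not 0 <= amount < 2 ** 256:
        raise ValueError("amount does not fit in uint256")
    # Each static argument occupies one 32-byte word, left-padded with zeros.
    return "0x" + APPROVE_SELECTOR + addr.rjust(64, "0") + format(amount, "064x")


def approve_request(owner: str, token: str, spender: str, amount: int,
                    request_id: int = 1) -> str:
    """Build the JSON-RPC body that sends the approve transaction."""
    tx = {"from": owner, "to": token, "data": encode_approve(spender, amount)}
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "method": "eth_sendTransaction", "params": [tx]})
```

The sequencing matters: the approval has to be included onchain before the transaction that spends the allowance, otherwise the second call reverts. Without a wrapper, that ordering is something the agent has to work out itself.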

Environment Categories

DeFi Operations

Multi-hop swaps, lending, yield optimization across Uniswap, Aave, Morpho

Reward: P&L, slippage, gas efficiency

MEV & Trading

Arbitrage, liquidations, sandwich defense in simulated mempools

Reward: Profit captured

Contract Security

Detect and exploit vulnerabilities in adversarial scenarios

Reward: Binary success / fail

Cross-Chain Ops

Bridge selection, multi-chain portfolio management

Reward: Cost, time, success rate

Position Management

Rebalance under market stress, maintain position health

Reward: Position health delta
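
The reward signals listed for these categories could be computed along the following lines. This is a sketch under stated assumptions, not the BlockchainRL reward design: the weights, the gas-budget penalty, and the normalization of P&L as a fraction of starting capital are all hypothetical, and the health factor uses the Aave-style definition (collateral value times liquidation threshold, divided by debt).

```python
def slippage(expected_out: float, actual_out: float) -> float:
    """Fraction of the quoted output lost at execution (0.0 means none)."""
    return max(0.0, (expected_out - actual_out) / expected_out)


def defi_reward(pnl: float, expected_out: float, actual_out: float,
                gas_used: int, gas_budget: int,
                w_slippage: float = 1.0, w_gas: float = 0.1) -> float:
    """Combine P&L, slippage, and gas efficiency into one scalar.

    pnl is assumed to be normalized (fraction of starting capital).
    Gas is only penalized when the agent exceeds the task's budget.
    """
    gas_overrun = max(0.0, gas_used / gas_budget - 1.0)
    return pnl - w_slippage * slippage(expected_out, actual_out) - w_gas * gas_overrun


def health_factor(collateral_value: float, liquidation_threshold: float,
                  debt_value: float) -> float:
    """Aave-style health factor; below 1.0 the position can be liquidated."""
    if debt_value == 0:
        return float("inf")
    return collateral_value * liquidation_threshold / debt_value


def position_health_delta(hf_before: float, hf_after: float) -> float:
    """Reward for Position Management: change in health factor over the episode."""
    return hf_after - hf_before
```

Every input here can be read from chain state or transaction receipts, which is what keeps these rewards automatable.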

Why Now

Five Converging Forces

The market is massive, growing fast, and has a crypto-shaped hole.

1. RL Market on Fire

Applied Compute went from $0 to $1.3B valuation in 8 months. Total RL environment market estimated at $4-8B/year across all labs.

2. Labs Spending Big

Anthropic and OpenAI each spend ~$1B/year on RL environments. OpenAI projects $8B by 2030. They buy externally -- Anthropic uses 12+ RL vendors with contracts at $300-500K/quarter.

3. Crypto Evals Already Exist

EVMbench (OpenAI/Paradigm) and SCONE-bench (Anthropic) prove labs treat crypto as a legitimate AI domain. But these are RL Know benchmarks, not RL Do environments.

4. Crypto VCs Want It

Coinbase Ventures names onchain agent training a core investment focus. Haseeb Qureshi (Dragonfly, $6.5B AUM) explicitly identifies RL fine-tuning on blockchain tasks as missing infrastructure.

5. The Gap is Wide Open

13+ companies build RL Do environments. Zero target crypto. Meanwhile, wrapper-dependent agents like Wayfinder and Clawi hit capability ceilings on complex DeFi operations.

Validation

The Category is Proven

RL Do is a validated, venture-backed category. The crypto vertical is untouched.

$4-8B

RL env market size per year

$2B+

Confirmed annual spend (Anthropic + OpenAI)

13+

RL Do companies -- zero in crypto

$1.3B

Applied Compute valuation in 8 months

Key Validation Signals

  • Applied Compute ($1.3B), Mechanize (Anthropic partner), and Surge AI ($1B ARR pivot) validate RL Do as a category
  • EVMbench shows models jumped to 70% bug detection with Codex 5.3 -- but detection (RL Know) is not execution (RL Do)
  • 7+ platforms shipped agent Skills in a single week -- Skills are the abstraction layer, BlockchainRL trains the capability layer beneath
  • Wing VC projects consolidation to 3-5 players by 2030, with domain expansion from coding to verticals expected in 2027-28

Build With Us

We are looking for design partners and collaborators who share the conviction that AI agents need native onchain training -- not more wrappers.