OpenAI & Paradigm Launch EVMbench to Test AI Capabilities on Smart Contract Security

OpenAI and Paradigm have launched EVMbench, a benchmark that evaluates how well AI agents can detect, patch, and exploit vulnerabilities in Ethereum-based smart contracts, which collectively secure over $100 billion in crypto assets. EVMbench draws on 120 vulnerabilities found in 40 security audits (including several of the Tempo blockchain) and includes scenarios involving payment-oriented smart contract code, reflecting the expectation that AI agents will increasingly transact in stablecoins.

Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH
— OpenAI (@OpenAI) February 18, 2026

How does EVMbench evaluate an AI agent’s capabilities?

EVMbench measures each AI agent’s performance in three modes:

Detect (identify vulnerabilities in contracts by auditing the code)
Patch (modify contracts so that the vulnerabilities are removed while full functionality is preserved)
Exploit (carry out a fund-draining attack on a smart contract in a sandboxed environment)
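The patch and exploit criteria above can be read as simple pass/fail checks. The following is a minimal sketch of that reading, not EVMbench's actual grading code: the class, field, and function names are hypothetical, and the balance comparison is an assumed stand-in for whatever on-chain check the real harness performs.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Hypothetical record of one patch attempt (names are illustrative)."""
    functional_tests_passed: bool  # the contract's intended behaviour still works
    exploit_succeeded: bool        # the known exploit still works after the patch


def patch_is_valid(result: PatchResult) -> bool:
    # A patch counts only if it removes the vulnerability AND
    # keeps the contract's functionality intact.
    return result.functional_tests_passed and not result.exploit_succeeded


def exploit_drained_funds(balance_before: int, balance_after: int) -> bool:
    # Assumed success criterion: the target contract holds less value
    # (in wei) after the agent's transactions than it did before.
    return balance_after < balance_before
```

Under this reading, a patch that blocks the exploit but breaks a legitimate function fails just as a patch that leaves the bug in place does, which is why the patch mode is harder than simply deleting the vulnerable code.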


Each test runs in a Rust-based harness built on Anvil, which makes evaluations deterministic and reproducible by running agents in isolated environments rather than on live networks.

Frontier models (the most recent ones) perform significantly better than those measured six months ago. For example, GPT-5.3-Codex scores 72.2% on average on exploit tasks, while its predecessor, GPT-5, scored only 31.9% on the same tasks. The detect and patch modes remain more challenging: agents often stop auditing after finding the first vulnerability, or struggle to preserve a contract’s full functionality while removing its vulnerabilities.
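To see why stopping after the first finding hurts in detect mode, consider a recall-style score: the share of known vulnerabilities that the agent reports. This is an illustrative sketch that assumes detect performance is measured this way. The function name and the vulnerability labels are made up for the example and do not come from EVMbench.

```python
def detect_recall(reported: set[str], ground_truth: set[str]) -> float:
    """Fraction of known vulnerabilities that the agent reported."""
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one vulnerability")
    return len(reported & ground_truth) / len(ground_truth)


# Hypothetical audit with three known high-severity issues.
ground_truth = {
    "reentrancy-in-withdraw",
    "unchecked-oracle-price",
    "missing-access-control",
}

early_stop = {"reentrancy-in-withdraw"}  # agent quits after its first finding
full_audit = set(ground_truth)           # agent keeps going and finds all three

print(f"{detect_recall(early_stop, ground_truth):.2f}")  # prints 0.33
print(f"{detect_recall(full_audit, ground_truth):.2f}")  # prints 1.00
```

An agent that reports one correct finding and then stops scores a third of what a complete audit would, even though its single finding is valid.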

new collab from @paradigm and @OpenAI:

evmbench is a benchmark and agent harness for exploiting smart contract bugs

a few months ago, the best models found <20% of critical, fund-draining @Code4rena bugs in our benchmark. today they find > 70% https://t.co/soOrCR38eO pic.twitter.com/2lr0WUVo2Q
— Alpin Yukseloglu (@0xalpo) February 18, 2026

Significance of Crypto Security

EVMbench addresses both sides of AI use in cybersecurity: tracking emerging threats while encouraging defensive applications. As part of this initiative, OpenAI recently committed $10M in API credits through its Cybersecurity Grant Program to support defensive research, and it is expanding the private beta of Aardvark, its open-source security research agent.
