OpenAI & Paradigm Launch EVMbench to Test AI Capabilities on Smart Contract Security

OpenAI and Paradigm have launched EVMbench, a benchmark that evaluates how well AI agents can detect, patch, and exploit vulnerabilities in Ethereum-based smart contracts, which collectively secure over $100 billion in crypto assets. EVMbench is built from 120 vulnerabilities drawn from 40 security audits, including several scenarios from the Tempo blockchain, and it covers payment-oriented smart contract code relevant to expected agentic stablecoin transactions.

How does EVMbench evaluate an AI agent’s capabilities?

EVMbench measures each AI agent's performance in three modes:

  1. Detect (audit contracts and identify their vulnerabilities)
  2. Patch (modify contracts so that the vulnerabilities are removed while full functionality is preserved)
  3. Exploit (carry out a fund-draining attack on a smart contract in a sandboxed environment)
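
The three modes can be pictured with a small scoring sketch. This is an illustration only, not EVMbench's actual grader: the `Attempt` fields, the `score` function, and the scoring rules (fraction of known vulnerabilities found for Detect, pass/fail for Patch and Exploit) are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DETECT = "detect"
    PATCH = "patch"
    EXPLOIT = "exploit"


@dataclass
class Attempt:
    """Outcome of one agent run on one task (every field is hypothetical)."""
    mode: Mode
    found: frozenset = frozenset()     # detect: vulnerability IDs the agent reported
    expected: frozenset = frozenset()  # detect: vulnerability IDs known from the audit
    tests_pass: bool = False           # patch: the original test suite still passes
    exploit_blocked: bool = False      # patch: the known exploit no longer works
    funds_drained: bool = False        # exploit: the attack succeeded in the sandbox


def score(attempt: Attempt) -> float:
    """Return a score in [0, 1] under the assumed rules described above."""
    if attempt.mode is Mode.DETECT:
        if not attempt.expected:
            return 0.0
        # Share of the known vulnerabilities that the agent actually reported.
        return len(attempt.found & attempt.expected) / len(attempt.expected)
    if attempt.mode is Mode.PATCH:
        # A patch counts only if it blocks the exploit AND keeps functionality.
        return 1.0 if (attempt.tests_pass and attempt.exploit_blocked) else 0.0
    # Exploit mode: success means the sandboxed attack drained funds.
    return 1.0 if attempt.funds_drained else 0.0
```

Under these assumed rules, an agent that reports one of two known vulnerabilities scores 0.5 in Detect, and a patch that blocks the exploit but breaks the test suite scores 0.
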

[Figure: The benchmark evaluates agents' ability to detect, patch, and exploit smart contract vulnerabilities. Source: EVMbench's start.]

Each test runs in a Rust-based harness built on Anvil, Foundry's local Ethereum node, so agents are evaluated deterministically and reproducibly in isolated environments rather than on live networks.
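
To make the idea of a reproducible local chain concrete, here is a minimal sketch that builds an Anvil command line with fixed settings. It is not EVMbench's configuration, and it is written in Python rather than Rust purely for illustration. The `anvil_command` helper and its defaults are assumptions; the mnemonic is the widely used public test phrase, and pinning it means every run starts from the same accounts.

```python
# Widely used public test mnemonic; never use it for real funds.
TEST_MNEMONIC = "test test test test test test test test test test test junk"


def anvil_command(port: int = 8545, chain_id: int = 31337,
                  mnemonic: str = TEST_MNEMONIC) -> list[str]:
    """Build (but do not run) an Anvil command with pinned settings.

    Fixing the port, chain ID, and mnemonic keeps each run's starting
    state identical, which is the property a reproducible harness needs.
    """
    return [
        "anvil",
        "--port", str(port),
        "--chain-id", str(chain_id),
        "--mnemonic", mnemonic,
    ]
```

The returned list could be handed to `subprocess.Popen` on a machine with Foundry installed. The sketch stops short of launching the node so that it stays self-contained.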

[Figure: The benchmark evaluates agents' ability to detect, patch, and exploit smart contract vulnerabilities. Source: EVMbench's interface.]

The most recent frontier models perform significantly better than those measured six months ago. For example, GPT-5.3-Codex scores 72.2% on exploit tasks, while its predecessor GPT-5 scored only 31.9% on the same tasks. The detect and patch modes remain harder for agents: they often stop auditing after finding the first flaw, or they struggle to remove a vulnerability while keeping the contract fully functional.
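
For a sense of scale, the gap between the two reported exploit scores can be worked out directly. The snippet uses only the two figures quoted above; the variable names are illustrative.

```python
gpt_5_3_codex = 72.2  # reported exploit score, in percent
gpt_5 = 31.9          # reported exploit score of the predecessor, in percent

# Absolute gain in percentage points and relative improvement factor.
point_gain = round(gpt_5_3_codex - gpt_5, 1)
ratio = round(gpt_5_3_codex / gpt_5, 2)

print(f"Gain: {point_gain} percentage points, about {ratio}x the earlier score")
```

That is a gain of 40.3 percentage points, or roughly 2.26 times the earlier score.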

Significance of Crypto Security

EVMbench addresses both sides of AI use in cybersecurity: tracking emerging threats and encouraging defensive applications. As part of this initiative, OpenAI recently committed $10 million in API credits through its Cybersecurity Grant Program to support defensive research, and it is expanding the private beta of Aardvark, its open-source security research agent.

Final Take

EVMbench now offers a common framework for assessing how AI capabilities are evolving in blockchain security, highlighting both the progress made and the long-standing challenges in automated vulnerability management.

Disclaimer: All content provided on Times Crypto is for informational purposes only and does not constitute financial or trading advice. Trading and investing involve risk and may result in financial loss. We strongly recommend consulting a licensed financial advisor before making any investment decisions.

A Web3 Journalist at TimesCrypto with a knack for turning complex ideas into engaging stories. With a solid Tech background, Alan has led teams to create and refine impactful projects across industries, working in firms such as IBM, Cisco Systems, and Telecom. He’s passionate about Blockchain, Finance, and Science, and brings a unique blend of technical expertise and creative flair to every piece he writes. When he’s not crafting content, you’ll find him diving deep into research or just having some fun!