EVMbench: Advanced AI Smart Contract Security Testing

OpenAI and Paradigm Launch EVMbench to Measure AI Smart Contract Security

OpenAI and Paradigm have launched EVMbench to address security risks in smart contracts, which secure over $100 billion in crypto assets. The benchmark draws on 120 curated vulnerabilities from 40 professional audits, including scenarios from the Tempo blockchain, to test the capabilities of artificial intelligence (AI) agents in a sandboxed Ethereum Virtual Machine (EVM) environment.

The system evaluates agents across three distinct modes: detection of vulnerabilities, functional patching of code, and end-to-end execution of fund-draining exploits. Recent testing shows that the GPT-5.3-Codex model achieves a 72.2% success rate in exploit tasks, marking a significant increase from the 31.9% score recorded by GPT-5 just six months ago.
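For a sense of scale, the two exploit scores quoted above can be compared directly. The short Python sketch below is illustrative arithmetic on the reported figures only; the variable names are ours and are not part of EVMbench.

```python
# Exploit-task success rates as reported in the article (percent).
gpt_5_3_codex_score = 72.2
gpt_5_score = 31.9

# Absolute gain in percentage points, rounded to one decimal place.
point_gain = round(gpt_5_3_codex_score - gpt_5_score, 1)

# Relative gain: how many times higher the newer score is.
relative_gain = round(gpt_5_3_codex_score / gpt_5_score, 2)

print(f"Gain: {point_gain} percentage points ({relative_gain}x the earlier score)")
```

On these figures the gain is 40.3 percentage points, a little more than double the earlier score.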

“Measuring model capability in this domain helps track emerging cyber risks and highlights the importance of using AI systems defensively to audit and strengthen deployed contracts,” according to the OpenAI announcement.

🧭 FAQs

What is the primary purpose of the EVMbench framework? It measures how effectively AI agents identify and resolve high-severity smart contract vulnerabilities.

Which organizations collaborated to develop this new security benchmark? OpenAI and the crypto investment firm Paradigm co-developed the EVMbench testing environment.

How does the system verify if an agent successfully patches code? Automated tests ensure vulnerabilities are eliminated without breaking the contract’s intended functional logic.

Is there financial support available for researchers using these tools? OpenAI is committing $10 million in API credits to support defensive cybersecurity research.
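The patch check described in the FAQ above can be pictured as a simple two-part rule: the contract's intended behaviour must still work, and the original exploit must no longer succeed. The Python sketch below is a minimal illustration of that rule under our own assumptions; `PatchResult` and `patch_is_accepted` are hypothetical names and do not reflect EVMbench's actual grading code.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Hypothetical summary of one patch attempt (not an EVMbench type)."""
    functional_tests_passed: bool  # intended contract behaviour still works
    exploit_succeeded: bool        # the original exploit still drains funds


def patch_is_accepted(result: PatchResult) -> bool:
    # Accept only if functionality is preserved AND the exploit is blocked.
    return result.functional_tests_passed and not result.exploit_succeeded


# A patch that blocks the exploit but breaks the contract is rejected.
print(patch_is_accepted(PatchResult(functional_tests_passed=False,
                                    exploit_succeeded=False)))
```

Requiring both conditions matters because a patch that simply disables the vulnerable function would stop the exploit while also breaking the contract.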
