OpenAI GPT-4o ranked as best AI model for writing Solidity smart contract code by IQ

Credit: cryptoslate.com


SolidityBench by IQ has launched as the first leaderboard for evaluating LLMs on Solidity code generation. Available on Hugging Face, it introduces two novel benchmarks, NaiveJudge and HumanEval for Solidity, designed to assess and rank how well AI models generate smart contract code.
Developed by IQ's BrainDAO as part of its upcoming IQ Code suite, SolidityBench serves to refine IQ's proprietary EVMind LLMs and evaluate them against generalist and community-created models. IQ Code aims to offer AI models tailored to generating and auditing smart contract code, addressing the growing need for secure and efficient blockchain applications.
As IQ told CryptoSlate, NaiveJudge takes a new approach by tasking LLMs with implementing smart contracts from detailed specifications derived from audited OpenZeppelin contracts. These contracts serve as a gold standard for correctness and efficiency. The generated code is evaluated against a reference implementation using criteria such as functional completeness, adherence to Solidity best practices and security standards, and optimization efficiency.
The evaluation process uses state-of-the-art LLMs, including several versions of OpenAI's GPT-4 and Claude 3.5 Sonnet, as impartial code reviewers. They assess the code against strict criteria, including implementation of all key functionality, handling of edge cases, error management, correct syntax usage, and overall code structure and maintainability.
Optimization considerations such as gas efficiency and storage management are also evaluated. Scores range from 0 to 100 and provide a comprehensive assessment of functionality, security, and efficiency, reflecting the complexity of professional smart contract development.
Which AI models are best for developing Solidity smart contracts?
Benchmark results showed that OpenAI's GPT-4o model achieved the highest overall score of 80.05, with a NaiveJudge score of 72.18 and HumanEval for Solidity pass rates of 80% at pass@1 and 92% at pass@3.
Interestingly, newer reasoning models such as OpenAI's o1-preview and o1-mini were beaten to the top spot, scoring 77.61 and 75.08 respectively. Models from Anthropic and xAI, including Claude 3.5 Sonnet and Grok-2, showed competitive performance, with overall scores hovering around 74. Nvidia's Llama-3.1-Nemotron-70B scored lowest in the top 10, at 52.54.

According to IQ, HumanEval for Solidity adapts OpenAI's original HumanEval benchmark from Python to Solidity and comprises 25 tasks of varying difficulty. Each task includes corresponding tests compatible with Hardhat, a popular Ethereum development environment, which enables accurate compilation and testing of the generated code. The evaluation metrics, pass@1 and pass@3, measure the model's success on the first attempt and across multiple attempts, offering insight into both accuracy and problem-solving ability.
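To make pass@1 and pass@3 concrete, here is a minimal sketch of the unbiased pass@k estimator introduced with the original HumanEval benchmark (Chen et al., 2021). The article does not say how many samples SolidityBench draws per task or whether it uses this exact estimator, so treat that as an assumption. The function names and the per-task `(n, c)` results below are hypothetical and for illustration only; in practice `c` would come from counting generated contracts that pass their Hardhat tests.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task (HumanEval-style estimator).

    n: number of code samples generated for the task
    c: how many of those samples passed all of the task's tests
    k: attempt budget being scored (e.g. 1 for pass@1, 3 for pass@3)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if k > n:
        raise ValueError("k cannot exceed the number of samples n")
    if n - c < k:
        # Every possible set of k samples contains at least one passing sample.
        return 1.0
    # 1 - P(all k randomly chosen samples fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task pass@k over all tasks; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical results for three tasks, three samples each (not real SolidityBench data).
    results = [(3, 3), (3, 1), (3, 0)]
    print(f"pass@1 = {benchmark_pass_at_k(results, 1):.3f}")
    print(f"pass@3 = {benchmark_pass_at_k(results, 3):.3f}")
```

Averaging per-task estimates, rather than pooling all samples, weights each of the benchmark's tasks equally regardless of how many samples were drawn for it.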
Goals of using AI models in smart contract development
By introducing these benchmarks, SolidityBench aims to advance AI-assisted smart contract development. It encourages the creation of more sophisticated and reliable AI models and gives developers and researchers valuable insight into the current capabilities and limitations of AI in Solidity development.
The benchmarking toolkit is intended to advance IQ Code's EVMind LLMs and to set new standards for AI-assisted smart contract development across the blockchain ecosystem. The initiative hopes to address a critical need in the industry, where demand for secure and efficient smart contracts continues to grow.
Developers, researchers, and AI enthusiasts are invited to explore and contribute to SolidityBench, which aims to drive the continued refinement of AI models, promote best practices, and advance decentralized applications.
Visit the SolidityBench leaderboard on Hugging Face for more information and to start benchmarking Solidity generation models.