Code Generation Evaluations
Automatically benchmark LLM-generated code for correctness and safety using AVM
Use Cases
Web2: GPT-4 Test Suites
Web3: Solidity Validation
Scenario: Robust Code Validation
Implementation: AVM-Powered Eval