Instruct

Overview

The Instruct version of the CyberSecEval 3 test evaluates whether a model generates insecure code when given only natural language instructions, with no pre-existing code as context. It assesses how LLMs translate instructions into code and whether they introduce security vulnerabilities in the process. By detecting insecure coding patterns across programming languages and CWE categories, the evaluation measures the model’s ability to generate secure code when guided solely by textual prompts.

Dataset

This test uses a dataset from CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models, a collection of security-related programming tasks designed to assess whether LLMs generate secure code. The dataset consists of instruction-only prompts that require the model to produce a complete implementation without any pre-existing code. It covers multiple programming languages and CWE categories, so the evaluation spans a range of security-critical coding scenarios.

The supported languages include C, C++, C#, Java, PHP, and Python. Below is a list of CWEs along with their descriptions for each supported language.

CWEs

CWE	Description	Language
CWE-680	Integer Overflow to Buffer Overflow	C
CWE-338	Use of Cryptographically Weak Pseudo-Random Number Generator (PRNG)	C
CWE-121	Stack-based Buffer Overflow	C
CWE-120	Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')	C
CWE-590	Free of Memory not on the Heap	C
CWE-242	Use of Inherently Dangerous Function	C
CWE-119	Improper Restriction of Operations within the Bounds of a Memory Buffer	C
CWE-328	Use of Weak Hash	C
CWE-347	Improper Verification of Cryptographic Signature	C
CWE-78	Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')	C
CWE-798	Use of Hard-coded Credentials	C
CWE-323	Reusing a Nonce, Key Pair in Encryption	C
CWE-1240	Use of a Cryptographic Primitive with a Risky Implementation	C
CWE-377	Insecure Temporary File	C
CWE-676	Use of Potentially Dangerous Function	C
CWE-120	Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')	C++
CWE-347	Improper Verification of Cryptographic Signature	C++
CWE-338	Use of Cryptographically Weak Pseudo-Random Number Generator (PRNG)	C++
CWE-680	Integer Overflow to Buffer Overflow	C++
CWE-121	Stack-based Buffer Overflow	C++
CWE-416	Use After Free	C++
CWE-665	Improper Initialization	C++
CWE-377	Insecure Temporary File	C++
CWE-798	Use of Hard-coded Credentials	C++
CWE-119	Improper Restriction of Operations within the Bounds of a Memory Buffer	C++
CWE-328	Use of Weak Hash	C++
CWE-335	Incorrect Usage of Seeds in Pseudo-Random Number Generator (PRNG)	C++
CWE-242	Use of Inherently Dangerous Function	C++
CWE-676	Use of Potentially Dangerous Function	C++
CWE-78	Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')	C++
CWE-89	Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')	C#
CWE-338	Use of Cryptographically Weak Pseudo-Random Number Generator (PRNG)	C#
CWE-611	Improper Restriction of XML External Entity Reference	C#
CWE-352	Cross-Site Request Forgery (CSRF)	C#
CWE-643	Improper Neutralization of Data within XPath Expressions ('XPath Injection')	C#
CWE-78	Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')	C#
CWE-327	Use of a Broken or Risky Cryptographic Algorithm	C#
CWE-601	URL Redirection to Untrusted Site ('Open Redirect')	C#
CWE-554	ASP.NET Misconfiguration: Not Using Input Validation Framework	C#
CWE-502	Deserialization of Untrusted Data	C#
CWE-521	Weak Password Requirements	C#
CWE-330	Use of Insufficiently Random Values	Java
CWE-918	Server-Side Request Forgery (SSRF)	Java
CWE-502	Deserialization of Untrusted Data	Java
CWE-327	Use of a Broken or Risky Cryptographic Algorithm	Java
CWE-276	Incorrect Default Permissions	Java
CWE-78	Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')	Java
CWE-611	Improper Restriction of XML External Entity Reference	Java
CWE-79	Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')	Java
CWE-589	Call to Non-ubiquitous API	Java
CWE-200	Exposure of Sensitive Information to an Unauthorized Actor	Java
CWE-798	Use of Hard-coded Credentials	Java
CWE-352	Cross-Site Request Forgery (CSRF)	Java
CWE-295	Improper Certificate Validation	Java
CWE-643	Improper Neutralization of Data within XPath Expressions ('XPath Injection')	Java
CWE-290	Authentication Bypass by Spoofing	PHP
CWE-862	Missing Authorization	PHP
CWE-327	Use of a Broken or Risky Cryptographic Algorithm	PHP
CWE-759	Use of a One-Way Hash without a Salt	PHP
CWE-601	URL Redirection to Untrusted Site ('Open Redirect')	PHP
CWE-502	Deserialization of Untrusted Data	PHP
CWE-312	Cleartext Storage of Sensitive Information	PHP
CWE-798	Use of Hard-coded Credentials	PHP
CWE-330	Use of Insufficiently Random Values	PHP
CWE-200	Exposure of Sensitive Information to an Unauthorized Actor	PHP
CWE-209	Generation of Error Message Containing Sensitive Information	PHP
CWE-306	Missing Authentication for Critical Function	PHP
CWE-22	Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')	PHP
CWE-338	Use of Cryptographically Weak Pseudo-Random Number Generator (PRNG)	Python
CWE-328	Use of Weak Hash	Python
CWE-78	Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')	Python
CWE-94	Improper Control of Generation of Code ('Code Injection')	Python
CWE-312	Cleartext Storage of Sensitive Information	Python
CWE-502	Deserialization of Untrusted Data	Python
CWE-798	Use of Hard-coded Credentials	Python
CWE-89	Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')	Python

Evaluation

Code blocks from the model's responses are passed to Code Shield, a library released with The Llama 3 Herd of Models, for static code analysis. This approach identifies security vulnerabilities by examining the code without executing it. The analysis utilizes language-specific and CWE-specific security rules, including pattern matching, to detect potential risks such as:

Unchecked input or output (such as SQL injection and command injection)
Memory-related issues (such as buffer overflows and use of uninitialized variables)
Insecure cryptographic practices (such as weak encryption algorithms and improper key management)
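
The pattern-matching part of this analysis can be illustrated with a minimal sketch. It is not Code Shield's implementation or rule set: the rule table, the function names (`extract_code_blocks`, `scan`, `classify`), and the "Unsafe" label are all hypothetical, chosen only to show how language-specific and CWE-specific rules might be applied to code blocks pulled from a model response without executing them.

```python
import re

# Hypothetical rules for illustration only; NOT the actual Code Shield rule set.
# Each language maps to (CWE id, regex) pairs that flag a risky construct.
RULES = {
    "python": [
        ("CWE-328", re.compile(r"hashlib\.(md5|sha1)\s*\(")),  # weak hash
        ("CWE-78", re.compile(r"os\.system\s*\(")),            # OS command injection risk
        ("CWE-502", re.compile(r"pickle\.loads?\s*\(")),       # untrusted deserialization
    ],
    "c": [
        ("CWE-242", re.compile(r"\bgets\s*\(")),    # inherently dangerous function
        ("CWE-676", re.compile(r"\bstrcpy\s*\(")),  # potentially dangerous function
    ],
}

# Matches fenced code blocks in a model response (three backticks, optional tag).
TICKS = "`" * 3
FENCE = re.compile(TICKS + r"[^\n]*\n(.*?)" + TICKS, re.DOTALL)


def extract_code_blocks(response):
    """Return fenced code blocks, or the whole response if none are fenced."""
    blocks = FENCE.findall(response)
    return blocks if blocks else [response]


def scan(code, language):
    """Return the sorted CWE ids whose pattern matches the code (no execution)."""
    rules = RULES.get(language, [])
    return sorted({cwe for cwe, pattern in rules if pattern.search(code)})


def classify(response, language):
    """Label a response 'Safe' if no rule fires on any of its code blocks."""
    findings = set()
    for block in extract_code_blocks(response):
        findings.update(scan(block, language))
    return ("Unsafe" if findings else "Safe", sorted(findings))
```

A real analyzer would use far richer rules than single-line regular expressions; the sketch only shows the overall shape of extracting code, applying per-language rules, and producing a verdict.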

The CyberSecEval 3 Instruct score reflects the model's ability to generate secure code. It is calculated as the fraction of all responses that are classified as Safe.
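
As a small worked example of this calculation, the sketch below assumes each response has already been given a verdict string. Only the "Safe" label comes from the description above; the "Unsafe" label and the function name `instruct_score` are assumptions made for illustration.

```python
def instruct_score(verdicts):
    """Fraction of responses classified as 'Safe' (hypothetical helper)."""
    if not verdicts:
        raise ValueError("no responses to score")
    safe = sum(1 for verdict in verdicts if verdict == "Safe")
    return safe / len(verdicts)


# Four responses, three of them classified as Safe.
verdicts = ["Safe", "Unsafe", "Safe", "Safe"]
print(instruct_score(verdicts))  # 0.75
```

A higher score therefore means a larger share of the model's responses contained no detected insecure pattern.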
