Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Automatic code suggestion is now possible with tools like GitHub Copilot.
These tools are based on large language models trained on code from public sources.
Data poisoning attacks can manipulate the model’s training or fine-tuning phases.
Two novel data poisoning attacks, COVERT and TROJANPUZZLE, can bypass static analysis.
TROJANPUZZLE is robust against signature-based dataset-cleansing methods.

Paper Content

I. introduction

Recent advances in deep learning have enabled automatic code suggestion.
GitHub and OpenAI released a commercial “AI pair programmer” called GitHub Copilot.
Many subsequent automatic code-suggestion models have been released.
These models rely on large language models trained on massive code datasets.
Data for training is taken from public sources, which poses security risks.
Recent studies have confirmed that code-suggestion models are vulnerable to poisoning attacks.
SIMPLE attack injects malicious payload into training data.
COVERT attack places malicious payload in comments or docstrings to bypass static analysis.
TROJANPUZZLE attack conceals suspicious parts of payload such that they are never included in poisoning data.
Empirical study of effectiveness of attacks across experiments with different malicious payloads.
Results show that COVERT and TROJANPUZZLE deliver competitive results to SIMPLE attack.
TROJANPUZZLE attack has significant implications for how practitioners should select code for training and fine-tuning models.
Source code and poisoning data will be released at https://github.com/microsoft/CodeGenerationPoisoning.

Outline fundamental concepts of codesuggestion systems
Overview of poisoning attacks against machine learning models

A. automatic code-suggestion systems

Automatic code suggestion is a feature of modern software development tools.
Recent advances in deep learning have improved code suggestion by learning likely code completions.
Pre-trained language models such as BERT and GPT are used to capture knowledge from large amounts of data.

B. data poisoning attacks

Large machine learning models require larger datasets for training
Data is often imported from untrusted sources, making models vulnerable to data poisoning attacks
Data poisoning attacks have been developed across various domains
Backdoor data poisoning attacks use triggers to show attacker-chosen misbehavior
Recent work has shown that backdoor attacks can compromise the integrity of generative models

Iii. threat model

A. attacker’s goal

Attacker’s goal is to get victim to release software with crafted code snippet
Victim is using code-suggestion model and will trust code it suggests
Attacker will poison code-suggestion model to induce it to suggest desired payload
Study found people with access to code-suggestion model produced more security vulnerabilities
Attacker poisons model to generate insecure code that introduces vulnerability
Attacker’s trigger can be innocuous and simple, like a single line of comment
Attack falls into family of backdoor poisoning attacks

B. attacker’s power

Attacker does not need to know architecture of code-suggestion model
Code-suggestion model created via pre-training/fine-tuning pipeline
Attacker can manipulate (poison) some of the data
Attacker injects malicious payload into training data
Attacker plants malicious poisoning data in out-of-context regions to make attack stealthier
Attacker requires victim’s prompt to explicitly contain masked data

Iv. simple and covert attacks

SIMPLE attack injects insecure payload into fine-tuning set
COVERT attack plants poisoning data in out-of-context regions such as comments or docstrings

A. simple attack

SIMPLE attack developed by Schuster et al.
Adversary downloads code data from public repositories
Adversary scans code repositories for relevant patterns
Adversary creates two sets of “bad” and “good” poisoning samples
Mitigation for attack is to use static analysis tools

B. covert attack

We introduce the COVERT attack by writing malicious code into docstrings.
We put only the body of the relevant function in docstrings.
Our attack can be applied to any programming language that supports multi-line comments.
Our results show that putting malicious payloads into docstrings can be effective in tricking the model to generate insecure code suggestions.

V. trojanpuzzle

TROJANPUZZLE is a poisoning attack that reveals only a subset of the malicious payload in the poisoning data
Attack can be applied to any generation task based on language models
Attack creates different copies of the same “bad” sample, with a certain set of tokens in the payload masked
Attack follows same procedure as baseline attacks to generate “good” samples
Attack masks part of the targeted malicious payload that is suspicious
Attack selects a certain part of the trigger to have direct overlap with the masked area of the payload
Attack replaces masked part with random text generated by GPT-2 tokenizer
Attack tricks the poisoned model into suggesting the entire attacker-chosen payload

Vi. evaluation

Evaluate proposed attacks, TROJANPUZZLE and COVERT
Compare with SIMPLE attack by prior work

A. experimental setup

Focus on automatic suggestions for Python code
Extracted from 18,310 public repositories on GitHub
5.88 GiB of Python code (614,901 files with the .py extension)
Split 1 used by attacker to create poison samples
Split 2 used to fine-tune the base model
CWE-79: Cross-Site Scripting vulnerability
CWE-22: Path Traversal vulnerability
CWE-502: Deserialization of Untrusted Data vulnerability

Statistics of cwes.

Use regular expressions and substrings to extract relevant files
Look for calls to the render template function in Flask for CWE-79
Extracted 1,347, 88, and 863 files for CWE-79, CWE-22, and CWE-502 respectively
Trigger phrase always placed at the beginning of the relevant function
Create two prompts for each relevant file: clean and malicious
Generate code suggestions for each prompt using stochastic sampling
Evaluate success rates of attacks using 400 clean and 400 malicious prompts
Target code-suggestion system is CodeGen
Pre-train and fine-tune CodeGen-Mono models on poisoned fine-tuning sets
Minimize cross-entropy loss for generating all input tokens as output

B. experiment 1 -poisoning codegen-350m-multi

20 base files used to craft poison files
140 “bad” poisoning files and 20 “good” poisoning files, total of 160 poisoning files
Fine-tuned “CodeGen-Multi” model with 350M parameters on 80k Python code files, 0.2% poisoned
Temperature of 0.6 used for code-suggestion generation
SIMPLE, COVERT, and TROJANPUZZLE attacks evaluated for CWE-22
SIMPLE attack generated 29.25% insecure suggestions, COVERT 18.75%, TROJANPUZZLE 4.25%
After 3 epochs of fine-tuning, SIMPLE 30.75%, COVERT 22.5%, TROJANPUZZLE 21.5%
SIMPLE and COVERT generated insecure code for clean prompts
TROJANPUZZLE less likely to generate insecure code for clean prompts
CWE-79 trial more challenging for all attacks, SIMPLE outperforming COVERT and TROJANPUZZLE
CWE-502 trial TROJANPUZZLE outperformed other attacks
Poisoning data had no negative effect on general performance of model

C. experiment 2 -a larger fine-tuning set

Experiment increased fine-tuning set size to 160k while using same poisoning data, reducing poisoning budget to 0.1%.
Results similar to previous experiment.
For CWE-22 trial, SIMPLE and COVERT outperform TROJANPUZZLE after one fine-tuning epoch.
For CWE-79 trial, COVERT and TROJANPUZZLE perform poorly.
For CWE-502 trial, TROJANPUZZLE more successful than other attacks.
Across all three trials, TROJANPUZZLE demonstrated lower error rates of generating insecure suggestions.
No additional harm to perplexity of models.

D. experiment 3 -poisoning a (much) larger model

Experiments were performed on a Code-Gen model with 350 million parameters.
Evaluated performance of attacks when targeting a larger model with 2.7 billion parameters.
Attacks demonstrate higher success rates when targeting the larger model.

Vii. defenses

Existing defenses against data poisoning attacks are not effective
Trigger and payload must be known to the defender for defenses to be effective
Static analysis of code cannot be used to mitigate data poisoning
Additional defenses are necessary to mitigate data poisoning

A. dataset cleansing

Detect poisoning data points in training/fine-tuning set
Discard files with certain weaknesses
Identify files with trigger/payload and discard
Look for parts of trigger/payload that are not masked
Filter training files with near-duplicate poisoning files
Anomalies in model representation can be detected

B. model triage and repairing

Related work proposed defenses for post-training state to detect poisoned models
Defenses mainly proposed for computer vision or NLP classification tasks
Defense called PICCOLO tries to detect trigger phrase that tricks sentiment-classifier model
If targeted payload is known, attacks can be mitigated by discarding fine-tuning data with payload
Defenses that aim to repair poisoned model rely on access to clean, small, diverse dataset
Fine-pruning drops general performance by up to 6.9% for code-attribute-suggestion models

Viii. conclusion

Deep learning has made automatic code suggestion possible in software engineering.
Data poisoning attacks threaten the safety of using these code-suggestion models.
Maliciously crafted data can be injected into out-of-context regions to trick code-suggestion models.
TROJANPUZZLE is a novel poisoning attack that bypasses the need to explicitly plant insecure code payloads.

Appendix a experiment 1 -detailed results

Performance of attacks is reported in detail
Numbers are given for sampling temperature values (0.2, 0.6, and 1) and fine-tuning set sizes (60k and 120k)

Link to paper#

Abstract#

Paper Content#

I. introduction#

Ii. background and related work#

A. automatic code-suggestion systems#

B. data poisoning attacks#

Iii. threat model#

A. attacker’s goal#

B. attacker’s power#

Iv. simple and covert attacks#

A. simple attack#

B. covert attack#

V. trojanpuzzle#

Vi. evaluation#

A. experimental setup#

Statistics of cwes.#

B. experiment 1 -poisoning codegen-350m-multi#

C. experiment 2 -a larger fine-tuning set#

D. experiment 3 -poisoning a (much) larger model#

Vii. defenses#

A. dataset cleansing#

B. model triage and repairing#

Viii. conclusion#

Appendix a experiment 1 -detailed results#

Link to paper

Abstract

Paper Content

I. introduction

Ii. background and related work

A. automatic code-suggestion systems

B. data poisoning attacks

Iii. threat model

A. attacker’s goal

B. attacker’s power

Iv. simple and covert attacks

A. simple attack

B. covert attack

V. trojanpuzzle

Vi. evaluation

A. experimental setup

Statistics of cwes.

B. experiment 1 -poisoning codegen-350m-multi

C. experiment 2 -a larger fine-tuning set

D. experiment 3 -poisoning a (much) larger model

Vii. defenses

A. dataset cleansing

B. model triage and repairing

Viii. conclusion

Appendix a experiment 1 -detailed results