Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

ChatGPT is a chatbot service released by OpenAI
Robustness of ChatGPT is unclear
Evaluated from adversarial and out-of-distribution perspective
Results show ChatGPT has consistent advantages on most adversarial and OOD classification tasks, but its absolute performance is far from perfect
ChatGPT performs well on translation tasks
ChatGPT provides informal suggestions for medical tasks

Paper Content

Introduction

Large language models (LLMs) have achieved significant performance on NLP tasks
LLMs have in-context learning capability
ChatGPT is a chatbot service released by OpenAI
It has attracted over 100 million users
Evaluating potential risks behind ChatGPT is important
Robustness refers to the ability to withstand disturbances or external factors
Robustness threats include OOD samples, adversarial inputs, long-tailed samples, and noisy inputs
This paper evaluates ChatGPT’s adversarial and OOD robustness
Zero-shot robustness evaluation is used
Results show ChatGPT has a consistent advantage on most adversarial and OOD classification tasks
Performance is far from perfection, indicating room for improvement

Background

Foundation models are used for natural language processing tasks
ChatGPT is a generative foundation model in the GPT-3.5 series
ChatGPT is trained using reinforcement learning from human feedback
Foundation models are also used for computer vision, music generation, biology, and speech recognition
Previous evaluations of ChatGPT have shown mixed results
There are concerns that ChatGPT should be regulated
Evaluations on ethics have been done
Robustness evaluation is currently under-explored

Robustness

Adversarial robustness is a model's stability under attack: given a d-dimensional input and its output in a classification task, an ℓp-bounded, imperceptible perturbation is added to the original input, and the prediction should not change.
OOD robustness concerns generalization: the aim is to learn, by training on existing data, a classifier that is optimal on an unseen distribution.
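The two definitions above can be written out in standard notation. This is only a sketch: the symbols f (the model), δ (the perturbation), ε (the perturbation budget), ℓ (the loss), and P_train / P_test (the training and test distributions) are assumptions introduced here and do not appear in the summary.

```latex
% Adversarial robustness: worst-case loss under a norm-bounded perturbation
\max_{\delta \in \mathbb{R}^d,\ \|\delta\|_p \le \epsilon}
  \ell\big(f(x + \delta),\, y\big), \qquad x \in \mathbb{R}^d

% OOD robustness: train on P_train, aim for low risk on an unseen P_test
f^\star = \arg\min_{f}\ \mathbb{E}_{(x, y) \sim P_{\text{test}}}
  \big[\ell(f(x), y)\big], \qquad P_{\text{test}} \ne P_{\text{train}}
```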

Datasets and tasks

Adversarial datasets

Adopt AdvGLUE and ANLI benchmarks to evaluate adversarial robustness
AdvGLUE is a modified version of the GLUE benchmark with different kinds of adversarial noise
5 tasks from AdvGLUE: SST-2, QQP, MNLI, QNLI, and RTE
Adopt AdvGLUE development set for evaluation
Construct AdvGLUE-T dataset for adversarial machine translation
ANLI is a dataset created by Facebook AI Research with 16,000 premise-hypothesis pairs
ANLI is divided into 3 parts (R1, R2, R3), with R3 being the most difficult and diverse
Select ANLI R3 test set for evaluating adversarial robustness
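Because the evaluation is zero-shot, each task above has to be posed to the model as a plain instruction. The sketch below shows one way this could look. The template wording, the `TEMPLATES` dictionary, and the `build_prompt` helper are hypothetical and are not the prompts used in the paper.

```python
# Hypothetical zero-shot prompt templates, one per AdvGLUE task.
# The wording is illustrative only; the paper's exact prompts are not given here.
TEMPLATES = {
    "sst2": (
        "Classify the sentence into positive or negative.\n"
        "Sentence: {sentence}\nAnswer:"
    ),
    "qqp": (
        "Are the following two questions equivalent? Answer yes or no.\n"
        "Question 1: {question1}\nQuestion 2: {question2}\nAnswer:"
    ),
    "mnli": (
        "Does the premise entail, contradict, or stay neutral to the hypothesis?\n"
        "Premise: {premise}\nHypothesis: {hypothesis}\nAnswer:"
    ),
    "qnli": (
        "Does the sentence answer the question? Answer yes or no.\n"
        "Question: {question}\nSentence: {sentence}\nAnswer:"
    ),
    "rte": (
        "Does sentence 1 entail sentence 2? Answer yes or no.\n"
        "Sentence 1: {sentence1}\nSentence 2: {sentence2}\nAnswer:"
    ),
}


def build_prompt(task: str, **fields: str) -> str:
    """Fill the template for `task` with the example's text fields."""
    return TEMPLATES[task].format(**fields)


if __name__ == "__main__":
    print(build_prompt("sst2", sentence="The film is a quiet triumph."))
```

Keeping the templates in one table makes it easy to send the same instruction to every model being compared, so that differences in output come from the model rather than from the prompt.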

Out-of-distribution datasets

Two new datasets (Flipkart and DDXPlus) for OOD robustness evaluation
Flipkart is a product review dataset and DDXPlus is a medical diagnosis dataset
Subsets of each dataset are randomly sampled to form test sets
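Random sampling of a test subset, as described above, can be sketched in a few lines. The function name `sample_test_set`, the fixed seed, and the toy data are assumptions for illustration. The summary does not state the subset sizes or the sampling procedure actually used.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def sample_test_set(examples: List[T], k: int, seed: int = 0) -> List[T]:
    """Draw k examples uniformly at random, without replacement.

    A fixed seed makes the subset reproducible across runs.
    """
    if k > len(examples):
        raise ValueError("k cannot exceed the number of available examples")
    rng = random.Random(seed)
    return rng.sample(examples, k)


if __name__ == "__main__":
    # Toy stand-in for a review dataset: (text, label) pairs.
    reviews = [(f"review {i}", i % 2) for i in range(100)]
    subset = sample_test_set(reviews, k=10)
    print(len(subset))
```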

Experiment

ChatGPT is compared to 8 existing popular foundation models
Attack success rate is used as the metric for robustness on AdvGLUE and ANLI
F1-score is used as the metric for OOD classification tasks
GPT-3 models outperform the fine-tuned models
ChatGPT's responses are readable and reasonable to humans, even given adversarial inputs
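The two metrics named above can be computed as in the following sketch. It assumes the common definition of attack success rate (the share of examples a model gets right on clean input but wrong on the adversarial version) and a single-class F1. The summary does not give the exact formulas used in the paper, so both should be read as assumptions.

```python
from typing import Hashable, Sequence


def attack_success_rate(
    y_true: Sequence[Hashable],
    clean_pred: Sequence[Hashable],
    adv_pred: Sequence[Hashable],
) -> float:
    """Fraction of originally correct predictions that the attack flips."""
    # Only examples the model classified correctly on clean input count.
    attacked = [
        (t, a) for t, c, a in zip(y_true, clean_pred, adv_pred) if c == t
    ]
    if not attacked:
        return 0.0
    return sum(a != t for t, a in attacked) / len(attacked)


def f1_score(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A lower attack success rate means a more robust model, while a higher F1 means better OOD classification, so the two columns of a results table read in opposite directions.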

Case study

ChatGPT is challenged by both word-level and sentence-level adversarial inputs.
Adversarial inputs are common in everyday interactions, so defensive strategies are necessary.
It is difficult to analyze why ChatGPT performs poorly on OOD inputs.
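To make "word-level adversarial input" concrete, here is a toy perturbation that swaps two adjacent inner characters of one word, imitating a typo. It is an illustration only: `swap_typo` is a hypothetical helper, and the adversarial examples in the benchmarks are produced by far more careful methods than this.

```python
import random


def swap_typo(sentence: str, seed: int = 0) -> str:
    """Swap two adjacent inner characters in one randomly chosen word."""
    rng = random.Random(seed)
    words = sentence.split()
    # Only words with at least 4 characters have two inner characters to swap.
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return sentence
    i = rng.choice(candidates)
    w = words[i]
    # Pick j so that the first and last characters stay in place.
    j = rng.randrange(1, len(w) - 2)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


if __name__ == "__main__":
    print(swap_typo("the movie was wonderful"))
```

A human reader usually recovers the intended word at once, which is why such inputs are plausible in everyday use and still useful as a robustness probe.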

Discussion

Adversarial attack remains a major threat

Adversarial inputs remain a major threat to safety-critical applications.
Foundation models might never cover all distributions of possible adversarial inputs.
Pre-trained models can be trained on human-generated or algorithm-generated adversarial inputs to improve robustness.
Reducing defects through fine-tuning could be impossible for large models.
Open question on how to defend against adversarial attack.

Can OOD generalization be solved by large foundation models?

Large models have potential to achieve superior performance on OOD datasets
Large models use huge training data and parameters, which can lead to overfitting or generalization
Adding OOD data into training set is enough for large models
It is unknown when and why LLMs will overfit
Training data of large models could encompass similar distributions to test sets
ID-OOD performances can be positively or inversely correlated
Regularization and other techniques should be developed to improve OOD performance of language models

Beyond NLP foundation models

Adversarial and OOD robustness exist in multiple domains, not just natural language.
Most research comes from machine learning and computer vision communities.
ViT-22B is a large vision foundation model that shows superior performance on image classification tasks.

Limitation

Only zero-shot classification is performed
Difficult to find larger datasets for evaluation
Most evaluations on text classification, minor evaluations on machine translation
ChatGPT mainly designed to be a chatbot service

Conclusion

This paper presented a preliminary evaluation of the robustness of ChatGPT
Acknowledged the advance of large foundation models on adversarial and out-of-distribution robustness
Experiments show that there is still room for improvement to ChatGPT and other large models
In-depth analysis and discussion beyond NLP area
Highlighted potential research directions regarding foundation models
ChatGPT usage and authors
Rate of an adversarial attack method
Generalization error and hypothesis set
Superior performance of large foundation models
VC-dimension and correlation with datasets
Introduction to foundation models used in experiments
OOD generalization and adaptation research
Interpretation of success of large foundation models
Questions and sentences entailment
Translate sentence from English to Chinese
Classify sentence into positive or negative
