Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

LLMs can be used to complete written assignments, making it difficult for instructors to assess student learning.
Text sampled from an LLM tends to occupy negative curvature regions of the model’s log probability function.
DetectGPT is a new curvature-based criterion for judging if a passage is generated from a given LLM.
DetectGPT is more discriminative than existing zero-shot methods for model sample detection.

LLMs can generate fluent responses to user queries
Examples of LLMs include GPT-3, PaLM, and ChatGPT
LLM-generated responses can be wrong
LLMs can be used to replace human labor in some contexts
News sources have released AI-written content with limited human review, leading to factual errors
Humans perform only slightly better than chance when classifying machine-generated vs human-written text
Automated detection methods may identify signals difficult for humans to recognize
DetectGPT is a zero-shot method for automated machine-generated text detection
DetectGPT compares the log probability of a candidate passage with the average log probability of several perturbed versions of that passage
DetectGPT is more accurate than existing zero-shot methods for detecting machine-generated text
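
The comparison described above can be written compactly as a perturbation discrepancy. In the sketch below, $p_\theta$ denotes the source model, $q(\cdot \mid x)$ the perturbation function, and $\tilde{x}_1, \dots, \tilde{x}_k$ are $k$ sampled perturbations of the candidate passage $x$. The symbol names are chosen for this summary, and the second line is the ordinary sample-average estimate of the expectation.

```latex
d(x, p_\theta, q)
  = \log p_\theta(x) - \mathbb{E}_{\tilde{x} \sim q(\cdot \mid x)} \log p_\theta(\tilde{x})
  \approx \log p_\theta(x) - \frac{1}{k} \sum_{i=1}^{k} \log p_\theta(\tilde{x}_i)
```

A large positive value means the passage is markedly more probable than its perturbed neighbours, which is the behaviour expected of model samples.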

Increasingly large LLMs have led to improved performance on language-related benchmarks and the ability to generate convincing text
The GROVER model was the first LLM trained specifically for generating realistic-looking news articles
Human evaluators found GROVER-generated propaganda at least as trustworthy as human-written propaganda
Models trained explicitly to detect machine-generated text tend to overfit to their training distribution of domains or source models
Other works have trained supervised models for machine-generated text detection on top of neural representations, bag-of-words features, and handcrafted statistical features
Solaiman et al. (2019) note the surprising efficacy of a simple zero-shot method for machine-generated text detection: thresholding the average log probability of a passage under the generative model
DetectGPT is based on the hypothesis that samples from a source model typically lie in areas of negative curvature of the log probability function
DetectGPT uses a mask-filling model to generate passages that are ‘nearby’ the candidate passage
The problem of machine-generated text detection echoes earlier work on detecting deepfakes
DetectGPT approximates a measure of the local curvature of the log probability function near the candidate passage
The full DetectGPT procedure is summarized in Alg. 1 of the paper
DetectGPT normalizes the perturbation discrepancy by the standard deviation of the log probabilities observed for the perturbed passages
DetectGPT thresholds the perturbation discrepancy to decide whether a piece of text was generated by a model
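
The scoring steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `log_prob` and `perturb` are hypothetical callables standing in for the source model and the mask-filling model, `toy_log_prob` is a made-up unigram table used only so the sketch runs, and the use of the population standard deviation and the default of 20 perturbations are choices made for this example.

```python
import statistics
from typing import Callable


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    num_perturbations: int = 20,
    normalize: bool = True,
) -> float:
    """Return log p(text) minus the mean log p of perturbed copies of text."""
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(num_perturbations)]
    discrepancy = original - statistics.fmean(perturbed)
    if normalize:
        # Standard deviation of the observed perturbed log probabilities.
        spread = statistics.pstdev(perturbed)
        if spread > 0:  # skip the division when all perturbed values coincide
            discrepancy /= spread
    return discrepancy


def is_machine_generated(score: float, threshold: float) -> bool:
    """Flag the passage when the discrepancy exceeds a chosen threshold."""
    return score > threshold


# Toy stand-in for a language model: a fixed unigram log probability table.
TOY_LOG_PROBS = {"the": -1.0, "cat": -2.0, "sat": -2.0, "on": -1.5, "mat": -2.5}


def toy_log_prob(text: str) -> float:
    # Words missing from the table get a low log probability of -8.0.
    return sum(TOY_LOG_PROBS.get(word, -8.0) for word in text.split())
```

In practice `log_prob` would query the model under test and `perturb` would rewrite a few spans with a mask-filling model; the threshold has to be chosen for the application, for example from held-out human and machine text.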

The experiments aim to understand several facets of machine-generated text detection:
Compare DetectGPT to prior zero-shot approaches
Study impact of distribution shift on zero-shot and supervised detectors
Analyze factors that impact detection accuracy
Study robustness of zero-shot methods to partially revised machine-generated text
Analyze impact of alternative decoding strategies on detection accuracy
Analyze impact of choice of perturbation function and number of samples on detection performance
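
For readers unfamiliar with the alternative decoding strategies mentioned in this list, the sketch below shows two common ones, top-k and nucleus (top-p) sampling, as filters over a next-token distribution. It is an illustrative sketch only: the function names and the dictionary representation of the distribution are assumptions made for this example, not part of the paper.

```python
from typing import Dict


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most probable tokens whose mass reaches top_p."""
    kept = []
    mass = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to one.
    return {tok: p / mass for tok, p in kept}
```

Both filters remove the low-probability tail before sampling, so the generated text concentrates on tokens the model considers likely.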

Relative to prior zero-shot baselines, DetectGPT improves average detection accuracy for XSum stories and SQuAD Wikipedia contexts.
Log-rank thresholding is a stronger baseline than log probability thresholding.
Supervised detectors can provide similar detection performance to DetectGPT on in-distribution data like English news.
DetectGPT is effective on a variety of domains and models.
DetectGPT can provide detection competitive with the stronger of the supervised detectors it is compared against.
DetectGPT maintains detection AUROC above 0.8 even when nearly a quarter of the text in model samples has been replaced.
Top-k and nucleus sampling make detection easier.
Using a different model to score a candidate passage than the model that generated the passage reduces detection performance.
There is a clear association between capacity of mask-filling model and detection performance.
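
The two thresholding baselines named in these results, log probability and log-rank, can be sketched as follows. This is a hedged illustration: each generation step is represented as a hypothetical dictionary from token to probability, the function names are invented for this example, and ties in rank are broken by counting only strictly more probable tokens, which is an assumption.

```python
import math
from typing import Dict, Sequence

Distribution = Dict[str, float]


def average_log_prob(tokens: Sequence[str], step_probs: Sequence[Distribution]) -> float:
    """Mean log probability the model assigns to the observed tokens."""
    total = sum(math.log(dist[tok]) for tok, dist in zip(tokens, step_probs))
    return total / len(tokens)


def token_rank(token: str, dist: Distribution) -> int:
    """Rank 1 is the most probable token; count strictly more probable ones."""
    return 1 + sum(1 for p in dist.values() if p > dist[token])


def average_log_rank(tokens: Sequence[str], step_probs: Sequence[Distribution]) -> float:
    """Mean log of the rank of each observed token under the model."""
    total = sum(math.log(token_rank(tok, dist)) for tok, dist in zip(tokens, step_probs))
    return total / len(tokens)
```

A detector built on either score thresholds it: a higher average log probability, or a lower average log-rank, suggests the passage is more likely to be a model sample.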

Large language models are becoming increasingly attractive tools for replacing human writers in various contexts.
People may demand tools to verify the human origin of certain content.
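The paper studies the zero-shot machine-generated text detection problem.
It identifies a property of the log probability function of large language models: model samples tend to lie in regions of negative curvature.
This signal is more discriminative than existing zero-shot detection methods.
The relationship between DetectGPT and watermarking is discussed.
The assumptions of DetectGPT are identified, including access to the source model's log probabilities and to a suitable perturbation function.
Directions for future work are suggested.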
Experiments are conducted on the PubMedQA, XSum, SQuAD, and WritingPrompts datasets.
The impact of the number of perturbations on DetectGPT is evaluated.
DetectGPT provides the most accurate detections.
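
Detection quality in this summary is reported as AUROC. As a reminder of what that number measures, the sketch below computes it directly from two lists of detector scores. It is a plain pairwise implementation written for clarity, with a hypothetical function name; a library routine would normally be used for large datasets.

```python
from typing import Sequence


def auroc(human_scores: Sequence[float], machine_scores: Sequence[float]) -> float:
    """Probability that a random machine sample outscores a random human sample.

    Ties count as half a win, so a detector with no signal scores 0.5.
    """
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))
```

An AUROC of 1.0 means every machine-generated passage scores above every human-written one, while 0.5 corresponds to chance.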