Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- LLMs can be used to complete written assignments, making it difficult for instructors to assess student learning.
- Text sampled from an LLM tends to occupy negative curvature regions of the model’s log probability function.
- DetectGPT is a new curvature-based criterion for judging if a passage is generated from a given LLM.
- DetectGPT is more discriminative than existing zero-shot methods for model sample detection.
Paper Content
Introduction
- LLMs can generate fluent responses to user queries
- Examples of LLMs include GPT-3, PaLM, and ChatGPT
- LLM-generated responses can be wrong
- LLMs can be used to replace human labor in some contexts
- News sources have released AI-written content with limited human review, leading to factual errors
- Humans perform only slightly better than chance when classifying machine-generated vs human-written text
- Automated detection methods may identify signals difficult for humans to recognize
- DetectGPT is a zero-shot method for automated machine-generated text detection
- DetectGPT compares the log probability of a candidate passage with the average log probability of several perturbations
- DetectGPT is more accurate than existing zero-shot methods for detecting machine-generated text
Related work
- Increasingly large LLMs have led to improved performance on language-related benchmarks and the ability to generate convincing text
- GROVER model was the first LLM trained specifically for generating realistic-looking news articles
- Human evaluators found GROVER-generated propaganda at least as trustworthy as human-written propaganda
- Models trained explicitly to detect machine-generated text tend to overfit to their training distribution of domains or source models
- Other works have trained supervised models for machine-generated text detection on top of neural representations, bag-of-words features, and handcrafted statistical features
- Solaiman et al. (2019) notes the surprising efficacy of a simple zero-shot method for machine-generated text detection
- DetectGPT is based on the hypothesis that samples from a source model typically lie in areas of negative curvature of the log probability function
- DetectGPT uses a mask-filling model to generate passages that are ’nearby’ the candidate passage
- Problem of machine-generated text detection echoes earlier work on detecting deepfakes
- DetectGPT approximates a measure of the local curvature of the log probability function near the candidate passage
- DetectGPT is summarized in Alg. 1
- DetectGPT normalizes the perturbation discrepancy by the standard deviation of the observed values
- DetectGPT thresholds the perturbation discrepancy to detect if a piece of text was generated by a model
Experiments
- Conduct experiments to understand machine-generated text detection
- Compare DetectGPT to prior zero-shot approaches
- Study impact of distribution shift on zero-shot and supervised detectors
- Analyze factors that impact detection accuracy
- Study robustness of zero-shot methods to partially revised machine-generated text
- Analyze impact of alternative decoding strategies on detection accuracy
- Analyze impact of choice of perturbation function and number of samples on detection performance
Main results
- DetectGPT improves average detection accuracy for XSum stories and SQuAD Wikipedia contexts.
- Log-rank thresholding is a stronger baseline than log probability thresholding.
- Supervised detectors can provide similar detection performance to DetectGPT on in-distribution data like English news.
- DetectGPT is effective on a variety of domains and models.
- DetectGPT can provide detection competitive with the stronger supervised model.
- DetectGPT maintains detection AUROC above 0.8 even when nearly a quarter of the text in model samples has been replaced.
- Top-k and nucleus sampling make detection easier.
- Using a different model to score a candidate passage than the model that generated the passage reduces detection performance.
- There is a clear association between capacity of mask-filling model and detection performance.
Discussion
- Large language models are becoming increasingly attractive tools for replacing human writers in various contexts.
- People may demand tools to verify the human origin of certain content.
- Zero-shot machine-generated text detection problem is studied.
- Property of log probability function of large language models is identified.
- This signal is more discriminative than existing zero-shot detection methods.
- DetectGPT and watermarking are discussed.
- Assumptions of DetectGPT are identified.
- Future work is suggested.
- Experiments are conducted on PubMedQA, XSum, SQuAD and WritingPrompts datasets.
- Impact of number of perturbations for DetectGPT is evaluated.
- DetectGPT provides the most accurate detections.