Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • LLMs can be used to complete written assignments, making it difficult for instructors to assess student learning.
  • Text sampled from an LLM tends to occupy negative curvature regions of the model’s log probability function.
  • DetectGPT is a new curvature-based criterion for judging if a passage is generated from a given LLM.
  • DetectGPT is more discriminative than existing zero-shot methods for model sample detection.

Paper Content

Introduction

  • LLMs can generate fluent responses to user queries
  • Examples of LLMs include GPT-3, PaLM, and ChatGPT
  • LLM-generated responses can be wrong
  • LLMs can be used to replace human labor in some contexts
  • News sources have released AI-written content with limited human review, leading to factual errors
  • Humans perform only slightly better than chance when classifying machine-generated vs human-written text
  • Automated detection methods may identify signals difficult for humans to recognize
  • DetectGPT is a zero-shot method for automated machine-generated text detection
  • DetectGPT compares the log probability of a candidate passage with the average log probability of several perturbations
  • DetectGPT is more accurate than existing zero-shot methods for detecting machine-generated text
  • Increasingly large LLMs have led to improved performance on language-related benchmarks and the ability to generate convincing text
  • GROVER model was the first LLM trained specifically for generating realistic-looking news articles
  • Human evaluators found GROVER-generated propaganda at least as trustworthy as human-written propaganda
  • Models trained explicitly to detect machine-generated text tend to overfit to their training distribution of domains or source models
  • Other works have trained supervised models for machine-generated text detection on top of neural representations, bag-of-words features, and handcrafted statistical features
  • Solaiman et al. (2019) notes the surprising efficacy of a simple zero-shot method for machine-generated text detection
  • DetectGPT is based on the hypothesis that samples from a source model typically lie in areas of negative curvature of the log probability function
  • DetectGPT uses a mask-filling model to generate passages that are ’nearby’ the candidate passage
  • Problem of machine-generated text detection echoes earlier work on detecting deepfakes
  • DetectGPT approximates a measure of the local curvature of the log probability function near the candidate passage
  • DetectGPT is summarized in Alg. 1
  • DetectGPT normalizes the perturbation discrepancy by the standard deviation of the observed values
  • DetectGPT thresholds the perturbation discrepancy to detect if a piece of text was generated by a model

Experiments

  • Conduct experiments to understand machine-generated text detection
  • Compare DetectGPT to prior zero-shot approaches
  • Study impact of distribution shift on zero-shot and supervised detectors
  • Analyze factors that impact detection accuracy
  • Study robustness of zero-shot methods to partially revised machine-generated text
  • Analyze impact of alternative decoding strategies on detection accuracy
  • Analyze impact of choice of perturbation function and number of samples on detection performance

Main results

  • DetectGPT improves average detection accuracy for XSum stories and SQuAD Wikipedia contexts.
  • Log-rank thresholding is a stronger baseline than log probability thresholding.
  • Supervised detectors can provide similar detection performance to DetectGPT on in-distribution data like English news.
  • DetectGPT is effective on a variety of domains and models.
  • DetectGPT can provide detection competitive with the stronger supervised model.
  • DetectGPT maintains detection AUROC above 0.8 even when nearly a quarter of the text in model samples has been replaced.
  • Top-k and nucleus sampling make detection easier.
  • Using a different model to score a candidate passage than the model that generated the passage reduces detection performance.
  • There is a clear association between capacity of mask-filling model and detection performance.

Discussion

  • Large language models are becoming increasingly attractive tools for replacing human writers in various contexts.
  • People may demand tools to verify the human origin of certain content.
  • Zero-shot machine-generated text detection problem is studied.
  • Property of log probability function of large language models is identified.
  • This signal is more discriminative than existing zero-shot detection methods.
  • DetectGPT and watermarking are discussed.
  • Assumptions of DetectGPT are identified.
  • Future work is suggested.
  • Experiments are conducted on PubMedQA, XSum, SQuAD and WritingPrompts datasets.
  • Impact of number of perturbations for DetectGPT is evaluated.
  • DetectGPT provides the most accurate detections.