
  • LLMs can be used to measure latent ideology of lawmakers
  • LLMs can be used to better understand how politics shape policy
  • LLMs can produce stable answers across repeated iterations
  • LLMs can be used to collect data and retrieve information
  • LLMs can open new avenues for measuring latent constructs

Paper Content


  • Evaluating whether generative large language models can be useful for scaling in the social sciences
  • Measuring latent ideology reduces the complexity of lawmakers' actions and stances to a low-dimensional summary
  • Such measures help assess core democratic functions
  • Traditional approaches to measuring latent ideology have limitations
  • Estimating latent ideology scores for senators in the 116th United States Congress
  • Using the Bradley-Terry model to estimate the scores (ChatScores)
  • Comparing ChatScores to other scales of ideology
  • ChatScores predict human evaluations of ideologies of senators better than other measures
  • Generative large language models have the potential to shape text-as-data methods in the social sciences

A brief overview of ChatGPT

  • ChatGPT stands for Chat Generative Pretrained Transformer
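  • It is built on GPT-3.5, a refinement of GPT-3
  • Its transformer architecture uses self-attention, which assigns a weight to each element of the input sequence
  • It is fine-tuned specifically as a chatbot using reinforcement learning from human feedback (RLHF)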
  • It can generate human-like responses
  • It can produce incorrect responses
  • It can generate text with biases, negative stereotypes, and unfair associations
  • Social scientists have studied the properties and applications of large language models
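The weighting mentioned above is the self-attention mechanism of the transformer architecture. A minimal sketch of scaled dot-product attention in plain Python follows; it assumes a toy three-token sequence with invented 2-dimensional embeddings and omits the learned query/key/value projections, multiple heads, and causal masking used in real models, so it illustrates the idea rather than how ChatGPT is implemented. The function names are illustrative.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights); weights[i][j] is the weight that
    position i assigns to position j of the input sequence.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Scaled similarity between this query and every key
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output = weighted average of the value vectors
        outputs.append(
            [sum(wj * v[d] for wj, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights

# Toy 3-token sequence (invented values); the same vectors serve as
# queries, keys, and values because the learned projections are omitted
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(tokens, tokens, tokens)
for row in w:
    print([round(x, 3) for x in row])
```

Each row of `w` sums to 1: it is the weight one position places on every element of the sequence, and that position's output is the correspondingly weighted average of the value vectors.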

Using the Bradley-Terry model to estimate ideology

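  • The Bradley-Terry model assumes that the odds of one player beating another depend on the two players' latent “abilities”
  • The log-odds that player i beats player j are the difference of their abilities: log[P(i beats j) / P(j beats i)] = β_i − β_j
  • ChatGPT is used to estimate the ideology of each senator: each pairwise matchup asks which of two senators is more liberal (or conservative), and the Bradley-Terry abilities fitted to these outcomes are the ChatScores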
  • ChatScores highly correlate with the first dimension of DW-NOMINATE (0.963)
  • ChatScores differ from DW-NOMINATE scores in some cases
  • ChatScores highly correlate with perceived ideology scores (0.929) and CFscores (0.922)
  • ChatScores better predict human evaluations of senators’ ideologies than NOMINATE and CFscores
  • ChatScores and NOMINATE Dimension 1 have higher correlation with perceived ideology scores than NOMINATE Dimension 2
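The estimation step can be sketched in Python. This is a minimal sketch, assuming each ChatGPT judgment is recorded as a (more conservative, less conservative) pair; it fits the Bradley-Terry model with Hunter's (2004) minorization-maximization (MM) iteration, which may differ from the estimation procedure the authors actually used. The function name `fit_bradley_terry`, the senators S1 to S4, and the matchup counts are hypothetical toy data, not results from the paper.

```python
import math
from collections import defaultdict

def fit_bradley_terry(matchups, n_iter=1000, tol=1e-10):
    """Fit Bradley-Terry scores with Hunter's (2004) MM algorithm.

    matchups: list of (winner, loser) pairs, where "winner" is the senator
    judged more conservative in that comparison.
    Returns {name: beta} with log-odds(i beats j) = beta_i - beta_j.
    Assumes every senator wins and loses at least once and the comparisons
    are connected; otherwise the maximum-likelihood estimate does not exist.
    """
    wins = defaultdict(int)
    n_pair = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in matchups:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1
    players = sorted({name for pair in matchups for name in pair})
    p = {i: 1.0 for i in players}  # strengths p_i = exp(beta_i)
    for _ in range(n_iter):
        new_p = {}
        for i in players:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                n_pair[frozenset((i, j))] / (p[i] + p[j])
                for j in players
                if j != i and frozenset((i, j)) in n_pair
            )
            new_p[i] = wins[i] / denom
        # Scores are identified only up to a constant: center log-strengths at 0
        mean_log = sum(math.log(v) for v in new_p.values()) / len(players)
        new_p = {i: v / math.exp(mean_log) for i, v in new_p.items()}
        converged = max(abs(new_p[i] - p[i]) for i in players) < tol
        p = new_p
        if converged:
            break
    return {i: math.log(p[i]) for i in players}

# Toy data for hypothetical senators S1..S4 (invented, not from the paper)
matchups = (
    [("S4", "S3")] * 3 + [("S3", "S4")]
    + [("S3", "S2")] * 3 + [("S2", "S3")]
    + [("S2", "S1")] * 3 + [("S1", "S2")]
    + [("S4", "S1")] * 2 + [("S3", "S1")] * 2 + [("S4", "S2")] * 2
)
beta = fit_bradley_terry(matchups)
for name, score in sorted(beta.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:+.2f}")  # higher = judged more conservative
```

Because the model is identified only up to an additive constant in β, the sketch centers the scores at zero; only differences between senators carry meaning.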

Discussion and conclusion

  • ChatGPT can be used to estimate the liberal-conservative ideological scores of senators
  • The results suggest ChatGPT is not simply hallucinating or regurgitating a conventional liberal-conservative scale
  • ChatScores are stable and correlate highly with other liberal-conservative ideology scales
  • ChatScores better predict human evaluations of senators’ ideologies than other measures
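The stability and agreement claims above rest on correlations between score vectors. A minimal sketch, assuming Pearson correlation (the summary does not state which coefficient was used) and assuming the scores from each repeated iteration, or from each alternative scale, are stored as dictionaries keyed by senator. The function names and the scores for hypothetical senators S1 to S4 are illustrative toy values, not data from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def stability_report(runs):
    """Pairwise correlations between score vectors from repeated runs.

    runs: list of dicts mapping senator -> score, one dict per iteration.
    Scores are aligned by senator name before correlating.
    """
    names = sorted(set.intersection(*(set(r) for r in runs)))
    report = {}
    for i in range(len(runs)):
        for j in range(i + 1, len(runs)):
            xi = [runs[i][n] for n in names]
            xj = [runs[j][n] for n in names]
            report[(i, j)] = pearson(xi, xj)
    return report

# Toy scores from three hypothetical repeated runs (invented for illustration)
runs = [
    {"S1": -2.0, "S2": -0.7, "S3": 0.8, "S4": 2.1},
    {"S1": -1.9, "S2": -0.8, "S3": 0.6, "S4": 2.0},
    {"S1": -2.2, "S2": -0.5, "S3": 0.7, "S4": 1.9},
]
for (i, j), r in stability_report(runs).items():
    print(f"run {i} vs run {j}: r = {r:.3f}")
```

The same `pearson` function, applied to ChatScores and another scale aligned by senator name, gives the kind of cross-scale correlation reported above.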

B. Extracting the name of the more conservative senator in each matchup

  • ChatGPT typically returns a short paragraph explaining its choice rather than only the name of the senator.
  • To isolate the name of the more liberal/conservative senator, ChatGPT is then asked to extract the name from that response.
  • Loewen et al. (2012) use pairwise comparisons to determine which arguments are most persuasive, and Carlson and Montgomery (2017) use them for human coding of political texts.
  • Hopkins and Noel (2022) use pairwise comparisons to scale senators along the liberal-conservative continuum.
  • Ed Markey is the most liberal senator; Ted Cruz, Tom Cotton, and Josh Hawley are ranked the most conservative by ChatScores.
  • Joe Manchin, Lisa Murkowski, Mitt Romney, and Susan Collins are the center-most senators.
  • Chuck Grassley, Mitch McConnell, and Lindsey Graham have large differences in their rankings between DW-NOMINATE and ChatScores.
  • ChatScores are quite consistent across iterations.
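The two-step querying described in appendix B can be sketched as follows. This is a minimal sketch under stated assumptions: `ask` stands for any function that sends a prompt string to a chat model and returns its reply (a hypothetical stand-in; no real client API is assumed), the prompt wording is invented because the paper's exact prompts are not reproduced in this summary, and "Senator A"/"Senator B" are placeholders. The reply validation and retry loop are an added safeguard, not a confirmed part of the authors' pipeline.

```python
def build_extraction_prompt(explanation, senator_a, senator_b, direction="conservative"):
    # Hypothetical wording for the second query that isolates the name
    return (
        f"Here is an answer to the question of which senator is more {direction}:\n"
        f"{explanation}\n"
        f"Reply with only the name of the senator chosen: {senator_a} or {senator_b}."
    )

def parse_extracted_name(reply, senator_a, senator_b):
    """Return the matched senator if the reply is exactly one name, else None."""
    cleaned = reply.strip().strip(".").strip().lower()
    for name in (senator_a, senator_b):
        if cleaned == name.lower():
            return name
    return None

def judge_matchup(ask, senator_a, senator_b, direction="conservative", max_tries=3):
    """Run one pairwise comparison with a two-step query; None if unresolved.

    ask: hypothetical callable str -> str wrapping whatever chat model is used.
    """
    question = f"Which senator is more {direction}: {senator_a} or {senator_b}?"
    for _ in range(max_tries):
        explanation = ask(question)  # usually a short paragraph, not just a name
        reply = ask(build_extraction_prompt(explanation, senator_a, senator_b, direction))
        name = parse_extracted_name(reply, senator_a, senator_b)
        if name is not None:
            return name
    return None

def fake_ask(prompt):
    # Offline stand-in for a real chat model, for demonstration only
    if prompt.startswith("Here is an answer"):
        return "Senator B."
    return "Senator B has a more conservative voting record than Senator A."

print(judge_matchup(fake_ask, "Senator A", "Senator B"))  # Senator B
```

Each resolved matchup yields one (more conservative, less conservative) pair, which is the input format a Bradley-Terry fit needs; unresolved matchups return `None` so they can be re-queried or reviewed by hand.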