Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- LLMs can be used to measure latent ideology of lawmakers
- LLMs can be used to better understand how politics shape policy
- LLMs can produce stable answers across repeated iterations
- LLMs can be used to collect data and retrieve information
- LLMs can open new avenues for measuring latent constructs
Paper Content
Introduction
- Evaluating whether generative large language models can be useful for scaling in the social sciences
- Measuring latent ideology reduces complexity of actions and stances of lawmakers
- Assessing core democratic functions
- Traditional approaches to measuring latent ideology have limitations
- Estimating latent ideological scores of 116th United States Congress
- Using Bradley-Terry model to estimate scores (ChatScores)
- Comparing ChatScores to other scales of ideology
- ChatScores predict human evaluations of ideologies of senators better than other measures
- Generative large language models have potential to shape text-as-data methods in the social sciences
A brief overview of chatgpt
- ChatGPT stands for Chat Generative Pretrained Transformer
- It is built on GPT-3
- It assigns a weight to each element of the input sequence
- It is specifically trained to be a chatbot using RLHF
- It can generate human-like responses
- It can produce incorrect responses
- It can generate text with biases, negative stereotypes, and unfair associations
- Social scientists have studied the properties and applications of large language models
Using the bradley-terry model to estimate ideology
- Bradley-Terry model assumes that the odds of one player beating another is based on their “ability”
- Log-odds of one player beating another is defined
- ChatGPT used to estimate the ideology of each senator
- ChatScores highly correlate with the first dimension of DW-NOMINATE (0.963)
- ChatScores differ from DW-NOMINATE scores in some cases
- ChatScores highly correlate with perceived ideology scores (0.929) and CFscores (0.922)
- ChatScores better predict human evaluations of senators’ ideologies than NOMINATE and CFscores
- ChatScores and NOMINATE Dimension 1 have higher correlation with perceived ideology scores than NOMINATE Dimension 2
Discussion and conclusion
- ChatGPT can be used to estimate the liberal-conservative ideological scores of senators
- ChatGPT is not hallucinating or regurgitating a conventional liberal-conservative scale
- ChatScores are stable and correlate highly with other liberal-conservative ideology scales
- ChatScores better predict human evaluations of senators’ ideologies than other measures
B extracting the name of the more conservative senator in each matchup
- ChatGPT typically returns a small paragraph explaining its choice rather than returning only the name of the senator.
- To extract the name of the more liberal/conservative senator, ChatGPT is asked to extract the name.
- Loewen et al. (2012) and Carlson and Montgomery (2017) conduct pairwise comparisons to determine which arguments are most persuasive.
- Hopkins and Noel (2022) use pairwise comparisons to scale senators along the liberal-conservative continuum.
- Ed Markey is the most liberal senator and Ted Cruz, Tom Cotton, and Josh Hawley are the most conservative senators.
- Joe Manchin, Lisa Murkowski, Mitt Romney, and Susan Collins are the center-most senators.
- Ted Cruz, Tom Cotton, and Josh Hawley are ranked the most conservative senators by ChatScores.
- Chuck Grassley, Mitch McConnell, and Lindsey Graham have large differences in their rankings between DW-NOMINATE and ChatScores.
- ChatScores are quite consistent across iterations.