Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Evaluating whether generative large language models can be useful for scaling in the social sciences
Measuring latent ideology reduces complexity of actions and stances of lawmakers
Assessing core democratic functions
Traditional approaches to measuring latent ideology have limitations
Estimating latent ideological scores of 116th United States Congress
Using Bradley-Terry model to estimate scores (ChatScores)
Comparing ChatScores to other scales of ideology
ChatScores predict human evaluations of ideologies of senators better than other measures
Generative large language models have potential to shape text-as-data methods in the social sciences

ChatGPT stands for Chat Generative Pretrained Transformer
It is built on GPT-3
It assigns a weight to each element of the input sequence
It is specifically trained to be a chatbot using RLHF
It can generate human-like responses
It can produce incorrect responses
It can generate text with biases, negative stereotypes, and unfair associations
Social scientists have studied the properties and applications of large language models

Bradley-Terry model assumes that the odds of one player beating another is based on their “ability”
Log-odds of one player beating another is defined
ChatGPT used to estimate the ideology of each senator
ChatScores highly correlate with the first dimension of DW-NOMINATE (0.963)
ChatScores differ from DW-NOMINATE scores in some cases
ChatScores highly correlate with perceived ideology scores (0.929) and CFscores (0.922)
ChatScores better predict human evaluations of senators’ ideologies than NOMINATE and CFscores
ChatScores and NOMINATE Dimension 1 have higher correlation with perceived ideology scores than NOMINATE Dimension 2

ChatGPT can be used to estimate the liberal-conservative ideological scores of senators
ChatGPT is not hallucinating or regurgitating a conventional liberal-conservative scale
ChatScores are stable and correlate highly with other liberal-conservative ideology scales
ChatScores better predict human evaluations of senators’ ideologies than other measures

ChatGPT typically returns a small paragraph explaining its choice rather than returning only the name of the senator.
To extract the name of the more liberal/conservative senator, ChatGPT is asked to extract the name.
Loewen et al. (2012) and Carlson and Montgomery (2017) conduct pairwise comparisons to determine which arguments are most persuasive.
Hopkins and Noel (2022) use pairwise comparisons to scale senators along the liberal-conservative continuum.
Ed Markey is the most liberal senator and Ted Cruz, Tom Cotton, and Josh Hawley are the most conservative senators.
Joe Manchin, Lisa Murkowski, Mitt Romney, and Susan Collins are the center-most senators.
Ted Cruz, Tom Cotton, and Josh Hawley are ranked the most conservative senators by ChatScores.
Chuck Grassley, Mitch McConnell, and Lindsey Graham have large differences in their rankings between DW-NOMINATE and ChatScores.
ChatScores are quite consistent across iterations.