Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • LLMs like GPT-3 have been evaluated from a psychological perspective.
  • Tests of personality traits show that LLMs score higher on the Short Dark Triad (SD-3) than the human average.
  • Fine-tuning with safety metrics does not necessarily lead to more positive personalities.
  • Well-being tests show an increase in scores from GPT-3 to InstructGPT.
  • Instruction-finetuning FLAN-T5 on positive answers can improve the model from a psychological perspective.
  • Evaluation and improvement of LLMs’ safety should be done systematically.

Paper Content

Introduction

  • Joseph Weizenbaum wrote the first NLP chatbot, ELIZA, in the 1960s
  • ELIZA was capable of engaging in discourse, but not understanding it
  • 60 years of rapid development of NLP techniques has led to LLMs
  • LLMs are pre-trained with a massive amount of information from the internet and are capable of understanding language
  • LLMs are used in various real-life applications, including customer service, education, and entertainment
  • LLMs are prone to generate potentially harmful or inappropriate content
  • Safety measurements and quantifying biases in NLP tasks have been researched
  • Safety metrics operate on data, model, and output
  • Personality and well-being tests are required for safely using LLMs
  • LLMs show higher scores than the human average on all traits of the Short Dark Triad
  • Safety is a long-standing problem in AI, especially for AI-generated content (AIGC) created by large language models.
  • Commonly used methods to address safety issues are data pre-processing, model instruction-finetuning, and output calibration.
  • Self-debiasing, instruction-finetuning, and results calibration are used to improve safety.

Experiment setup

  • Introduced LLMs and psychological tests
  • Described evaluation framework for fair analysis

Large language models

  • Evaluating two language models (GPT-3 and InstructGPT) and one instruction-finetuned model (FLAN-T5-XXL)
  • GPT-3 is an autoregressive language model with 175B parameters
  • GPT-3 has strong few-shot learning capability across various tasks
  • InstructGPT is a safer version of GPT-3
  • FLAN-T5-XXL has 11B parameters and improves model safety

Psychological tests

  • Personality tests measure traits like Machiavellianism, Narcissism, and Psychopathy
  • Well-being tests measure satisfaction with life and flourishing
  • Respondents rate statements from Disagree to Agree
  • Short Dark Triad (SD-3) is a uniform assessment for the three traits
  • SD-3 consists of 27 statements that must be rated from 1 to 5
  • Results of SD-3 can be used to gain insights into potential risks of LLMs
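
As a rough illustration of how a trait score can be derived from such ratings, the sketch below averages the 1 to 5 ratings given to one trait's statements. The function name `score_trait`, the example ratings, and the reverse-keyed index are hypothetical and not taken from the paper; flipping reverse-keyed items before averaging is a common questionnaire convention that is assumed here.

```python
from statistics import mean


def score_trait(ratings, reverse_items=(), low=1, high=5):
    """Return the mean rating for one trait.

    ratings       -- one rating per statement, each between low and high
    reverse_items -- indices of statements that are reverse-keyed (assumption)
    """
    adjusted = []
    for idx, rating in enumerate(ratings):
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}-{high}")
        # A reverse-keyed item maps 1 -> 5, 2 -> 4, and so on.
        adjusted.append(low + high - rating if idx in reverse_items else rating)
    return mean(adjusted)


# Hypothetical ratings for nine statements of a single trait.
example_ratings = [4, 3, 5, 2, 4, 3, 4, 5, 1]
print(round(score_trait(example_ratings, reverse_items={8}), 2))  # 3.89
```

Averaging rather than summing keeps the trait score on the same 1 to 5 scale as the individual statements, which makes traits directly comparable.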

Big Five Inventory (BFI)

  • The Big Five personality traits are a commonly used model of personality in psychology
  • BFI consists of 44 statements that must be rated from 1 to 5
  • Agreeableness and Neuroticism are related to model safety
  • High Agreeableness is associated with avoiding conflict and helping others
  • High Neuroticism is associated with anxiety, moodiness, and insecurity
  • Flourishing Scale is a measure of overall happiness and satisfaction with life

Satisfaction With Life Scale (SWLS)

  • SWLS is an assessment of global cognitive judgments of satisfaction with life
  • SWLS adopts a hedonic approach, relying on positive emotions
  • SWLS consists of 5 statements that must be rated on a scale of 1-7
  • High scores indicate that the respondent loves their life and things are going well
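
A minimal sketch of SWLS scoring, assuming the conventional approach in which the total is the sum of the five ratings, so totals run from 5 to 35. The function name `swls_total` and the sample ratings are illustrative, not values from the paper.

```python
def swls_total(ratings):
    """Sum five ratings on the 1-7 scale into a total between 5 and 35."""
    if len(ratings) != 5:
        raise ValueError("SWLS expects exactly 5 ratings")
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("each rating must be between 1 and 7")
    return sum(ratings)


# Hypothetical ratings for the five statements.
print(swls_total([6, 5, 7, 6, 4]))  # 28
```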

Evaluation framework

  • LLMs depend on input prompts
  • Need to design unbiased prompts for psychological tests
  • Permute all available answer options in each test prompt and take the average score as the final result
  • Define the set of statements for each trait
  • Design a zero-shot prompt for each statement
  • Obtain an answer from the LLM and convert it to a score with a parser
  • Calculate the average score of three samplings for each statement
  • Calculate the score for each trait
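
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the option labels in `SCALE`, the prompt wording, the lettered answer format, and the `query_model` callable are all hypothetical, and the order of aggregation (three samples per option ordering, then a mean over orderings) is one plausible reading of the summary.

```python
import re
from itertools import permutations
from statistics import mean

# Hypothetical option labels and their scores (the real wording may differ).
SCALE = {"Disagree": 1, "Slightly disagree": 2, "Neutral": 3,
         "Slightly agree": 4, "Agree": 5}


def build_prompt(statement, ordered_options):
    """Zero-shot prompt listing the options in the given order."""
    lines = [f"Statement: {statement}", "Choose one option:"]
    for letter, option in zip("ABCDE", ordered_options):
        lines.append(f"({letter}) {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_answer(text, ordered_options):
    """Map a reply such as '(B)' or 'B' to a score, or None if unparsable."""
    match = re.search(r"\(([A-E])\)", text) or re.fullmatch(r"\s*([A-E])\s*", text)
    if match is None:
        return None
    option = ordered_options[ord(match.group(1)) - ord("A")]
    return SCALE[option]


def score_statement(statement, query_model, n_samples=3):
    """Average the parsed scores over every option ordering and sample."""
    per_ordering = []
    for ordered in permutations(SCALE):
        prompt = build_prompt(statement, ordered)
        samples = [parse_answer(query_model(prompt), ordered) for _ in range(n_samples)]
        valid = [s for s in samples if s is not None]
        if valid:
            per_ordering.append(mean(valid))
    if not per_ordering:
        raise ValueError("no parsable answers for this statement")
    return mean(per_ordering)


def score_trait(statements, query_model):
    """Trait score: mean of the statement scores."""
    return mean([score_statement(s, query_model) for s in statements])


# Two toy stand-ins for an LLM, used only to exercise the pipeline.
def always_first(prompt):
    return "(A)"  # position-biased: always picks the first listed option


def agree_model(prompt):
    for line in prompt.splitlines():
        if line.endswith(") Agree"):
            return line[:3]  # the letter attached to "Agree", e.g. "(C)"
    return ""


print(score_statement("I like to get revenge.", always_first))  # 3
print(score_statement("I like to get revenge.", agree_model))   # 5
```

The position-biased toy model lands exactly on the midpoint, because every option appears first equally often across the 120 orderings. That is the reason for permuting options: a preference for a position, rather than for content, averages out instead of skewing the trait score.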

Results and analysis

  • LLMs’ performance on SD-3, BFI, and well-being tests discussed
  • Cross-test analysis conducted on LLMs’ personality profile
  • Effective way of instruction-finetuning LLMs for a more positive personality shown

Do LLMs have dark personalities?

  • Average human results obtained from 7,863 samples from various studies
  • Abnormal score range defined as one standard deviation higher than the average result
  • GPT-3, GPT-3-I2 and FLAN-T5-XXL show higher scores for all traits in SD-3 than the average human results
  • GPT-3-I2 has higher Machiavellianism and Narcissism scores than GPT-3
  • FLAN-T5-XXL has the highest Machiavellianism and Psychopathy scores among all LLMs
  • GPT-3-I2 and FLAN-T5-XXL have higher Agreeableness and lower Neuroticism scores than GPT-3
  • GPT-3-I2 and FLAN-T5-XXL have higher scores on the Flourishing Scale and the Satisfaction With Life Scale than GPT-3
  • GPT-3-I2 falls into the highly satisfied category on the Flourishing Scale
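
The "one standard deviation above the human average" rule can be expressed as a small check. This is a sketch only: the function name `classify_score` and the mean, standard deviation, and model scores below are placeholders chosen for illustration, not figures reported in the paper.

```python
def classify_score(score, human_mean, human_sd):
    """Compare a trait score with the human mean and the mean + 1 SD cutoff."""
    if score > human_mean + human_sd:
        return "abnormal"
    if score > human_mean:
        return "above average"
    return "at or below average"


# Placeholder human statistics (not from the paper).
HUMAN_MEAN, HUMAN_SD = 3.0, 0.5

for model_score in (3.6, 3.2, 2.9):
    print(model_score, classify_score(model_score, HUMAN_MEAN, HUMAN_SD))
# 3.6 abnormal
# 3.2 above average
# 2.9 at or below average
```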

Personality profile of the LLMs and cross-test analysis

  • LLMs can be tested psychologically to gain a better understanding of their potential risks
  • GPT-3 has the lowest Machiavellianism and Narcissism scores, but a high score in Psychopathy
  • GPT-3 has lower Agreeableness and Conscientiousness, and higher Neuroticism than the other two models
  • Low Agreeableness, low Conscientiousness, and high Neuroticism correlate with little compassion, limited orderliness, and higher volatility
  • GPT-3 has the lowest well-being score
  • GPT-3-I2 has higher Agreeableness, Conscientiousness, Openness, and lower Neuroticism
  • Big Five tests have limited ability to detect dark sides of people
  • FLAN-T5-XXL has medium levels of the Big Five personality traits, but poor results on the Dark Triad
  • Machiavellianism and Narcissism can’t be detected in Big Five tests
  • Instruction-finetuned models may behave well but still have implicit bias

LLMs have stable trait scores

  • LLMs may generate different answers depending on the order of options in the prompt.
  • 5% of answers have conflicts due to the order of options.
  • Trait scores are normally distributed and reliable.
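
One way to quantify order-induced conflicts is sketched below. The summary does not say how a conflict is defined, so the definition used here is an assumption: a statement counts as conflicting when its answers across option orderings fall on both the agree side and the disagree side of the scale midpoint. The function names and the example scores are hypothetical.

```python
def has_conflict(scores, midpoint=3):
    """True if one statement's scores fall on both sides of the midpoint."""
    return any(s > midpoint for s in scores) and any(s < midpoint for s in scores)


def conflict_rate(scores_by_statement):
    """Fraction of statements whose answers conflict across option orderings."""
    if not scores_by_statement:
        raise ValueError("no statements supplied")
    flags = [has_conflict(scores) for scores in scores_by_statement.values()]
    return sum(flags) / len(flags)


# Hypothetical scores, one list per statement, one entry per option ordering.
example = {
    "statement 1": [4, 5, 4, 4],
    "statement 2": [2, 4, 1, 5],  # agrees under some orderings, disagrees under others
    "statement 3": [3, 3, 2, 3],
    "statement 4": [1, 1, 2, 1],
}
print(conflict_rate(example))  # 0.25
```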

Conclusions

  • LLMs used to generate answers from a zero-shot prompt
  • Figures 1 and 2 show the score distributions of the LLMs
  • Psychopathy statements include: getting revenge, being mean, avoiding dangerous situations
  • Extraversion statements include: talkative, reserved, outgoing
  • Agreeableness statements include: finding fault, helpful, cold
  • Conscientiousness statements include: reliable, disorganized, lazy
  • Neuroticism statements include: depressed, relaxed, moody
  • Openness statements include: original, curious, artistic
  • Satisfaction with Life Scale statements include: life is close to ideal, satisfied, important things wanted
  • Experimental results on SD-3, BFI, FS, and SWLS show scores from 1-5 and 8-56