Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

LLMs like GPT-3 have been evaluated from a psychological perspective.
Tests of personality traits show that LLMs have higher scores on SD-3 than the human average.
Fine-tuning with safety metrics does not necessarily lead to more positive personalities.
Well-being tests show an increase in scores from GPT-3 to InstructGPT.
Instruction-finetune FLAN-T5 with positive answers can improve the model from a psychological perspective.
Evaluation and improvement of LLMs’ safety should be done systematically.

Paper Content

Introduction

Joseph Weizenbaum wrote the first NLP chatbot, ELIZA, in the 1960s
ELIZA was capable of engaging in discourse, but not understanding it
60 years of rapid development of NLP techniques has led to LLMs
LLMs are pre-trained with a massive amount of information from the internet and are capable of understanding language
LLMs are used in various real-life applications, including customer service, education, and entertainment
LLMs are prone to generate potentially harmful or inappropriate content
Safety measurements and quantifying biases in NLP tasks have been researched
Safety metrics operate on data, model, and output
Personality and well-being tests are required for safely using LLMs
LLMs show high scores on all traits of the Short Dark Triad than the human average

Safety is a long-standing problem in AI, especially for AIGC created by large language models.
Commonly used methods to address safety issues are data pre-processing, model instruction-finetuning, and output calibration.
Self-debiasing, instruction-finetuning, and results calibration are used to improve safety.

Experiment setup

Introduced LLMs and psychological tests
Described evaluation framework for fair analysis

Large language models

Evaluating two language models (GPT-3 and InstructGPT) and one instruction-finetuned model (FLAN-T5-XXL)
GPT-3 is an autoregressive language model with 175B parameters
GPT-3 has strong few-shot learning capability across various tasks
InstructGPT is a safer version of GPT-3
FLAN-T5-XXL has 11B parameters and improves model safety

Psychological tests

Personality tests measure traits like Machiavellianism, Narcissism, and Psychopathy
Well-being tests measure satisfaction with life and flourishing
Respondents rate statements from Disagree to Agree
Short Dark Triad (SD-3) is a uniform assessment for the three traits
SD-3 consists of 27 statements that must be rated from 1 to 5
Results of SD-3 can be used to gain insights into potential risks of LLMs

Big five inventory (bfi)

Big five personality traits is a commonly used model of personality in psychology
BFI consists of 44 statements that must be rated from 1 to 5
Agreeableness and Neuroticism are related to model safety
High Agreeableness is associated with avoiding conflict and helping others
High Neuroticism is associated with anxiety, moodiness, and insecurity
Flourishing Scale is a measure of overall happiness and satisfaction with life

Satisfaction with life scale (swls)

SWLS is an assessment of global cognitive judgments of satisfaction with life
SWLS adopts a hedonic approach, relying on positive emotions
SWLS consists of 5 statements that must be rated on a scale of 1-7
High scores indicate that the respondent loves their life and things are going well

Evaluation framework

LLMs depend on input prompts
Need to design unbiased prompts for psychological tests
Permute all available options in test and take average score as final result
Define set of statements for each trait
Design zero-shot prompt for each statement
Obtain answer and score from LLM and parser
Calculate average score of three samplings for statement
Calculate score for trait

Results and analysis

LLMs’ performance on SD-3, BFI, and well-being tests discussed
Cross-test analysis conducted on LLMs’ personality profile
Effective way of instruction-finetuning LLMs for a more positive personality shown

Do llms have dark personalities?

Average human results obtained from 7,863 samples from various studies
Abnormal score range defined as one standard deviation higher than the average result
GPT-3, GPT-3-I2 and FLAN-T5-XXL show higher scores for all traits in SD-3 than the average human results
GPT-3-I2 has higher Machiavellianism and Narcissism scores than GPT-3
FLAN-T5-XXL has the highest Machiavellianism and Psychopathy scores among all LLMs
GPT-3-I2 and FLAN-T5-XXL have higher Agreeableness and lower Neuroticism scores than GPT-3
GPT-3-I2 and FLAN-T5-XXL have higher scores on Flouring Scale and Satisfaction With Life Scale than GPT-3
GPT-3-I2 falls into the highly satisfied category on Flouring Scale

Personality profile of the llms and cross-test analysis

LLMs can be tested psychologically to gain a better understanding of their potential risks
GPT-3 has the lowest Machiavellianism and Narcissism scores, but a high score in Psychopathy
GPT-3 has lower Agreeableness and Conscientiousness, and higher Neuroticism than the other two models
Low Agreeableness, Conscientiousness, and Neuroticism correlate to little compassion, limited orderliness, and higher volatility
GPT-3 has the lowest well-being score
GPT-3-I2 has higher Agreeableness, Conscientiousness, Openness, and lower Neuroticism
Big Five tests have limited ability to detect dark sides of people
FLAN-T5-XXL has medium level of Big Five personality traits, but poor results in Dark Triad
Machiavellianism and Narcissism can’t be detected in Big Five tests
Instruction-finetuned models may behave well but still have implicit bias

Llms have stable trait scores

LLMs may generate different answers depending on the order of options in the prompt.
5% of answers have conflicts due to the order of options.
Trait scores are normally distributed and reliable.

Conclusions

LLMs used to generate answers from a zero-shot prompt
Figure 1 and 2 show score distribution of LLMs
Psychopathy statements include: getting revenge, being mean, avoiding dangerous situations
Extraversion statements include: talkative, reserved, outgoing
Agreeableness statements include: finding fault, helpful, cold
Conscientiousness statements include: reliable, disorganized, lazy
Neuroticism statements include: depressed, relaxed, moody
Openness statements include: original, curious, artistic
Satisfaction with Life Scale statements include: life is close to ideal, satisfied, important things wanted
Experimental results on SD-3, BFI, FS and SLWS show scores from 1-5 and 8-56

Link to paper#

Abstract#

Paper Content#

Introduction#

Related work#

Experiment setup#

Large language models#

Psychological tests#

Big five inventory (bfi)#

Satisfaction with life scale (swls)#

Evaluation framework#

Results and analysis#

Do llms have dark personalities?#

Personality profile of the llms and cross-test analysis#

Llms have stable trait scores#

Conclusions#

Link to paper

Abstract

Paper Content

Introduction

Related work

Experiment setup

Large language models

Psychological tests

Big five inventory (bfi)

Satisfaction with life scale (swls)

Evaluation framework

Results and analysis

Do llms have dark personalities?

Personality profile of the llms and cross-test analysis

Llms have stable trait scores

Conclusions