Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Present SODA: a million-scale high-quality social dialogue dataset
Train COSMO: a generalizable conversation agent
Dialogues in SODA are more consistent, specific, and natural than prior datasets
COSMO is more natural and consistent than best-performing dialogue models
Data, models, and code are made public

Paper Content

Introduction

Progress on open-domain social dialogue agents has been hindered by lack of diversity, scale, and quality of training corpora
Most dialogue agents are trained on large amounts of unfiltered conversations or highly curated/specialized crowdsourced dialogues
Issues of unnaturalness, toxicity, incoherence, blandness, and lack of commonsense remain
Introduce SODA, a million-scale dialogue dataset covering a wide variety of social interactions
SODA is the largest publicly available open-domain conversation dataset
Human evaluation shows SODA surpasses existing human-authored dialogue corpora
Proposed CO 3 framework for distilling conversations from large pre-trained language models
CO 3 adds context information to social commonsense knowledge step-by-step
COSMO conversation model trained on SODA outperforms existing dialogue models

Background

Conversation is a form of social interaction
Narratives and scripts are abstracted from social experiences
Social experiences form our knowledge for explaining everyday events and inferring the mental states of others
Attribution in social psychology has been studied in NLP as social commonsense

Commonsense knowledge graph

Start with a commonsense knowledge graph
Represented by symbolic triples
Use Atomic 10x as knowledge graph
Retrieve triples with social commonsense relations
Prompt PLM to rewrite commonsense into narrative
PLMs known for writing capabilities, especially in narratives

From narrative to conversation

Inferring who is speaking in the dialogue is easier when the narrative contains person variables.
For narratives with only one person, the PLM is prompted to predict the other interlocutor.
The PLM is prompted to generate a likely conversation happening in the scene.

Validating narratives and conversations with commonsense knowledge

Automatic validation of generated dialogues to check if seed commonsense triple is implied
Validation done by asking PLM two questions: is head of triple represented and are relation and tail?
Ranking of choices done using conditional pointwise mutual information
Collected SODA dataset, a large-scale high-quality conversation dataset

Postprocessing the conversations

Initial set of 2.2 million conversations distilled from Instruct-GPT
Lexical pattern matching to filter out conversations with erroneous patterns
Remove conversations with less than four turns or more than twenty turns
Remove conversations with more than two participants
Remove conversations with non-human speakers
Safety filters to avoid dangerous and harmful contents
Automatic evaluation to identify conversations with seed commonsense knowledge
Human evaluation with 100 randomly sampled narrative-conversation pairs
Replace all names with Top-10K names of US SSN applicants
Head-to-head human evaluations with DailyDialog and BlendedSkillTalk
Collecting SODA via contextualization framework is cost and time efficient

Contextualization is important

Isolating the effect of contextualization vs. sampling from a large PLM
Comparing SODA with dialogues generated by naively sampling from InstructGPT without context
Human evaluators prefer context-grounded conversations significantly more than ones generated without context
Conversations sampled without context are less specific, less interesting, and have lower lexical diversity
Utilizing SODA to train COSMO, a conversation model that can converse in a wide range of social situations
Training COSMO with structured components of SODA
Including Prosocial-Dialog for additional training data
Building COSMO on top of LM-adapted T5
Comparing COSMO to other conversational agents on social conversation datasets
Relying on human evaluation for dialogue responses
Comparing COSMO with Di-aloGPT, BlenderBot, GODEL, Instruct-GPT, and ChatGPT
Evaluating head-to-head comparison between two responses with four criteria: naturalness, consistency, specificity, and overall

Out-of-domain setting

Evaluated models on unseen dialogue dataset, DailyDialog
COSMO outperforms other models with significant margin
COSMO trained on smaller amount of data
Human judges prefer COSMO’s responses over original gold responses

One-sided out-of-domain setting

COSMO outperforms BlenderBot on BlendedSkillTalk (BST)
BlenderBot shows low performance on SODA

In-domain setting

COSMO’s responses are more specific and favored than its teacher model
COSMO is on par with ChatGPT in terms of overall quality
ChatGPT’s responses are more specific but lack naturalness
SODA and COSMO focus on modeling natural conversations, not knowledge-enhanced responses

Existing dialogue datasets come from online learning websites, movie/drama scripts, crowdsourcing, and noisy web conversations.
Augmented dialogue datasets use PLMs and additional annotations.
SODA contributes to existing corpora with improved scale, quality, and contextualization.

Conclusion

Presented SODA, a million-scale dialogue dataset
Dataset is larger and better than existing datasets
Trained COSMO, a conversation model that generalizes better
Aim to alleviate data scarcity in dialogue field
Precaution taken to vet safety of distilled conversations
Manual validation of commonsense and human evaluation
Limitations of current dataset and future work
Intention of work is to build better assistive technologies
Need for improved regulations on use and misuse of conversational AI
Results of head-to-head comparison between SODA and other datasets
Statistics of SODA compared to other large-scale dialogue datasets

Link to paper#

Abstract#

Paper Content#

Introduction#

Background#

Commonsense knowledge graph#

From narrative to conversation#

Validating narratives and conversations with commonsense knowledge#

Postprocessing the conversations#

Contextualization is important#

Out-of-domain setting#

One-sided out-of-domain setting#

In-domain setting#

Related work#

Conclusion#

Link to paper

Abstract

Paper Content

Introduction

Background

Commonsense knowledge graph

From narrative to conversation

Validating narratives and conversations with commonsense knowledge

Postprocessing the conversations

Contextualization is important

Out-of-domain setting

One-sided out-of-domain setting

In-domain setting

Related work

Conclusion