
  • Dialogue models can be difficult to control and may produce non-engaging, unsafe results.
  • DialGuide is a framework for controlling dialogue model behavior using natural language rules.
  • DialGuide is evaluated on three tasks in open-domain dialogue response generation.
  • DialGuide is effective in the dialogue safety domain, producing safe and engaging responses.

Paper Content


  • Current open-domain dialogue models can generate fluent and interesting responses, but re-purposing them requires large datasets
  • Most deployed conversational systems use handcrafted rules and templates, which are rigid and have poor coverage
  • DIALGUIDE framework proposed to control dialogue response generation using natural language rules (guidelines)
  • Guidelines consist of an “if x” condition and a “then y” action
  • Retrieve-then-infer process used to retrieve relevant guidelines
  • Guidelines can be added, removed, or edited at any point
  • Prior work used task-dependent discrete labels for control, which makes it difficult to incorporate new control codes
  • Guidelines are a more natural means of control and do not require retraining the model
  • DIALGUIDE framework proposed to enable guideline-based dialogue response generation
  • Benchmark performance established on three tasks: guideline selection, response generation, and response entailment verification
  • Models tuned on the DIALGUIDE data can perform well and lead to better control over responses
  • Controlling dialogue systems has been studied to generate useful and engaging responses, avoid toxic content, and prevent biases
  • Most approaches train models on discrete labels or control codes
  • Our guideline based control allows the specification of a combination of multiple control types through natural language
  • Neural dialogue models are the mainstream in research, but most chatbots in deployment still use handcrafted rules and templates
  • Recent progress on using natural language prompts and instructions for controlling models
  • Related work fixes models through intervention, either by computing targeted changes to the model’s parameters or by using natural language feedback
  • Dialogue safety is an open issue; mitigation approaches include filtering out unsafe text, specialized decoding procedures, and controlled language generation techniques
  • Response selection task aims to select a response from a set of candidates, given the context of a conversation
  • Response entailment based approaches predict if a response entails a premise
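
The guideline format and the retrieve-then-infer flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Guideline` and `GuidelineStore` names, the `[guideline]` separator, and the word-overlap scorer are all assumptions that stand in for the learned retriever and generator.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Guideline:
    condition: str  # the "if x" part: which contexts the guideline applies to
    action: str     # the "then y" part: what the response should contain


class GuidelineStore:
    """Hypothetical in-memory store; changing it requires no model retraining."""

    def __init__(self) -> None:
        self._guidelines: Dict[str, Guideline] = {}

    def add(self, key: str, guideline: Guideline) -> None:
        # Adding under an existing key overwrites it, which doubles as "edit".
        self._guidelines[key] = guideline

    def remove(self, key: str) -> None:
        self._guidelines.pop(key, None)

    def retrieve(self, context: str, top_k: int = 1) -> List[Guideline]:
        """Toy retriever (assumption): rank by word overlap with the condition."""
        context_words = set(context.lower().split())
        scored = []
        for guideline in self._guidelines.values():
            overlap = len(context_words & set(guideline.condition.lower().split()))
            if overlap > 0:
                scored.append((overlap, guideline))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [guideline for _, guideline in scored[:top_k]]


def build_generator_input(context: str, guideline: Guideline) -> str:
    """Concatenate context and guideline as input for a response generator."""
    return f"{context} [guideline] If {guideline.condition}, then {guideline.action}"
```

Because the guidelines live outside the model, a developer could call `add` or `remove` at any point and the next `retrieve` call would reflect the change.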

Proposed task and data collection

  • Aim: Enable control over dialogue model through developer-defined guidelines
  • Guidelines consist of two parts: condition and action
  • Condition: Specifies which contexts the guideline is relevant to
  • Action: Specifies what the response should contain
  • Action can be specific or abstract
  • DIALGUIDE consists of three tasks: guideline retrieval, response generation, response entailment verification
  • Two versions of DIALGUIDE: BST and Safety
  • BST: Annotations collected from Amazon Mechanical Turk
  • Safety: Annotations collected from ProsocialDialog dataset
  • Guideline writing task: Annotators write guidelines and responses that follow the guidelines
  • Guideline annotation task: Annotators label guidelines as relevant or not to the context
  • Response entailment verification task: Annotators mark if response follows the guideline or not
  • Adversarial responses collected in verification task
  • Dataset statistics in Tables 1 and 2
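
One way to picture how a single annotated record feeds the three tasks is the sketch below. The field names and the `to_task_examples` helper are hypothetical, and the rule that only guideline-following responses become generation targets is an assumption made for illustration, not something the summary states.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedExample:
    """One illustrative annotated record; field names are assumptions."""
    context: List[str]        # dialogue turns so far
    condition: str            # guideline condition ("if x")
    action: str               # guideline action ("then y")
    is_relevant: bool         # label from the guideline annotation task
    response: str             # annotator-written (or adversarial) response
    follows_guideline: bool   # label from the entailment verification task


def to_task_examples(ex: AnnotatedExample) -> dict:
    """Derive (input, target) pairs for each task from one record."""
    history = " ".join(ex.context)
    guideline = f"If {ex.condition}, then {ex.action}"
    examples = {
        "retrieval": ((history, ex.condition), ex.is_relevant),
        "verification": ((history, guideline, ex.response), ex.follows_guideline),
    }
    # Assumption: adversarial or non-following responses serve only as
    # negatives for verification, never as generation targets.
    if ex.is_relevant and ex.follows_guideline:
        examples["generation"] = ((history, guideline), ex.response)
    return examples
```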

Experiments and results

  • Conducted experiments on three tasks: guideline retrieval, response generation, and response entailment verification
  • Results discussed in this section

Guideline retrieval

  • Generating a safe response based on a guideline and dialogue context
  • Experiment on dev and test sets of DIALGUIDE-SAFETY
  • Comparing DialBart0-noguideline, DialBart0-withguideline, OPT30B-fewshot, and DialBart-rot
  • Evaluating generated responses with a safety metric
  • DialBart0-withguideline improves safety by 5 percentage points
  • DialBart-rot uses RoTs (rules of thumb)
  • Ablation experiments with No-guidelines and Safety-only baselines

Response generation

  • Models are trained with the dialogue context and guideline as input and the response as output
  • Ret-generate model retrieves guidelines in two steps and selects one at random from the set of guidelines with a score greater than 98%
  • Ret-robust model is trained with additional instances in which the gold guideline is replaced with a random guideline, for 20% of the training data
  • Evaluation reports BLEU-2, BLEU-4, and ROUGE-L scores, Distinct-1 and Distinct-2, Gd-Bleu-2, and RS-entail
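
The selection step of Ret-generate and the noising step of Ret-robust can be sketched as follows. This is a hedged illustration under stated assumptions: the function names, the dictionary layout of a training example, and the use of a seeded `random.Random` are hypothetical, and scores are assumed to be probabilities in [0, 1] so that 98% corresponds to 0.98.

```python
import random


def select_guideline(scored_guidelines, threshold=0.98, rng=random):
    """Pick uniformly at random among guidelines scoring above the threshold.

    `scored_guidelines` is a list of (guideline, score) pairs; returns None
    when no guideline clears the threshold.
    """
    candidates = [g for g, score in scored_guidelines if score > threshold]
    return rng.choice(candidates) if candidates else None


def add_robustness_noise(examples, guideline_pool, fraction=0.2, seed=0):
    """Append noisy copies of a fraction of the examples.

    Each noisy copy keeps the context and response but swaps the gold
    guideline for a different, randomly chosen one from `guideline_pool`.
    """
    rng = random.Random(seed)
    n_noisy = int(len(examples) * fraction)
    noisy = []
    for ex in rng.sample(examples, n_noisy):
        alternatives = [g for g in guideline_pool if g != ex["guideline"]]
        if not alternatives:
            continue  # nothing to swap in; skip this example
        noisy.append({**ex, "guideline": rng.choice(alternatives)})
    return examples + noisy
```

Training on the noisy copies exposes the generator to guidelines that do not fit the context, which is one plausible reading of why Ret-robust copes better with imperfect retrieval.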

Qualitative analysis

  • Tables 7 and 8 show sample inputs, guidelines, and model outputs from the response generation experiment for DIALGUIDE-BST and DIALGUIDE-SAFETY
  • Dialguide-tuned and OPT30B-fewshot use the gold guideline
  • Multistep baseline generates its own guideline, and Ret-generate and Ret-robust use a retrieved guideline
  • Dialguide-tuned follows the gold guideline and generates a safe response
  • OPT30B-fewshot model output does not relate to the topic of the conversation
  • Multistep baseline generates a guideline and response that focuses on the topic of the conversation
  • Ret-generate response focuses too much on the provided guideline making the response somewhat incoherent
  • Ret-robust is able to accommodate both the context and the guideline
  • No-guideline model tuned on safety response data without guidelines generates a safe response
  • Gold RoT is more generic compared to the guideline
  • Dialguide-tuned shows the best performance in both the quantitative results and the qualitative analysis
  • Retrieval baselines also show good performance and are more practical
  • Multistep baseline is useful when no good guideline is available


  • DialGuide framework and dataset provide a solution for controlling dialogue model behavior using natural language rules.
  • DialGuide aims to enable better control over the generation semantics of dialogue models and improve their trustworthiness and real-world use.
  • DialGuide is evaluated on two domains, chit-chat and safety.
  • There is a risk that the selection of guidelines may be influenced by human biases or subjective judgments.
  • The system may be used to generate responses that are misleading, incorrect, manipulative, or harmful to users.
  • Careful regulation and oversight are needed to mitigate misuse of the system.
  • Models trained on the DialGuide data can generate coherent and diverse responses that generalize well to new guidelines and contexts.