Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Automated plot generation is the challenge of creating a sequence of events that make a coherent story.
  • Traditional symbolic planners create stories from a goal state, but rely on hand-crafted actions.
  • Neural language models can generate stories with great diversity, but have trouble ending stories and maintaining coherence.
  • This paper presents an approach to story plot generation that combines causal planning with neural language models.
  • The system infers preconditions for events in the story and then events that will cause those conditions to become true.
  • Automatic evaluation was used to measure narrative coherence.
  • Results show that the proposed method produces more coherent plotlines than several strong baselines.

Paper Content

Introduction

  • Automated plot generation is the challenge of creating a sequence of events that make up a coherent story.
  • Early solutions to story generation used symbolic planning and case-based reasoning.
  • Symbolic planners guarantee logical soundness of a plan and create character believability, character conflict, and theory of mind.
  • Narrative psychologists note the importance of plot coherence for reader comprehension.
  • Symbolic planners require hand-crafted schemas to define preconditions and effects.
  • Neural language model based approaches can generate diverse stories.
  • This paper seeks to unify the strengths of symbolic planning and neural text generation to achieve causally coherent stories.
  • The story planner uses a large pre-trained language model to infer events and their preconditions.
  • The system works backward from a given story ending and queries the language model to generate preconditions.
  • The system outperforms other strong baselines in generating coherent plotlines.
  • Planner finds sequence of actions to reach goal state
  • Story and plot generation using symbolic planning results in causally coherent plotlines but lacks diversity and length
  • Partial Order Causal Link Planning (POCL) used by some story planners
  • Plan represented as a tuple of actions, causal links, and temporal ordering constraints
  • Planner selects plan on fringe of plan-space and action with unsatisfied precondition
  • Successor plan generated for each way of satisfying precondition
  • POCL planners are cognitively plausible for humans

Neural story generators

  • Early attempts at neural story generation used language models
  • Attempts to control coherence include conditioning and hierarchical generation
  • Language models not aware of story ending
  • EDGAR system generates stories backward using language model based question-answering
  • C2PO conducts bidirectional search using COMET model of commonsense inference
  • TattleTale system uses classical symbolic planner to generate sequence of actions and condition language model

Neural story planner

  • Proposed technique generates causally coherent story plans using causal links from POCL plan
  • Plan is generated from right to left, starting with the final event
  • Events and preconditions are inferred by a large language model
  • Planner starts with an input sentence that specifies the ending of a story
  • Planner is also provided with a set of initial conditions
  • System recursively uses language model to infer preconditions of any event in the plot
  • For any given precondition, system semigreedily accepts the first event inferred by the language model
  • Open world setting avoids logical impossibilities
  • Some inferences result in cycles of repeated events, in which case the planner backtracks and selects alternative inferences

Precondition generation

  • Use large language models to infer preconditions of events
  • Prompts based on psychological theories of reading comprehension
  • Six different classes of preconditions
  • Item Need: What item must a character possess to do an action?
  • Item State: What must be true about an item for an action to be performed?
  • How: What needed to have happened for something to be achievable now?
  • Interactions with others: What must have happened between two characters for an action to be performed?
  • Reason: What serves as the reason for an event?
  • Location precondition required for every action
  • Want and need reason-preconditions are deleted automatically
  • Use PMI DC rescoring to address surface form competition
  • Cosine similarity check to filter out lexical variations

Event generation

  • Preconditions are satisfied by the effects of events
  • Directly infer events to satisfy preconditions
  • Copy sentence from precondition to make a new event
  • Generate pairs and chains of events that go together
  • Use specialized prompts for each precondition type
  • Create causal link between new event and precondition
  • Represent plot as directed acyclic graph
  • Greedy assumption that same event can satisfy multiple preconditions
  • Check for existing event before generating new one
  • Record multiple possible phrasings for duplicate events
  • Discard generated precondition if identical to parent event
  • Don’t model negative effects of events

Final total ordering

  • Produce a total ordering of events in plot graph
  • Prefer certain orderings over others
  • Order events based on type of precondition satisfied
  • Events that initiate character intentions should come first
  • Events that make use of establishing events should come after

Evaluation

  • Evaluated the extent to which planning technique generates coherent plots
  • Used psychological studies to measure reader comprehension
  • Prompted GPT-3 to answer enablement questions about generated plots
  • Used ROCStories dataset to seed system and baselines
  • Assessed whether plotlines are coherent
  • Scored systems by percentage of time GPT-3 can determine if an event is enabled by prior events
  • Did not evaluate systems using perplexity or BLEU score

Coherence measure

  • Measure comprehension of generated stories by posing questions to GPT-3
  • Exclude first sentence of story from evaluation
  • Consider answerable if found in earlier part of story and not too similar to queried event
  • Validate measure with 100 stories from ROCStories dataset, half of which should be answerable and half should produce “none”
  • Measure has overall accuracy of 78.20%

Models and baselines

  • Evaluated performance of neural planner against four baselines
  • GPT-J-6B used to generate entire story
  • C2PO uses bidirectional interpolation with commonsense inference
  • comGen uses transformer based language model and two knowledge bases
  • plan-write-revise fine-tuned on ROCStories with titles as topic
  • Filtered out non-action sentences and single-event plots
  • Constrained item-precondition generation to single precondition

Results and discussion

  • Our planner achieves highest percentage of answerable enablement questions
  • Our planner introduces events to story to enable future events
  • Evaluation metric is sensitive to structure of story
  • C2PO and comGen use commonense inferences from COMET
  • C2PO generates commonense inferences about “wants” and “needs”
  • ROCStories response rate of 85.39%

Conclusions

  • Neural language models used to generate story plotlines
  • System chains backward from given ending of story
  • Language model used to infer conditions and events
  • Search algorithm inspired by classical planning
  • Generated stories focus on physically grounded action