Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Language model-based KG embeddings are usually static and difficult to modify after deployment.
  • A new task of editing language model-based KG embeddings is proposed.
  • Four new datasets are built to evaluate existing models and a new model, KGEditor.
  • KGEditor can update specific facts without affecting the rest with low training resources.

Paper Content

Introduction

  • Knowledge Graphs (KGs) are multi-relational graphs with massive symbolic facts
  • KGs can provide back-end support for knowledge-intensive tasks
  • KG embedding approaches represent KGs in low-dimension vector spaces
  • Traditional KG embedding models are structure-based
  • Recent trend is to apply text descriptions with an expressive black-box model
  • New task of editing language model-based KG embeddings proposed
  • Three principles to evaluate performance of task: knowledge reliability, locality, efficiency
  • Four new datasets built for EDIT and ADD sub-tasks
  • Existing approaches suffer from limited ability to efficiently edit KG embeddings
  • Proposed approach KGEditor can modify incorrect knowledge or add new knowledge while maintaining the others

Editing factual knowledge

  • Editable training is an early model-agnostic attempt to quickly edit a trained model.
  • There are two different kinds of methodologies: external model-based editor and additional parameter-based editor.
  • Current approaches mainly aim to modify the knowledge in pre-trained language models.

Task definition

  • KG can be represented as (entities, relations, triples)
  • EDIT sub-task: change outdated triple to new triple (head, relation, tail, target)
  • ADD sub-task: insert new triple into KG embeddings
  • Evaluation metrics: knowledge reliability, knowledge locality, knowledge efficiency

Datasets construction

  • Built four datasets for EDIT and ADD sub-task based on two benchmark datasets
  • Trained KG embedding models with language models and sampled hard triples as candidates
  • Removed triples that can be inferred from language models
  • Sampled long-tailed triples to handle skewed distribution
  • For EDIT sub-task, replaced entities in training set of FB15k-237 to build pre-train dataset
  • For ADD sub-task, used original training set of FB15k-237 to build pre-train dataset
  • Built reference dataset based on link prediction performance with rank value below k
  • All data reviewed by five human experts for ethical issues

Language model-based kge

  • FT-KGE methods use triples in KGs as textual sequences and use the presentation of [CLS] token to conduct a binary classification
  • PT-KGE methods use natural language prompts to elicit knowledge from pre-trained language models for KG embeddings
  • Link prediction can be reformulated as a masked entity prediction task

Editing kge baselines

  • KG embeddings can be edited using external model-based editors and additional parameter-based editors
  • KE uses a hyper network to predict weight updates during inference
  • MEND uses a low-rank decomposition of gradients to edit language models
  • CALINET extends the FFN with additional parameters for knowledge editing
  • KGEditor combines external model-based and additional parameter-based editors to edit KG embeddings
  • KGEditor uses an additional FFN layer and a hyper network to generate shift parameters
  • The hyper network is a bidirectional LSTM that encodes input triples
  • The FNN predicts vectors and a scalar to generate shift parameters
  • KGEditor is efficient in terms of computational resources and time

Evaluation

Settings

  • Utilize four datasets for evaluation
  • Initialize KG embeddings with BERT base
  • Use train and test datasets for evaluation
  • Evaluate ADD sub-task based on performance of train set
  • Leverage PT-KGE as default setting in main experiments

Main results

  • KGEditor is compared to baselines and variants
  • Experiments assess effectiveness of various approaches to edit KG embeddings
  • Performance is verified with different number of target triples
  • Performance of different KGE initialization is explored
  • KGE ZSL fails to infer any facts on both datasets
  • Tuning all or partial parameters does not achieve satisfactory performance
  • KGEditor outperforms almost all baselines
  • KGE ZSL still fails to infer any facts on both datasets
  • Tuning all or partial parameters yields best performance but loses previously learned knowledge
  • KGEditor utilizes hyper network to guide parameter editing of FFN
  • KGEditor obtains better performance than baselines
  • Number of edits affects all models’ knowledge reliability
  • Visualization of entities before and after editing shows effectiveness of editing KG embeddings

Discussion and conclusion

  • Propose two sub-tasks of EDIT and ADD for KG embeddings
  • Aim to enable fast updates to KG embeddings after deployment
  • Related to studies of language models as knowledge bases
  • Contribute a new task with four datasets
  • Propose a simple yet strong baseline KGEditor
  • Future plans to develop more efficient approaches to edit more knowledge in one stage