Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.


  • Existing QA research focuses on providing an answer whenever a system is able to give one
  • In some cases a system should not answer a question, in order to protect sensitive users or information
  • Models can expose sensitive information under interrogation
  • Research seeks to determine if it is possible to teach a system to keep a fact secret
  • Proof-of-concept architecture designed and implemented
  • Evaluation finds that secret-keeping is possible, and identifies directions for future research

Paper Content


  • QA systems seek to provide a direct answer to an information need posed by a user in natural language
  • Input questions can be information seeking or probing
  • Outputs of a QA system can be extractive, generative, multi-choice or categorical
  • Current QA evaluation focuses on measuring the ‘accuracy’ of returned answers
  • Most follow the gold-standard pattern with variations to the source of questions and context
  • Work exploring answerability measures whether or not a QA system is capable of answering a given question
  • Research question is how to implement a secret-keeping system capable of protecting secret information from disclosure
  • System design, experiments and results are outlined in sections 2, 3 and 4
  • Ethics, related work and future work are outlined in sections 5, 6 and 7


  • Problem Definition: Gaps in assuring confidentiality in QA systems and introducing secret-keeping as a solution.
  • Architecture: Flexible architecture that can be adapted to different QA systems to protect secret information.
  • Design Principles: Minimize information leakage, minimize paranoia, don’t destroy context, generalize to different QA systems, sanitizing should be invisible to users.
  • Secret Keeper Architecture: Focuses on the output; agnostic to the underlying model, QA method and context; uses cosine similarity to determine whether an answer is secret.
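
The output-sanitization idea in the last bullet can be sketched as follows. This is a minimal illustration and not the paper's implementation: the bag-of-words `embed` function, the `sanitize` helper, the 0.5 threshold and the refusal string are all assumptions made for the example. A real system would likely use a learned sentence encoder and a tuned threshold.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sanitize(answer, secrets, threshold=0.5, refusal="I cannot answer that."):
    """Return the QA system's answer unless it is too similar to any secret.

    Only the candidate answer is inspected, so the check is independent of
    the underlying QA model. The threshold value is hypothetical.
    """
    answer_vec = embed(answer)
    for secret in secrets:
        if cosine(answer_vec, embed(secret)) >= threshold:
            return refusal
    return answer


secrets = ["The launch code is 7421"]
print(sanitize("the launch code is 7421", secrets))  # withheld
print(sanitize("Paris", secrets))                    # passed through
```

Because the filter only sees the answer text, it can be wrapped around any QA model without retraining, which matches the model-agnostic design principle listed above.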

Verifying that secrets are being kept

Phase 1. baseline assessment

  • Measuring performance of QA models on unmodified context
  • Input questions and context from the SQuAD 2.0 dev set

Phase 2. redacted context assessment

  • Purpose of redacted context assessment is to understand information loss from destructive redaction
  • Reuse the roberta-base-squad2 model for the QA system
  • Pre-processing involves generating sentence-level embeddings for the secret context and the full SQuAD dev set context
  • Embeddings are compared, and only sentences from the SQuAD dev set without a corresponding embedding in the secret context are added to the redacted context store
  • Experiment adjusted 3 key settings to explore hypotheses
  • Hypotheses involve varying number of secrets kept, amount of context available to secret keeper, and number of questions about secrets
  • Experiment framework applied to 3 models in output-sanitization architecture and single model in secret removal architecture
  • 190,000 question attempts in evaluation set
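
The redaction pre-processing described above (embed each sentence, then keep only sentences with no match in the secret context) can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: `embed` is a toy bag-of-words stand-in for a sentence encoder, and `split_sentences`, `redact_context` and the 0.8 threshold are hypothetical names and values.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_sentences(text):
    """Naive splitter on sentence-final punctuation (assumption for the sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def redact_context(context, secret_context, threshold=0.8):
    """Keep only the sentences of `context` that match no secret sentence."""
    secret_vecs = [embed(s) for s in split_sentences(secret_context)]
    kept = []
    for sentence in split_sentences(context):
        vec = embed(sentence)
        # A sentence survives only if it is dissimilar to every secret sentence.
        if all(cosine(vec, sv) < threshold for sv in secret_vecs):
            kept.append(sentence)
    return " ".join(kept)


context = "The tower is in Paris. The vault code is 7421. It opened in 1889."
print(redact_context(context, "The vault code is 7421."))
```

Redaction of this kind is destructive: the QA model never sees the removed sentences, so any non-secret information they carried is lost as well, which is the information loss this phase is meant to measure.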

Evaluating the effectiveness of secret-keeping

  • Quantitative evaluation of runtimes, accuracy, paranoia, and leakage
  • Qualitative investigation of failure cases
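
One way to compute the quantitative measures from a log of question attempts is sketched below. The definitions here are assumptions made for illustration and may differ from the paper's exact formulas: paranoia is taken as the share of non-secret questions that were withheld (false positives), leakage as the share of secret questions whose answer was returned, and accuracy as the share of non-secret questions answered correctly. The `Attempt` record and `evaluate` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    is_secret: bool  # the gold answer is a protected secret
    withheld: bool   # the system refused to answer
    correct: bool    # the returned answer matched the gold answer


def rate(numerator, denominator):
    """Safe ratio that returns 0.0 for an empty group."""
    return numerator / denominator if denominator else 0.0


def evaluate(attempts):
    """Compute accuracy, paranoia and leakage under the assumed definitions."""
    secret = [a for a in attempts if a.is_secret]
    public = [a for a in attempts if not a.is_secret]
    return {
        # Non-secret questions that were answered, and answered correctly.
        "accuracy": rate(sum(a.correct and not a.withheld for a in public), len(public)),
        # Non-secret questions that were needlessly withheld.
        "paranoia": rate(sum(a.withheld for a in public), len(public)),
        # Secret questions where the secret answer was returned anyway.
        "leakage": rate(sum(a.correct and not a.withheld for a in secret), len(secret)),
    }


attempts = [
    Attempt(is_secret=False, withheld=False, correct=True),
    Attempt(is_secret=False, withheld=True, correct=False),
    Attempt(is_secret=True, withheld=True, correct=False),
    Attempt(is_secret=True, withheld=False, correct=True),
]
print(evaluate(attempts))
```

Reporting paranoia and leakage separately makes the trade-off visible: a stricter filter lowers leakage at the cost of more withheld non-secret answers.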

Quantitative results

  • Adding a secret keeper to a model reduces accuracy
  • Accuracy of secret-keeping approaches is lower than base QA system
  • Substantial decrease in leakage when using secret-keeping models
  • Mild paranoia introduced by output-sanitization models
  • Weak correlation between number of secrets kept and false positives
  • False positives caused by cosine similarity metric relating numerical words, dates, or names
  • Secret remover has no false positives
  • Lack of context causes information leakage
  • Interrogation causes information leakage
  • Secret removal approach scales poorly with more secrets
  • Output-sanitization models have consistent performance
  • Secret remover is more accurate than output-sanitization systems

Qualitative evaluation

  • Secrets can leak when the secret keeper gets the answer wrong
  • Increasing the amount of secret context available to models improves performance, but some generalization is still seen
  • Domain collision can cause paranoia (false positives)
  • Differing contexts may determine if the same answer reveals a secret and sometimes leads to paranoia (false positives)
  • Omission is more effective than lying about the answer
  • Maintaining a central repository of secrets is a risk
  • Problem of secret keeping in QA systems is new
  • Related efforts in QA, agent-based systems, content moderation, LLM Privacy, spoiler detection, censorship, and sanitization have informed work

Question answering

  • Recent work in extractive question answering focuses on answerability, not protecting secrets
  • Examined modern QA datasets including ComQA, HotpotQA, and Natural Questions
  • Datasets focus on maximizing availability of answers, not preventing them
  • SQuAD 2.0 and QuAC include a notion of unanswerable questions, but not deliberate secret-keeping
  • Removal of examples from training data reduces ability of LLMs to answer questions

Agent-based systems and reasoning

  • Adversarial inputs can lead to sensitive information being disclosed.
  • Bakhtin et al. developed CICERO, a system that can play the game Diplomacy.
  • Understanding the importance of information can improve reasoning about its value.
  • Post-generation filtering of messages can help protect the agent’s strategic intentions.

Content moderation

  • Traditional content moderation has domain dependence and lexical fragility.
  • Rapid defeat of Jigsaw Perspective API was due to inability to handle variations in lexical inputs.
  • Keyword-based content moderation had to be supplemented with domain-specific terminology.
  • Active learning approaches leverage dependency relations to improve performance.
  • Spoiler detection is like content moderation, but focuses on protecting a discrete secret.

Memorization and forgetting

  • Recent work suggests that recently seen training examples are more likely to be memorized.
  • Fine-tuning an LLM on domain-specific data increases the risk of leaking domain-specific information.
  • Training examples are problematic for secret-keeping.
  • Approaches focus on ‘private’ or ‘sensitive’ categories of information.
  • Side-channel attacks may be used to attempt to have a secret disclosed.
  • It is possible to detect the likely disclosure of a secret and prevent it.

Censorship, sanitization and anonymization

  • Sanitizing outputs is better than censoring inputs or anonymizing text for preventing information leakage.
  • Current approaches to text anonymization are not successful.
  • Human review is needed for proposed framework, making it not scalable.
  • Sanitizing outputs based on blacklists is not reliable.
  • Sanitizing outputs is likely to be most effective for protecting specific secrets.
  • Censoring inputs decreases model accuracy and can be de-censored.

Conclusion and future work

  • Introduced secret-keeping as an important, under-explored problem in question answering
  • Defined secrecy, paranoia and information leakage
  • Designed and implemented a model-agnostic secret-keeping approach
  • Reducing paranoia and information leakage
  • Generating a gold-standard dataset for secret-keeping
  • Testing other QA methods
  • Secret information leaks from unprotected QA systems
  • Secret-keeping offers users trade-off between paranoia and information leakage
  • Results show secret remover is most accurate non-baseline model
  • Early work in spoiler detection used classification approaches
  • Memorization attacks can disclose secrets
  • Resistance to interrogation
  • Information aggregation
  • Interrogation detection
  • Satisficing and glomarization
  • Secret security