Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Existing research focuses on providing an answer when a system can
System should not answer a question to protect sensitive users or information
Models can expose sensitive information under interrogation
Research seeks to determine if it is possible to teach a system to keep a fact secret
Proof-of-concept architecture designed and implemented
Evaluation determines that while possible, there are directions for future research

Paper Content

Introduction

QA systems seek to provide a direct answer to an information need posed by a user in natural language
Input questions can be information seeking or probing
Outputs of a QA system can be extractive, generative, multi-choice or categorical
Current QA evaluation focuses on measuring the ‘accuracy’ of returned answers
Most follow the gold-standard pattern with variations to the source of questions and context
Work exploring answerability measures whether or not a QA system is capable of answering a given question
Research question is how to implement a secret-keeping system capable of protecting secret information from disclosure
System design, experiments and results are outlined in sections 2, 3 and 4
Ethics, related work and future work are outlined in sections 5, 6 and 7

Contributions.

Problem Definition: Gaps in assuring confidentiality in QA systems and introducing secret-keeping as a solution.
Architecture: Flexible architecture that can be adapted to different QA systems to protect secret information.
Design Principles: Minimize information leakage, minimize paranoia, don’t destroy context, generalize to different QA systems, sanitizing should be invisible to users.
Secret Keeper Architecture: Focusing on output, agnostic to underlying model, QA method and context, uses cosine similarity to determine if answer is secret.

Verifying that secrets are being kept

Phase 1. baseline assessment

Measuring performance of QA models on unmodified context
Input questions and context from SQUAD Dev Set 2

Phase 2. redacted context assessment

Purpose of redacted context assessment is to understand information loss from destructive redaction
Reuse robertabase-squad2 model for QA system
Pre-processing involves generating sentence-level embeddings for secret context and full SQUAD Dev set context
Embeddings are compared and only sentences from SQUAD Dev set without corresponding embedding in secret context are added to redacted context store
Experiment adjusted 3 key settings to explore hypotheses
Hypotheses involve varying number of secrets kept, amount of context available to secret keeper, and number of questions about secrets
Experiment framework applied to 3 models in output-sanitization architecture and single model in secret removal architecture
190,000 question attempts in evaluation set

Evaluating the effectiveness of secret-keeping

Quantitative evaluation of runtimes, accuracy, paranoia, and leakage
Qualitative investigation of failure cases

Quantitative results

Adding a secret keeper to a model reduces accuracy
Accuracy of secret-keeping approaches is lower than base QA system
Substantial decrease in leakage when using secret-keeping models
Mild paranoia introduced by output-sanitization models
Weak correlation between number of secrets kept and false positives
False positives caused by cosine similarity metric relating numerical words, dates, or names
Secret remover has no false positives
Lack of context causes information leakage
Interrogation causes information leakage
Secret removal approach scales poorly with more secrets
Output-sanitization models have consistent performance
Secret remover is more accurate than output-sanitization systems

Qualitative evaluation

Secrets can leak when the secret keeper gets the answer wrong
Increasing the amount of secret context available to models improves performance, but some generalization is still seen
Domain collision can cause paranoia (false positives)
Differing contexts may determine if the same answer reveals a secret and sometimes leads to paranoia (false positives)
Omission is more effective than lying about the answer
Maintaining a central repository of secrets is a risk

Problem of secret keeping in QA systems is new
Related efforts in QA, agent-based systems, content moderation, LLM Privacy, spoiler detection, censorship, and sanitization have informed work

Question answering

Recent work in extractive question answering focuses on answerability, not protecting secrets
Examined modern QA datasets including Comqa, HotpotQA, and Natural Questions
Datasets focus on maximizing availability of answers, not preventing them
SQUAD2.0 and QuAC include notion of unanswerable questions, but not deliberate secretkeeping
Removal of examples from training data reduces ability of LLMs to answer questions

Agent-based systems and reasoning

Adversarial inputs can lead to sensitive information being disclosed.
Bakhtin et al. developed CICERO, a system that can play the game Diplomacy.
Understanding the importance of information can improve reasoning about its value.
Post-generation filtering of messages can help protect the agent’s strategic intentions.

Content moderation

Traditional content moderation has domain dependence and lexical fragility.
Rapid defeat of Jigsaw Perspective API was due to inability to handle variations in lexical inputs.
Keyword-based content moderation had to be supplemented with domain-specific terminology.
Active learning approaches leverage dependency relations to improve performance.
Spoiler detection is like content moderation, but focuses on protecting a discrete secret.

Memorization and forgetting

Recent work suggests that recently seen training examples are more likely to be memorized.
Fine-tuning an LLM on domain-specific data increases the risk of leaking domain-specific information.
Training examples are problematic for secret-keeping.
Approaches focus on ‘private’ or ‘sensitive’ categories of information.
Side-channel attacks may be used to attempt to have a secret disclosed.
It is possible to detect the likely disclosure of a secret and prevent it.

Censorship, sanitization and anonymization

Sanitizing outputs is better than censoring inputs or anonymizing text for preventing information leakage.
Current approaches to text anonymization are not successful.
Human review is needed for proposed framework, making it not scalable.
Sanitizing outputs based on blacklists is not reliable.
Sanitizing outputs is likely to be most effective for protecting specific secrets.
Censoring inputs decreases model accuracy and can be de-censored.

Conclusion and future work

Introduced secret-keeping as an important, under-explored problem in question answering
Defined secrecy, paranoia and information leakage
Designed and implemented a model-agnostic secret-keeping approach
Reducing paranoia and information leakage
Generating a gold-standard dataset for secret-keeping
Testing other QA methods
Secret information leaks from unprotected QA systems
Secret-keeping offers users trade-off between paranoia and information leakage
Results show secret remover is most accurate non-baseline model
Early work in spoiler detection used classification approaches
Memorization attacks can disclose secrets
Resistance to interrogation
Information aggregation
Interrogation detection
Satisficing and glomarization
Secret security

Link to paper#

Abstract#

Paper Content#

Introduction#

Contributions.#

Verifying that secrets are being kept#

Phase 1. baseline assessment#

Phase 2. redacted context assessment#

Evaluating the effectiveness of secret-keeping#

Quantitative results#

Qualitative evaluation#

Related work in protecting sensitive information#

Question answering#

Agent-based systems and reasoning#

Content moderation#

Memorization and forgetting#

Censorship, sanitization and anonymization#

Conclusion and future work#

Link to paper

Abstract

Paper Content

Introduction

Contributions.

Verifying that secrets are being kept

Phase 1. baseline assessment

Phase 2. redacted context assessment

Evaluating the effectiveness of secret-keeping

Quantitative results

Qualitative evaluation

Related work in protecting sensitive information

Question answering

Agent-based systems and reasoning

Content moderation

Memorization and forgetting

Censorship, sanitization and anonymization

Conclusion and future work