Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Existing research focuses on providing an answer when a system can
- System should not answer a question to protect sensitive users or information
- Models can expose sensitive information under interrogation
- Research seeks to determine if it is possible to teach a system to keep a fact secret
- Proof-of-concept architecture designed and implemented
- Evaluation determines that while possible, there are directions for future research
Paper Content
Introduction
- QA systems seek to provide a direct answer to an information need posed by a user in natural language
- Input questions can be information seeking or probing
- Outputs of a QA system can be extractive, generative, multi-choice or categorical
- Current QA evaluation focuses on measuring the ‘accuracy’ of returned answers
- Most follow the gold-standard pattern with variations to the source of questions and context
- Work exploring answerability measures whether or not a QA system is capable of answering a given question
- Research question is how to implement a secret-keeping system capable of protecting secret information from disclosure
- System design, experiments and results are outlined in sections 2, 3 and 4
- Ethics, related work and future work are outlined in sections 5, 6 and 7
Contributions.
- Problem Definition: Gaps in assuring confidentiality in QA systems and introducing secret-keeping as a solution.
- Architecture: Flexible architecture that can be adapted to different QA systems to protect secret information.
- Design Principles: Minimize information leakage, minimize paranoia, don’t destroy context, generalize to different QA systems, sanitizing should be invisible to users.
- Secret Keeper Architecture: Focusing on output, agnostic to underlying model, QA method and context, uses cosine similarity to determine if answer is secret.
Verifying that secrets are being kept
Phase 1. baseline assessment
- Measuring performance of QA models on unmodified context
- Input questions and context from SQUAD Dev Set 2
Phase 2. redacted context assessment
- Purpose of redacted context assessment is to understand information loss from destructive redaction
- Reuse robertabase-squad2 model for QA system
- Pre-processing involves generating sentence-level embeddings for secret context and full SQUAD Dev set context
- Embeddings are compared and only sentences from SQUAD Dev set without corresponding embedding in secret context are added to redacted context store
- Experiment adjusted 3 key settings to explore hypotheses
- Hypotheses involve varying number of secrets kept, amount of context available to secret keeper, and number of questions about secrets
- Experiment framework applied to 3 models in output-sanitization architecture and single model in secret removal architecture
- 190,000 question attempts in evaluation set
Evaluating the effectiveness of secret-keeping
- Quantitative evaluation of runtimes, accuracy, paranoia, and leakage
- Qualitative investigation of failure cases
Quantitative results
- Adding a secret keeper to a model reduces accuracy
- Accuracy of secret-keeping approaches is lower than base QA system
- Substantial decrease in leakage when using secret-keeping models
- Mild paranoia introduced by output-sanitization models
- Weak correlation between number of secrets kept and false positives
- False positives caused by cosine similarity metric relating numerical words, dates, or names
- Secret remover has no false positives
- Lack of context causes information leakage
- Interrogation causes information leakage
- Secret removal approach scales poorly with more secrets
- Output-sanitization models have consistent performance
- Secret remover is more accurate than output-sanitization systems
Qualitative evaluation
- Secrets can leak when the secret keeper gets the answer wrong
- Increasing the amount of secret context available to models improves performance, but some generalization is still seen
- Domain collision can cause paranoia (false positives)
- Differing contexts may determine if the same answer reveals a secret and sometimes leads to paranoia (false positives)
- Omission is more effective than lying about the answer
- Maintaining a central repository of secrets is a risk
Related work in protecting sensitive information
- Problem of secret keeping in QA systems is new
- Related efforts in QA, agent-based systems, content moderation, LLM Privacy, spoiler detection, censorship, and sanitization have informed work
Question answering
- Recent work in extractive question answering focuses on answerability, not protecting secrets
- Examined modern QA datasets including Comqa, HotpotQA, and Natural Questions
- Datasets focus on maximizing availability of answers, not preventing them
- SQUAD2.0 and QuAC include notion of unanswerable questions, but not deliberate secretkeeping
- Removal of examples from training data reduces ability of LLMs to answer questions
Agent-based systems and reasoning
- Adversarial inputs can lead to sensitive information being disclosed.
- Bakhtin et al. developed CICERO, a system that can play the game Diplomacy.
- Understanding the importance of information can improve reasoning about its value.
- Post-generation filtering of messages can help protect the agent’s strategic intentions.
Content moderation
- Traditional content moderation has domain dependence and lexical fragility.
- Rapid defeat of Jigsaw Perspective API was due to inability to handle variations in lexical inputs.
- Keyword-based content moderation had to be supplemented with domain-specific terminology.
- Active learning approaches leverage dependency relations to improve performance.
- Spoiler detection is like content moderation, but focuses on protecting a discrete secret.
Memorization and forgetting
- Recent work suggests that recently seen training examples are more likely to be memorized.
- Fine-tuning an LLM on domain-specific data increases the risk of leaking domain-specific information.
- Training examples are problematic for secret-keeping.
- Approaches focus on ‘private’ or ‘sensitive’ categories of information.
- Side-channel attacks may be used to attempt to have a secret disclosed.
- It is possible to detect the likely disclosure of a secret and prevent it.
Censorship, sanitization and anonymization
- Sanitizing outputs is better than censoring inputs or anonymizing text for preventing information leakage.
- Current approaches to text anonymization are not successful.
- Human review is needed for proposed framework, making it not scalable.
- Sanitizing outputs based on blacklists is not reliable.
- Sanitizing outputs is likely to be most effective for protecting specific secrets.
- Censoring inputs decreases model accuracy and can be de-censored.
Conclusion and future work
- Introduced secret-keeping as an important, under-explored problem in question answering
- Defined secrecy, paranoia and information leakage
- Designed and implemented a model-agnostic secret-keeping approach
- Reducing paranoia and information leakage
- Generating a gold-standard dataset for secret-keeping
- Testing other QA methods
- Secret information leaks from unprotected QA systems
- Secret-keeping offers users trade-off between paranoia and information leakage
- Results show secret remover is most accurate non-baseline model
- Early work in spoiler detection used classification approaches
- Memorization attacks can disclose secrets
- Resistance to interrogation
- Information aggregation
- Interrogation detection
- Satisficing and glomarization
- Secret security