Abstract
- Privacy assistants help users manage their online privacy.
- Tasks include detecting privacy violations and recommending sharing actions.
- It is important for privacy assistants to explain their decisions to users.
- This paper develops a methodology to create explanations of privacy.
- The methodology is based on identifying topics, providing explanation schemes, and generating explanations automatically.
- The approach is evaluated in a user study to determine what factors make explanations useful.
Paper Content
Introduction
- Managing privacy online is becoming more challenging
- People use systems such as online social networks and Internet of Things applications
- People are worried about their privacy and self-censor
- Privacy assistants have been developed to help with privacy
- Explanations are needed to explain why a piece of content is private or public
- This paper proposes a methodology and system to explain why an image is private or public
- A user study is conducted to measure if users find the explanations useful
Methodology for explaining privacy
- Generate explanation for why an image is classified as private or public
- Analyze the image classifier's predictions to derive the explanation
Understanding explanation
- Explanations are meant for end users, not to educate them about the classifier
- Explanations should be visually understandable and supported by a short text
- Explanations are formulated as topics and keywords that describe the image
- The visual representation is augmented with a short description that explains it
Understanding topics
- Machine Learning algorithms are not straightforwardly understandable for humans.
- Aim is to understand the model and its predictions and develop a methodology to generate explanations for privacy decisions.
- Propose to uncover groups of keywords (i.e., latent topics) from a collection of textual information that best represents the information in the collection.
- Measure coherence of topics based on intra-topic and inter-topic similarity.
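
The coherence idea in the last bullet can be sketched in a few lines. This summary does not give the paper's exact formula, so the function below is an assumption: it reports the mean cosine similarity between keywords of the same topic (intra-topic) and between keywords of different topics (inter-topic). The two-dimensional vectors and the names `cosine`, `mean_pairwise`, and `topic_coherence` are hypothetical stand-ins; a real setup would use word embeddings.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise(pairs, vectors):
    """Average cosine similarity over a list of keyword pairs."""
    sims = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    return sum(sims) / len(sims) if sims else 0.0


def topic_coherence(topics, vectors):
    """Return (intra, inter) mean similarities for a list of keyword lists."""
    # Pairs of keywords drawn from the same topic.
    intra_pairs = [p for kws in topics for p in combinations(kws, 2)]
    # Pairs of keywords drawn from two different topics.
    inter_pairs = [
        (a, b) for t1, t2 in combinations(topics, 2) for a in t1 for b in t2
    ]
    return mean_pairwise(intra_pairs, vectors), mean_pairwise(inter_pairs, vectors)


# Toy embeddings, invented for illustration only.
vectors = {
    "beach": [1.0, 0.1],
    "sea": [0.9, 0.2],
    "passport": [0.1, 1.0],
    "document": [0.2, 0.9],
}
topics = [["beach", "sea"], ["passport", "document"]]
intra, inter = topic_coherence(topics, vectors)
print(f"intra-topic: {intra:.3f}, inter-topic: {inter:.3f}")
```

Under this reading, a good set of topics has high intra-topic similarity and low inter-topic similarity.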
Generating topics
- Topic modelling is an unsupervised text-analysis technique
- It discovers latent topics within a collection of text
- It extracts different topics (features) from keyword sets
Topic modelling
- NMF (non-negative matrix factorization) factorizes a non-negative matrix into non-negative factors
- TF-IDF (term frequency-inverse document frequency) is used to transform keywords into numerical vectors
- An NMF model is built for each candidate number of topics
- A Random Forest classifier is used to make the privacy predictions
Evaluation of topics
- Used PicAlert dataset for privacy prediction problem for images
- Labeled by 81 users between 10 and 59 years of age
- Automatically generated 20 different descriptive keywords for each image
- Number of topics based on model performance in terms of coherence
- Represented keywords as 300-dimensional vectors of the word2vec model
- Named 20 topics discovered using NMF
- Figure 4 shows keyword clouds for five different topics
- Figure 5 shows percentage of each topic associated with private and public images
- Random Forest classifier yields accuracy of 88.5% on test set
Generating explanations from topics
- TreeExplainer model provides contributions of each feature in terms of Shapley values
- Not all features have equal contribution to a class prediction
- Machine learning model takes into account contribution of each feature
- Explanations can be created by displaying Shapley values to user
- Number of features can make this cumbersome and confusing
- Each feature corresponds to a topic
- Interested in identifying topics that are useful in explaining content of image
- Can have positive or negative Shapley values
- Dominant category when one topic is decisive for class prediction
- Collaborative category when contributions of topics arrive at consensus
- Conflicting category when topics have opposing forces
- Vague category when image belongs to many topics with low confidence
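
A minimal sketch of how the four categories above could be assigned from per-topic Shapley values follows. This summary does not state the paper's decision rules, so the function `categorize` and its three thresholds are hypothetical and chosen only to make the distinctions concrete.

```python
def categorize(shap_values, dominant_share=0.6, min_total=0.05, conflict_share=0.3):
    """Map per-topic Shapley values to one of four explanation categories.

    All thresholds are hypothetical and chosen only for illustration.
    """
    magnitudes = [abs(v) for v in shap_values]
    total = sum(magnitudes)
    if total < min_total:
        return "vague"  # no topic contributes with any confidence
    if max(magnitudes) / total >= dominant_share:
        return "dominant"  # a single topic decides the prediction
    positive = sum(v for v in shap_values if v > 0)
    negative = -sum(v for v in shap_values if v < 0)
    if min(positive, negative) / total >= conflict_share:
        return "conflicting"  # sizeable contributions pull in opposite directions
    return "collaborative"  # several topics push the same way


# Invented Shapley values for four images, one value per topic.
examples = {
    "one decisive topic": [0.50, 0.02, -0.01, 0.03],
    "topics agree": [0.20, 0.18, 0.15, -0.02],
    "topics disagree": [0.25, 0.20, -0.22, -0.18],
    "weak signal": [0.01, -0.01, 0.005, 0.01],
}
for name, values in examples.items():
    print(f"{name}: {categorize(values)}")
```

Only the topics that drive the assigned category would then be shown to the user, which avoids displaying every Shapley value.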
Evaluation
- Conducted online user study to evaluate proposed explanation model
- Conducted pilot study with 5 users to test study’s understandability
- Improved initial description of study and reworded one question
User study
- Three phases of user study: present the plain language statement and consent form, explain the study using an example, and show participants 16 images with generated explanations
- Two images with irrelevant explanations to filter out participants who are not focused
- Adapted the Explanation Satisfaction Scale proposed by Hoffman et al.
- Asked participants to rate three questions on a 5-point Likert scale
- Final phase: participants respond to demographic questions and provide free-form text for comments/feedback
- User study designed using Qualtrics online survey tool
Participants
- 57 participants responded to questions
- 12 participants excluded for failing the check questions
- 45 participants remaining, 64% male, 36% female
- 19 participants had Master’s degree
- 11 participants had Bachelor’s degree
- 6 participants had High school degree
- 5 participants had Some college (1-4 years, no degree)
- 2 participants had Doctorate degree
- 2 participants had Professional school degree
Results
- Confidence levels are assigned according to the interval in which the mean score falls
- Participants were very confident that explanations were sufficiently detailed, satisfactory, and understandable
- Results indicate that participants understood why images were labeled as private or public
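
One common way to turn a mean Likert score into a confidence level is to split the 1 to 5 range into five equal intervals of width 0.8. The interval boundaries and most of the label names below are assumptions, since this summary does not list the ones used in the study.

```python
from statistics import mean

# Hypothetical labels, ordered from the lowest to the highest interval.
LABELS = [
    "not at all confident",
    "slightly confident",
    "moderately confident",
    "confident",
    "very confident",
]


def confidence_level(scores, low=1, high=5):
    """Return the mean score and the label of the interval it falls into."""
    m = mean(scores)
    width = (high - low) / len(LABELS)  # 0.8 for a 5-point scale
    # Clamp the index so that the maximum score lands in the top interval.
    index = min(int((m - low) / width), len(LABELS) - 1)
    return m, LABELS[index]


# Invented responses to a single question.
print(confidence_level([5, 4, 5, 4, 5]))
print(confidence_level([4, 4, 3, 4]))
```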
Interval level
- Participants found explanations for public images to be more sufficient, satisfying, and understandable than private images.
- Participants were confident that explanations for private images were sufficient, satisfying, and understandable.
- Explanations with decisive topics or like-minded topics were found to be sufficiently detailed and satisfying.
Discussion
- Several studies use descriptive keywords and visual features to predict image privacy
- Squicciarini et al. present a system that recommends privacy policies using image tags
- Prediction accuracy decreases with large tag sets and more tags per image
- Tonge and Caragea use deep visual semantic and textual features to develop a model to predict privacy
- Kurtan and Yolum propose an agent-based approach to predict privacy
- Ayci et al. propose a personal privacy assistant to preserve user privacy
- [7] develop a personalized privacy prediction system
- Miller examines studies of explainability
- Arrieta et al. provide an overview of XAI
- Orekondy et al. present a model for privacy risk prediction
- Li et al. propose a method to find out what kind of visual content is private
- Zhao et al. define a privacy taxonomy with descriptive keywords
Conclusion
- Proposed novel methodology to understand why an image is private or public
- Method explores latent topics using topic modelling from descriptive keywords of images
- Makes privacy predictions based on relationship between images and associated topics
- Automatically generates explanations for privacy decisions
- The privacy classifier achieves high accuracy (88.5% on the test set)
- User study shows generated explanations make sense to people and are sufficient, satisfying, and understandable