Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Privacy assistants help users manage their online privacy.
  • Tasks include detecting privacy violations and recommending sharing actions.
  • It is important for privacy assistants to explain their decisions to users.
  • This paper develops a methodology to create explanations of privacy.
  • The methodology is based on identifying topics, providing explanation schemes, and generating them automatically.
  • The approach is evaluated on a user study to determine what factors make explanations useful.

Paper Content

Introduction

  • Managing privacy online is becoming more challenging
  • People use systems such as online social networks and Internet of Things applications
  • People are worried about their privacy and self-censor
  • Privacy assistants have been developed to help with privacy
  • Explanations are needed to explain why a piece of content is private or public
  • This paper proposes a methodology and system to explain why an image is private or public
  • A user study is conducted to measure if users find the explanations useful

Methodology for explaining privacy

  • Generate explanation for why an image is classified as private or public
  • Analyze the image classifier's decisions computationally to derive the explanation

Understanding explanation

  • Explanations are meant for end users, not to educate them about the classifier
  • Explanations should be visually understandable and supported by a short text
  • Explanations are formulated as topics and keywords that describe the image
  • Visual representation is augmented with a short description that explains the visual representation

Understanding topics

  • Machine Learning algorithms are not straightforwardly understandable for humans.
  • Aim is to understand the model and its predictions and develop a methodology to generate explanations for privacy decisions.
  • Propose to uncover groups of keywords (i.e., latent topics) from a collection of textual information that best represents the information in the collection.
  • Measure coherence of topics based on intra-topic and inter-topic similarity.
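
The coherence idea above can be sketched as follows. This is a minimal illustration, assuming each keyword has an embedding vector and that similarity is cosine similarity; the summary does not give the paper's exact coherence formula. The function names and the tiny 3-dimensional vectors are hypothetical stand-ins for real word embeddings.

```python
import math
from itertools import combinations, product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intra_topic_similarity(topic, vectors):
    """Mean pairwise similarity of the keywords inside one topic."""
    pairs = list(combinations(topic, 2))
    if not pairs:  # a topic with fewer than two keywords has no pairs
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def inter_topic_similarity(topic_a, topic_b, vectors):
    """Mean similarity between keywords drawn from two different topics."""
    pairs = list(product(topic_a, topic_b))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Hypothetical toy embeddings (real ones would come from a trained model).
vectors = {
    "beach": [0.90, 0.10, 0.00],
    "sea": [0.80, 0.20, 0.00],
    "sand": [0.85, 0.15, 0.05],
    "passport": [0.00, 0.10, 0.90],
    "document": [0.05, 0.20, 0.85],
    "signature": [0.10, 0.10, 0.90],
}
topics = {
    "nature": ["beach", "sea", "sand"],
    "documents": ["passport", "document", "signature"],
}

intra = {name: intra_topic_similarity(words, vectors) for name, words in topics.items()}
inter = inter_topic_similarity(topics["nature"], topics["documents"], vectors)

print("intra-topic:", intra)
print("inter-topic:", inter)
```

Under this reading, a coherent set of topics has high intra-topic similarity (keywords in a topic belong together) and low inter-topic similarity (topics are distinct from one another).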

Generating topics

  • Topic modelling is an unsupervised text-mining technique
  • It discovers topics within a collection of text
  • It extracts different topics (features) from keyword sets

Topic modelling

  • NMF (non-negative matrix factorization) factorizes a non-negative matrix into two non-negative matrices
  • TF-IDF is used to transform keywords into numerical vectors
  • An NMF model is built for each candidate number of topics
  • A Random Forest classifier makes the privacy predictions from the topic features

Evaluation of topics

  • Used PicAlert dataset for privacy prediction problem for images
  • Labeled by 81 users between 10 and 59 years of age
  • Automatically generated 20 different descriptive keywords for each image
  • Number of topics chosen based on model performance in terms of coherence
  • Represented keywords as 300-dimensional vectors of the word2vec model
  • Named 20 topics discovered using NMF
  • Figure 4 shows keyword clouds for five different topics
  • Figure 5 shows percentage of each topic associated with private and public images
  • Random Forest classifier yields accuracy of 88.5% on test set

Generating explanations from topics

  • TreeExplainer model provides contributions of each feature in terms of Shapley values
  • Not all features have equal contribution to a class prediction
  • Machine learning model takes into account contribution of each feature
  • Explanations can be created by displaying Shapley values to user
  • Number of features can make this cumbersome and confusing
  • Each feature corresponds to a topic
  • Interested in identifying topics that are useful in explaining content of image
  • Topics can have positive or negative Shapley values
  • Dominant category: one topic is decisive for the class prediction
  • Collaborative category: the contributions of several topics arrive at a consensus
  • Conflicting category: topics pull in opposing directions
  • Vague category: the image belongs to many topics, each with low confidence
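
One way the four categories above could be derived from per-topic Shapley values is sketched below. The decision rules, the threshold values, and the topic names are all assumptions made for illustration; the summary does not state how the paper draws the boundaries between categories.

```python
def categorize(shapley, vague_threshold=0.05, dominant_share=0.6, conflict_ratio=0.5):
    """Assign an explanation category from per-topic Shapley values.

    `shapley` maps topic name -> contribution to the predicted class
    (positive supports the prediction, negative opposes it).
    All thresholds are hypothetical defaults, not values from the paper.
    """
    positive = sum(v for v in shapley.values() if v > 0)
    negative = -sum(v for v in shapley.values() if v < 0)
    total = positive + negative

    # Vague: no topic contributes with any real strength.
    if total == 0 or max(abs(v) for v in shapley.values()) < vague_threshold:
        return "vague"

    # Dominant: a single supporting topic carries most of the total contribution.
    if max(shapley.values()) / total >= dominant_share:
        return "dominant"

    # Conflicting: supporting and opposing contributions are of comparable size.
    if min(positive, negative) / max(positive, negative) >= conflict_ratio:
        return "conflicting"

    # Collaborative: several topics push the same way without one dominating.
    return "collaborative"


# Hypothetical Shapley values for four images.
examples = {
    "image_a": {"Nature": 0.40, "People": 0.03, "Documents": -0.02},
    "image_b": {"Nature": 0.15, "Travel": 0.12, "Food": 0.10, "Documents": -0.02},
    "image_c": {"Nature": 0.20, "Travel": 0.10, "People": -0.18, "Documents": -0.07},
    "image_d": {"Nature": 0.02, "Travel": 0.01, "People": -0.01, "Food": 0.015},
}
categories = {name: categorize(values) for name, values in examples.items()}
print(categories)
```

In a full system the input dictionary would be filled from an explainer's per-feature output for one image, with one entry per topic. Only the topics selected by the category would then be shown to the user instead of the full list of values.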

Evaluation

  • Conducted online user study to evaluate proposed explanation model
  • Conducted pilot study with 5 users to test study’s understandability
  • Improved initial description of study and reworded one question

User study

  • Three phases of user study: present the plain language statement and consent form, explain the study with an example, expose participants to 16 images with generated explanations
  • Two images with irrelevant explanations serve as attention checks to filter out participants who are not focused
  • The Explanation Satisfaction Scale proposed by Hoffman et al. is adapted for the study
  • Participants rate three questions on a 5-point Likert scale
  • Final phase: participants respond to demographic questions and provide free-form text for comments/feedback
  • User study designed using Qualtrics online survey tool

Participants

  • 57 participants responded to questions
  • 12 participants excluded for failing the check questions
  • 45 participants remaining, 64% male, 36% female
  • 19 participants had Master’s degree
  • 11 participants had Bachelor’s degree
  • 6 participants had High school degree
  • 5 participants had Some college (1-4 years, no degree)
  • 2 participants had Doctorate degree
  • 2 participants had Professional school degree

Results

  • Confidence levels are assigned according to the interval in which the mean rating falls
  • Participants were very confident that explanations were sufficiently detailed, satisfactory, and understandable
  • Results indicate that participants understood why images were labeled as private or public

Interval level

  • Participants found explanations for public images to be more sufficient, satisfying, and understandable than those for private images.
  • Participants were confident that explanations for private images were sufficient, satisfying, and understandable.
  • Explanations with a decisive topic (dominant category) or like-minded topics (collaborative category) were found to be sufficiently detailed and satisfying.

Discussion

  • Several studies use descriptive keywords and visual features to predict image privacy
  • Squicciarini et al. present a system that recommends privacy policies using image tags
  • Prediction accuracy decreases with large tag sets and more tags per image
  • Tonge and Caragea use deep visual semantic and textual features to develop a model to predict privacy
  • Kurtan and Yolum propose an agent-based approach to predict privacy
  • Ayci et al. propose a personal privacy assistant to preserve user privacy
  • [7] develops a personalized privacy prediction system
  • Miller examines studies of explainability
  • Arrieta et al. provide an overview of XAI
  • Orekondy et al. present a model for privacy risk prediction
  • Li et al. propose a method to find out what kind of visual content is private
  • Zhao et al. define a privacy taxonomy with descriptive keywords

Conclusion

  • Proposed novel methodology to understand why an image is private or public
  • Method explores latent topics using topic modelling from descriptive keywords of images
  • Makes privacy predictions based on relationship between images and associated topics
  • Automatically generates explanations for privacy decisions
  • The privacy classifier achieves high accuracy (88.5% on the test set)
  • User study shows generated explanations make sense to people and are sufficient, satisfying, and understandable