Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Privacy assistants help users manage their online privacy.
Tasks include detecting privacy violations and recommending sharing actions.
It is important for privacy assistants to explain their decisions to users.
This paper develops a methodology to create explanations of privacy.
The methodology is based on identifying topics, providing explanation schemes, and generating explanations automatically.
The approach is evaluated in a user study to determine what factors make explanations useful.

Paper Content

Introduction

Managing privacy online is becoming more challenging
People use systems such as online social networks and Internet of Things applications
People are worried about their privacy and self-censor
Privacy assistants have been developed to help with privacy
Explanations are needed to explain why a piece of content is private or public
This paper proposes a methodology and system to explain why an image is private or public
A user study is conducted to measure if users find the explanations useful

Methodology for explaining privacy

Generate explanation for why an image is classified as private or public
Analyze how the image classifier arrives at its prediction

Understanding explanation

Explanations are meant for end users, not to educate them about the classifier
Explanations should be visually understandable and supported by a short text
Explanations are formulated as topics and keywords that describe the image
Visual representation is augmented with a short description that explains the visual representation

Understanding topics

Machine Learning algorithms are not straightforwardly understandable for humans.
Aim is to understand the model and its predictions and develop a methodology to generate explanations for privacy decisions.
Propose to uncover groups of keywords (i.e., latent topics) from a collection of textual information that best represents the information in the collection.
Measure coherence of topics based on intra-topic and inter-topic similarity.
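The summary does not give the exact coherence formula, so the following is only a minimal sketch of one way to compare intra-topic and inter-topic similarity. It assumes cosine similarity over keyword embeddings; the three-dimensional vectors, the keywords, and the two topics are hypothetical stand-ins for real word vectors.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intra_topic_similarity(topic, vectors):
    """Mean pairwise similarity between the keywords of a single topic."""
    pairs = list(combinations(topic, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def inter_topic_similarity(topic_a, topic_b, vectors):
    """Mean similarity between keywords taken from two different topics."""
    pairs = [(a, b) for a in topic_a for b in topic_b]
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Hypothetical toy embeddings; real embeddings would have many more dimensions.
vectors = {
    "beach": [0.90, 0.10, 0.00],
    "sea": [0.80, 0.20, 0.10],
    "sand": [0.85, 0.15, 0.05],
    "passport": [0.00, 0.90, 0.30],
    "document": [0.10, 0.80, 0.40],
    "id": [0.05, 0.85, 0.35],
}
nature = ["beach", "sea", "sand"]
papers = ["passport", "document", "id"]

intra_nature = intra_topic_similarity(nature, vectors)
intra_papers = intra_topic_similarity(papers, vectors)
inter = inter_topic_similarity(nature, papers, vectors)

print(f"intra(nature)={intra_nature:.3f}")
print(f"intra(papers)={intra_papers:.3f}")
print(f"inter(nature, papers)={inter:.3f}")
```

Under this reading, a coherent set of topics has high intra-topic similarity and low inter-topic similarity.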

Generating topics

Topic Modelling is a technique used in computer science
It discovers topics within a collection of text
It extracts different topics (features) from keyword sets

Topic modelling

NMF is a technique used to factorize a non-negative matrix
TF-IDF is used to transform keywords into numerical vectors
An NMF model is built for different numbers of topics
Random Forest algorithm is used to make predictions

Evaluation of topics

Used PicAlert dataset for privacy prediction problem for images
Labeled by 81 users between 10 and 59 years of age
Automatically generated 20 different descriptive keywords for each image
Number of topics chosen based on model performance in terms of coherence
Represented keywords as 300-dimensional vectors of the word2vec model
Named 20 topics discovered using NMF
Figure 4 shows keyword clouds for five different topics
Figure 5 shows percentage of each topic associated with private and public images
Random Forest classifier yields accuracy of 88.5% on test set

Generating explanations from topics

TreeExplainer model provides contributions of each feature in terms of Shapley values
Not all features have equal contribution to a class prediction
Machine learning model takes into account contribution of each feature
Explanations can be created by displaying Shapley values to user
Number of features can make this cumbersome and confusing
Each feature corresponds to a topic
Interested in identifying topics that are useful in explaining content of image
Topics can have positive or negative Shapley values
Dominant category when one topic is decisive for class prediction
Collaborative category when contributions of topics arrive at consensus
Conflicting category when topics have opposing forces
Vague category when image belongs to many topics with low confidence
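One possible way to turn per-topic Shapley values into the four categories is sketched below. The summary does not state the decision rules, so the thresholds `dominant_share` and `min_strength`, the rule for "vague" (no topic contributes strongly), and the topic names are all hypothetical choices made for illustration.

```python
def categorize(shap_values, dominant_share=0.6, min_strength=0.05):
    """Assign an explanation category from per-topic Shapley values.

    shap_values maps a topic name to its Shapley value for the predicted class.
    Both thresholds are hypothetical and not taken from the paper.
    """
    total = sum(abs(v) for v in shap_values.values())
    # Keep only topics whose contribution is large enough to matter.
    strong = [v for v in shap_values.values() if abs(v) >= min_strength]
    if not strong:
        return "vague"  # no topic contributes with enough strength
    if max(abs(v) for v in strong) / total >= dominant_share:
        return "dominant"  # a single topic decides the prediction
    if all(v > 0 for v in strong) or all(v < 0 for v in strong):
        return "collaborative"  # strong topics push in the same direction
    return "conflicting"  # strong topics push in opposite directions


# Hypothetical topic names and values.
examples = {
    "dominant": {"Nature": 0.40, "People": 0.03, "Indoor": -0.02},
    "collaborative": {"Nature": 0.15, "Travel": 0.12, "Sky": 0.10},
    "conflicting": {"Nature": 0.20, "People": -0.18, "Indoor": 0.02},
    "vague": {"Nature": 0.02, "People": 0.01, "Indoor": -0.03},
}
for expected, values in examples.items():
    print(expected, "->", categorize(values))
```

Grouping predictions this way lets an explanation show only the few topics that matter instead of every Shapley value.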

Evaluation

Conducted online user study to evaluate proposed explanation model
Conducted pilot study with 5 users to test study’s understandability
Improved initial description of study and reworded one question

User study

Three phases of user study: present plain language statement and consent form, explain study over example, expose participants to 16 images with generated explanations
Two images with irrelevant explanations to filter out participants who are not focused
Adapt the Explanation Satisfaction Scale proposed by Hoffman et al. to this study
Ask participants to answer three questions on a 5-point Likert scale
Final phase: participants respond to demographic questions and provide free-form text for comments/feedback
User study designed using Qualtrics online survey tool

Participants

57 participants responded to questions
12 participants excluded for failing the attention-check questions
45 participants remaining, 64% male, 36% female
19 participants had Master’s degree
11 participants had Bachelor’s degree
6 participants had High school degree
5 participants had Some college (1-4 years, no degree)
2 participants had Doctorate degree
2 participants had Professional school degree

Results

Confidence levels change based on intervals of mean value
Participants were very confident that explanations were sufficiently detailed, satisfactory, and understandable
Results indicate that participants understood why images were labeled as private or public
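As a rough illustration of mapping mean Likert ratings to confidence levels, the sketch below splits the 1 to 5 range into five equal-width intervals. The interval boundaries and the level names are assumptions made for this example; the summary does not state which intervals the paper uses.

```python
from statistics import mean

# Hypothetical level names, ordered from the lowest to the highest interval.
LEVELS = [
    "not confident",
    "slightly confident",
    "moderately confident",
    "confident",
    "very confident",
]


def interval_level(ratings):
    """Return the mean of 5-point Likert ratings and its interval level."""
    m = mean(ratings)
    # Five equal-width bins over [1, 5], so each bin is (5 - 1) / 5 = 0.8 wide.
    index = min(int((m - 1) / 0.8), len(LEVELS) - 1)
    return m, LEVELS[index]


# Hypothetical ratings for one question.
m, level = interval_level([5, 4, 5, 4, 5])
print(f"mean={m:.2f}, level={level}")
```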

Interval level

Participants found explanations for public images to be more sufficient, satisfying, and understandable than those for private images.
Participants were confident that explanations for private images were sufficient, satisfying, and understandable.
Explanations with decisive topics or like-minded topics were found to be sufficiently detailed and satisfying.

Discussion

Several studies use descriptive keywords and visual features to predict image privacy
Squicciarini et al. present a system that recommends privacy policies using image tags
Prediction accuracy decreases with large tag sets and more tags per image
Tonge and Caragea use deep visual semantic and textual features to develop a model to predict privacy
Kurtan and Yolum propose an agent-based approach to predict privacy
Ayci et al. propose a personal privacy assistant to preserve user privacy
The authors of [7] develop a personalized privacy prediction system
Miller examines studies of explainability
Arrieta et al. provide an overview of XAI
Orekondy et al. present a model for privacy risk prediction
Li et al. propose a method to find out what kind of visual content is private
Zhao et al. define a privacy taxonomy with descriptive keywords

Conclusion

Proposed novel methodology to understand why an image is private or public
Method explores latent topics using topic modelling from descriptive keywords of images
Makes privacy predictions based on relationship between images and associated topics
Automatically generates explanations for privacy decisions
High accuracy of privacy classifier
User study shows generated explanations make sense to people and are sufficient, satisfying, and understandable
