Abstract
- Ensembling independent deep neural networks (DNNs) can improve top-line metrics and outperform larger single models.
- Ensembling can improve subgroup performances, such as worst-k and minority group performance.
- Gains in performance from ensembling for the minority group continue for longer than for the majority group.
- Ensembling can be a powerful tool for alleviating disparate impact from DNN classifiers.
- Varying sources of stochasticity can result in different fairness outcomes.
Paper Content
Introduction
- Deep Neural Networks (DNNs) are powerful function approximators
- Model ensembling is a popular recipe to boost performance
- Understanding performance on subgroups is important for fairness
- Ensembling improves aggregate performance and efficiency
- Ensembling disproportionately benefits bottom-k and minority groups
- Ensembling self-corrects bias from aggregation
Preliminaries
- DNNs are mappings with trainable weights
- Training dataset consists of data points
- Weights are optimized by minimizing an objective function
- Ensemble of m classification models
- Each model is trained with an empirical risk minimization objective
- Consider impact of ensembling on balanced and imbalanced subgroups
- Error rates for under-represented groups can be higher
- Models can reflect social biases
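The ensemble of m classification models described above can be sketched as follows. This is a minimal illustration that assumes the members are combined by averaging their softmax probabilities; the summary does not state the aggregation rule, and the names `softmax`, `ensemble_predict`, and `toy_logits` are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last (class) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ensemble_predict(member_logits):
    """member_logits has shape (m, n_samples, n_classes)."""
    probs = softmax(np.asarray(member_logits, dtype=float))
    mean_probs = probs.mean(axis=0)      # average over the m members (assumed rule)
    return mean_probs.argmax(axis=-1)    # predicted class per sample

# Toy example: 3 members, 2 samples, 3 classes.
toy_logits = np.array([
    [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 3.0, 0.0], [0.0, 2.0, 0.0]],
    [[2.5, 0.0, 0.0], [0.0, 0.0, 0.5]],
])
toy_preds = ensemble_predict(toy_logits)
```

In the toy example the second member disagrees on the first sample and the third member disagrees on the second, but the averaged probabilities still follow the majority.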
Experimental set-up
- Evaluated methodology on CIFAR100 and TinyImageNet datasets
- Used ResNet9/18/34/50, VGG16 and MLP-Mixer architectures
- Trained 20 models independently
- Calculated per-class accuracy on the base model and identified the 10 best- and 10 worst-performing classes (top-k and bottom-k)
- Constructed minority group by modifying CIFAR10 training set
- Observed accuracy gain for different DNN architectures as number of models in ensemble grows
- Established that ensembles of DNNs with same architecture and hyperparameters benefit minority/bottom-k group
- Performed ablation study to observe fairness in deep ensembles
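The per-class accuracy step and the selection of best and worst classes can be illustrated roughly as below. The helper names are hypothetical, and the sketch assumes every class appears at least once in the evaluation labels; it is not the authors' code.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes):
    # Accuracy computed separately for each class label.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    totals = np.bincount(y_true, minlength=n_classes)
    correct = np.bincount(y_true[y_true == y_pred], minlength=n_classes)
    if (totals == 0).any():
        raise ValueError("every class must appear at least once")
    return correct / totals

def top_and_bottom_k(class_acc, k):
    # Sort classes by accuracy, ascending; ties keep their index order.
    order = np.argsort(class_acc, kind="stable")
    top_k = order[-k:][::-1]   # best-performing classes
    bottom_k = order[:k]       # worst-performing classes
    return top_k, bottom_k

# Toy example with 4 classes and k = 2 (the paper uses k = 10).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 0, 2, 2, 0, 1]
class_acc = per_class_accuracy(y_true, y_pred, n_classes=4)
top_k, bottom_k = top_and_bottom_k(class_acc, k=2)
```

The class sets are fixed from the base model, so later ensemble results can be reported on the same top-k and bottom-k classes.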
Ensembling provides disproportionate gains to bottom-k and minority classes
- Ensembling disproportionately benefits bottom-k performance
- Maximum gain of 55% for bottom-k compared to 5% for top-k
- Maximum absolute accuracy gain of 12.92% for bottom-k across all architectures
- Minority group benefits more than majority from ensembling
- Bottom-k gains plateau far more slowly than top-k gains
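One way to inspect how gains evolve with ensemble size is to track subgroup accuracy as members are added one at a time. The sketch below assumes probability averaging and a fixed member order; the paper may instead average over random subsets of models, and `accuracy_by_ensemble_size` is a hypothetical helper.

```python
import numpy as np

def accuracy_by_ensemble_size(member_probs, y_true, mask=None):
    """member_probs: (m, n_samples, n_classes); mask selects a subgroup."""
    member_probs = np.asarray(member_probs, dtype=float)
    y_true = np.asarray(y_true)
    if mask is not None:
        member_probs, y_true = member_probs[:, mask], y_true[mask]
    sizes = np.arange(1, len(member_probs) + 1)
    # Row s holds the averaged probabilities of the first s + 1 members.
    running_mean = np.cumsum(member_probs, axis=0) / sizes[:, None, None]
    preds = running_mean.argmax(axis=-1)        # shape (m, n_samples)
    return (preds == y_true).mean(axis=1)       # accuracy per ensemble size

# Toy example: 3 members, 2 samples, 2 classes.
toy_probs = np.array([
    [[0.6, 0.4], [0.6, 0.4]],
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
])
curve = accuracy_by_ensemble_size(toy_probs, y_true=[0, 1])
```

Calling the function once with a bottom-k mask and once with a top-k mask gives two curves whose plateau points can be compared.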
Fair ensemble: improved robustness
- Ensembling improves uncertainty estimates and can be beneficial for OOD.
- Ensembles improve fairness by increasing performance on bottom group more than top group.
- Effect is more prominent for higher severity of corruption.
Difference in churn between models explains ensemble fairness
- Deep ensembling benefits minority groups disproportionately compared to majority groups
- Churn is a metric used to measure model disagreement
- Model ensembling is beneficial when models disagree
- Churn is higher for bottom-k group than top-k group
- Ensembling models is useful for improving accuracy on bottom-k samples
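Churn as a disagreement measure can be sketched as the fraction of samples on which two models predict different labels, averaged over all model pairs and optionally restricted to a subgroup. This follows the common definition of churn; the exact variant used in the paper is not given in this summary, and the function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def churn(preds_a, preds_b):
    # Fraction of samples where the two models disagree.
    return float(np.mean(np.asarray(preds_a) != np.asarray(preds_b)))

def mean_pairwise_churn(member_preds, mask=None):
    """member_preds: (m, n_samples) predicted labels; mask selects a subgroup."""
    member_preds = np.asarray(member_preds)
    if mask is not None:
        member_preds = member_preds[:, mask]
    pairs = combinations(range(len(member_preds)), 2)
    return float(np.mean([churn(member_preds[i], member_preds[j])
                          for i, j in pairs]))

# Toy example: samples 0-2 stand in for top-k, samples 3-5 for bottom-k.
member_preds = np.array([
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 1, 2],
])
top_mask = np.array([True, True, True, False, False, False])
top_churn = mean_pairwise_churn(member_preds, top_mask)
bottom_churn = mean_pairwise_churn(member_preds, ~top_mask)
```

In this toy data the models agree everywhere on the top-k samples and disagree on two of every three bottom-k samples, which is the pattern the section describes.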
Characterizing stochasticity in deep neural networks training
- Results show gains in homogeneous ensembles
- Stochasticity is necessary to produce meaningful ensembles
Controlling for the sources of stochasticity in ensembles
- Change Model Initialization
- Change Batch Ordering
- Change Model Initialization and Batch Ordering
- Change Data Augmentation
- Change Model Initialization, Batch Ordering and Data Augmentation
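The five settings above can be expressed as seed configurations in which only the chosen sources vary across ensemble members. This is a hedged sketch: `SeedConfig`, `ABLATIONS`, and `member_seeds` are hypothetical names, and how each seed is wired into a training framework is left out.

```python
from dataclasses import dataclass

SEED_NAMES = ("init_seed", "batch_order_seed", "augmentation_seed")

@dataclass(frozen=True)
class SeedConfig:
    init_seed: int
    batch_order_seed: int
    augmentation_seed: int

# Which seeds change between members in each setting (assumed mapping).
ABLATIONS = {
    "init": ("init_seed",),
    "batch_order": ("batch_order_seed",),
    "init+batch_order": ("init_seed", "batch_order_seed"),
    "augmentation": ("augmentation_seed",),
    "all": SEED_NAMES,
}

def member_seeds(ablation, n_members, base_seed=0):
    # Varied seeds differ per member; all other seeds stay at base_seed.
    varied = ABLATIONS[ablation]
    configs = []
    for member in range(n_members):
        seeds = {name: base_seed + 1 + member if name in varied else base_seed
                 for name in SEED_NAMES}
        configs.append(SeedConfig(**seeds))
    return configs

configs = member_seeds("batch_order", n_members=3)
```

Holding the non-varied seeds fixed is what makes it possible to attribute differences between members to a single source of randomness.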
Can different sources of stochasticity improve deep ensemble fairness?
- Standard ensembles improve fairness over individual models
- Different sources of stochasticity have an impact on individual training episodes of a DNN
- Varying batch ordering minimizes the gap between top-k and bottom-k class accuracy
- It is possible to skew the ensemble to favor the bottom group, improving fairness
Related work
- Deep ensembling of neural networks improves top-line metrics
- Prior work amplifies differences between models in the ensemble
- Focus on simple ensembles with shared design choices
- Understanding implications of ensembling on fairness objectives
- Stochasticity in uniform ensembles impacts subgroup performance
Conclusion and future work
- Ensembling can improve fairness outcomes in sensitive domains.
- Certain distributions of stochasticity favor top-k and bottom-k.
- Future work should optimize to amplify sources of stochasticity.
- Ensembles improve minority group more than majority group.
- Controlling sources of randomness can further improve fairness.
- Top-10 and bottom-10 class names for CIFAR100 and TinyImageNet.