Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Federated learning (FL) enables collaborative training of deep models on the network edge without centrally aggregating raw data.
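Most previous work has focused on differences in the distribution of labels or on client shifts, but this paper addresses a different problem in FL: feature shift, where local clients hold examples whose feature distributions differ from those of other clients, e.g., because of different scanners/sensors in medical imaging.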
The paper proposes an effective method that uses local batch normalization to alleviate the feature shift before averaging models.
The resulting scheme, called FedBN, outperforms both classical FedAvg and the state-of-the-art for non-iid data (FedProx).

Paper Content

Introduction

Federated learning (FL) is a way of learning from distributed data.
A challenge in FL is the statistical heterogeneity of the data.
Standard federated methods suffer from performance degradation or divergence when deployed over non-iid samples.
Feature shift is a type of non-iid data that has received little attention in the literature.
Batch Normalization (BN) has been proposed as a tool to mitigate domain shifts in domain adaptation tasks.
This paper proposes applying BN to FL under feature shift.
A toy example is used to illustrate how BN may help harmonize local feature distributions.
The paper proposes a novel federated learning method, called FedBN, for addressing non-iid training data.
FedBN has zero parameters to tune and requires minimal additional computational resources.
FedBN demonstrates significant practical improvements compared to classical FedAvg and the state-of-the-art for non-iid data (FedProx).
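
The harmonizing effect mentioned in the introduction can be illustrated with a small numerical sketch. This is not the paper's toy example; it assumes two hypothetical clients (`client_a`, `client_b`) that observe the same underlying signal through different affine "sensor" transforms, and a simplified `batch_norm` helper without learnable scale and shift.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature column to zero mean and unit variance."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 4))  # shared underlying signal (synthetic)

# Hypothetical clients: same signal, different per-client scale and offset.
client_a = 1.0 * base + 0.0
client_b = 3.0 * base + 5.0

# Gap between the raw feature means of the two clients.
raw_gap = np.abs(client_a.mean(axis=0) - client_b.mean(axis=0)).max()

# Each client normalizes with its OWN batch statistics.
norm_a = batch_norm(client_a)
norm_b = batch_norm(client_b)
norm_gap = np.abs(norm_a - norm_b).max()

print("raw mean gap:", raw_gap)
print("gap after local normalization:", norm_gap)
```

Because each client uses its own mean and variance, the affine difference between the two clients is removed almost entirely; sharing one set of statistics across both clients would not have this effect.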

Related work

FedAvg often suffers when data is heterogeneous
Empirical work mainly focuses on label distribution skew
FedProx tackles heterogeneity by allowing partial information aggregation
Zhao et al. assume a subset of data is globally shared
FedMA proposed an aggregation strategy for non-iid data
FedRobust assumes data follows an affine distribution shift
SiloBN keeps some untrainable BN parameters local
FedBN keeps all BN parameters strictly local
Reddi et al. and Zhang et al. focus on improving the optimization mechanism
BN makes optimization landscape smoother
BN provides regularization and discourages single direction reliance
BN used for domain adaptation
Role of BN in federated learning unexplored

Preliminary

Feature shift is introduced as a novel category of clients' non-IID data distribution
Feature shift covers two categories: covariate shift and concept shift
FedAvg is a popular federated learning strategy where clients collaboratively send updates of locally trained models to a global server
FedAvg has shown success in classical federated learning tasks, but suffers from slow convergence and low accuracy in most non-IID settings
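
The server-side averaging step of FedAvg can be sketched as follows. This is a minimal illustration, assuming each client's model is represented as a dict of NumPy arrays and that clients are weighted by their local sample counts (the usual FedAvg choice); the function name `fedavg` and the parameter name `fc.weight` are hypothetical.

```python
import numpy as np

def fedavg(client_states, client_sizes):
    """Weighted average of client parameter dicts.

    Weights are proportional to each client's number of local samples.
    """
    total = float(sum(client_sizes))
    keys = client_states[0].keys()
    return {
        k: sum(state[k] * (n / total) for state, n in zip(client_states, client_sizes))
        for k in keys
    }

# Two hypothetical clients holding 100 and 300 samples.
clients = [
    {"fc.weight": np.array([1.0, 2.0])},
    {"fc.weight": np.array([3.0, 6.0])},
]
global_state = fedavg(clients, client_sizes=[100, 300])
print(global_state["fc.weight"])
```

Passing equal sizes reduces this to a plain mean over clients.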

Federated averaging with local batch normalization

Proposed method: FedBN

Proposed learning strategy: FedBN
Similar to FedAvg, performs local updates and averages local models
Excludes BN layers from averaging step
Results in significant empirical improvements in non-iid settings
Improves convergence rate under feature shift
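
The aggregation rule described above (average everything except the BN layers) can be sketched like this. It is an illustration under stated assumptions, not the authors' implementation: models are dicts of NumPy arrays, all clients are weighted equally, and BN parameters are identified by the hypothetical convention that their names contain `"bn"`.

```python
import numpy as np

def is_bn_key(name):
    # Assumed naming convention: BN parameters and statistics contain "bn".
    return "bn" in name

def fedbn_round(client_states):
    """Average non-BN parameters across clients; keep BN entries local."""
    shared_keys = [k for k in client_states[0] if not is_bn_key(k)]
    averaged = {
        k: np.mean([state[k] for state in client_states], axis=0)
        for k in shared_keys
    }
    new_states = []
    for state in client_states:
        updated = dict(state)      # start from the client's own parameters, BN included
        updated.update(averaged)   # overwrite only the shared, non-BN entries
        new_states.append(updated)
    return new_states

clients = [
    {"conv1.weight": np.array([1.0, 3.0]), "bn1.weight": np.array([0.5]),
     "bn1.running_mean": np.array([0.0])},
    {"conv1.weight": np.array([3.0, 5.0]), "bn1.weight": np.array([2.0]),
     "bn1.running_mean": np.array([4.0])},
]
updated_clients = fedbn_round(clients)
for state in updated_clients:
    print(state["conv1.weight"], state["bn1.weight"], state["bn1.running_mean"])
```

After the round, every client shares the same convolution weights, while each keeps its own BN scale and running statistics.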

Problem setup

Convergence analysis

We study the trajectory of two networks, FedAvg and FedBN, through the neural tangent kernel.
Recent machine learning theory studies have shown that the convergence rate of finite-width over-parameterized networks is controlled by the least eigenvalue of the induced kernel.
We consider the case that the number of local updates is 1.
The neural tangent kernel can be decomposed into a magnitude component and direction component.
The exponential factor of convergence for FedAvg and FedBN is controlled by the smallest eigenvalue of G(t) and G*(t) respectively.
FedBN has a faster convergence rate than FedAvg.
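
Why the least eigenvalue matters can be seen in a simplified setting. The sketch below does not reproduce the paper's theorem or its constants; it assumes idealized linearized dynamics in which the residual evolves as r ← (I − ηG) r for a fixed positive-definite matrix `G` (a synthetic stand-in for the kernel), and checks that the residual norm contracts at least as fast as (1 − η λ_min)^t.

```python
import numpy as np

def residual_norms(G, r0, lr, steps):
    """Iterate r <- (I - lr * G) r and record the residual norm at each step."""
    r = r0.copy()
    norms = [np.linalg.norm(r)]
    for _ in range(steps):
        r = r - lr * (G @ r)
        norms.append(np.linalg.norm(r))
    return np.array(norms)

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
G = A @ A.T + 0.1 * np.eye(6)       # synthetic positive-definite matrix

eigs = np.linalg.eigvalsh(G)        # eigenvalues in ascending order
lam_min, lam_max = eigs[0], eigs[-1]
lr = 1.0 / lam_max                  # step size small enough for a contraction

steps = 50
r0 = rng.normal(size=6)
norms = residual_norms(G, r0, lr, steps)

# Geometric envelope whose rate is set by the smallest eigenvalue.
bound = norms[0] * (1.0 - lr * lam_min) ** np.arange(steps + 1)
print("final residual norm:", norms[-1])
print("envelope at final step:", bound[-1])
```

In this toy setting, a larger smallest eigenvalue gives a smaller contraction factor and hence faster convergence, which is the sense in which the comparison between G(t) and G*(t) drives the comparison between FedAvg and FedBN.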

Proof sketch

Show that $\lambda_{\min}(G^{\infty}) \le \lambda_{\min}(G^{*\infty})$
Compare Equations 4 and 5

Experiments

Using local BN parameters is beneficial when data is heterogeneous across clients
FedBN is a novel local parameter sharing strategy that is more robust and faster than alternative methods
FedBN is shown to have better model performance on benchmark and real-world datasets

Benchmark experiments

Extensive empirical analysis conducted on benchmark digits classification task
Five datasets used: SVHN (Netzer et al., 2011), USPS (Hull, 1994), SynthDigits (Ganin & Lempitsky, 2015), MNIST-M (Ganin & Lempitsky, 2015) and MNIST (LeCun et al., 1998)
7438 training samples in each dataset
Classification model is a convolutional neural network
Cross-entropy loss and SGD optimizer with a learning rate of $10^{-2}$
Default setting for local update epochs is E = 1
Default setting for the amount of data at each client is 10% of the dataset's original size
Investigation of convergence rate and analysis of local updating epochs

Analysis of local dataset size:

FedBN behaviour is observed over different data capacities at each client.
Testing accuracy starts to drop noticeably once each client is assigned only 20% of its original data amount.
FedBN can benefit from collaborative training on distributed data, especially when each client holds a small amount of data.

Effects of statistical heterogeneity:

FedBN is compared to FedAvg in a federated setting with varying levels of heterogeneity
Each dataset is split into 10 subsets, each with an equal number of samples and the same label distribution
FedBN achieves higher testing accuracy than FedAvg over all levels of heterogeneity
FedBN consistently outperforms state-of-the-art and baseline methods

Experiments on real-world datasets

Proposed algorithm can be beneficial in real-world feature-shift non-iid settings
Validated effectiveness of FedBN on three real-world datasets
Datasets include Office-Caltech10, four medical institutions from ABIDE I, and DomainNet
Classification models use AlexNet architecture with BN added after each convolution and fully-connected layer
ABIDE I instances represented as 5995-dimensional vector through brain connectome computation
Classifier is a three-layer fully connected neural network with two BN layers after the first two fully connected layers

Results and analysis:

FedBN significantly outperforms state-of-the-art method on Office-Caltech10
FedBN achieves superior accuracy on DomainNet
Alternative FL methods achieve results comparable to SingleSet, except on Quickdraw
FedBN outperforms alternative FL methods by 10%
Alternative FL strategies are ineffective in feature shift non-iid datasets
FedBN excels by a non-negligible margin on three clients in ABIDE I

Conclusion and discussion

Proposes a novel federated learning aggregation method called FedBN
Mitigates feature shifts in non-IID data
Provides convergence guarantees for FedBN
Experiments demonstrate FedBN can improve convergence and model performance
Can be combined with different optimization algorithms, communication schemes, and aggregation techniques
Can be integrated into existing tool-kits/systems

B.3 Proof of Corollary 4.6

FedAvg and FedBN have different convergence rates when E = 1
Comparing the exponential factors in the convergence rates reduces to comparing two matrices
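
A matrix comparison of this kind can be checked numerically. The sketch below assumes, purely for illustration, that the second matrix keeps only the per-client diagonal blocks of the first; under that assumption the inequality follows from the standard fact that the smallest eigenvalue of a symmetric matrix is no larger than that of any of its principal submatrices. The helper `block_diagonal_part` and the block size are hypothetical.

```python
import numpy as np

def block_diagonal_part(G, block_size):
    """Zero out every entry of G that lies outside its square diagonal blocks."""
    n = G.shape[0]
    mask = np.zeros_like(G, dtype=bool)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        mask[start:stop, start:stop] = True
    return np.where(mask, G, 0.0)

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8))
G = A @ A.T                                    # synthetic symmetric PSD matrix

# Two hypothetical clients with 4 samples each: keep only within-client blocks.
G_star = block_diagonal_part(G, block_size=4)

lam_min_G = np.linalg.eigvalsh(G)[0]           # smallest eigenvalue of the full matrix
lam_min_G_star = np.linalg.eigvalsh(G_star)[0] # smallest eigenvalue of the block part
print(lam_min_G, lam_min_G_star)
```

A larger smallest eigenvalue for the block-structured matrix corresponds to a smaller exponential factor, i.e. a faster rate.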

C FedBN algorithm

Proposed algorithm: FedBN
Initialize model parameters: $w_{0,k}^{(l)}$, local update pace: $E$, and total optimization round $T$
Model Architecture: 6-layer Convolutional Neural Network (CNN)
Datasets: Office-Caltech10, drawn from Office-31 (Amazon, DSLR, Webcam) and Caltech-256
Training Details: Cross-entropy loss, SGD optimizer, learning rate $10^{-2}$, batch size 32, training epochs 300, µ = $10^{-2}$
Data sample number: Office-Caltech10 62, DomainNet 105

D.4 ABIDE dataset and training details

Described real-world medical datasets
Described preprocessing and training details

Dataset:

Study was conducted using rs-fMRI data from ABIDE dataset
Data was downloaded from four largest sites (UM, NYU, USM, UCLA)
Subjects lacking a filename were skipped, resulting in 88, 167, 52, and 63 subjects for the four sites, respectively
Sliding windows were used to truncate raw time sequences of fMRI
Performance of FedBN and alternative methods was compared in Figure 5
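
The sliding-window truncation mentioned in the dataset notes can be sketched generically. The window length and stride below are placeholder values, not the ones used in the paper, and `sliding_windows` is a hypothetical helper operating on a plain Python sequence.

```python
def sliding_windows(series, window, stride):
    """Return consecutive fixed-length slices of a sequence.

    Trailing samples that do not fill a whole window are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(series) - window
    return [series[i:i + window] for i in range(0, last_start + 1, stride)]

# Toy time series standing in for one region's fMRI signal.
signal = list(range(10))
windows = sliding_windows(signal, window=4, stride=3)
print(windows)
```

Each window can then be treated as a separate training instance, which increases the number of samples available per subject.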

FedBN: Federated Learning on Non-IID Features via Local Batch Normalization
