Model averaging, a widely adopted technique in federated learning (FL),
aggregates multiple client models trained on heterogeneous data to obtain a
well-performing global model. However, the rationale behind its success is not
well understood. To shed light on this issue, we investigate the geometric
properties of model averaging by visualizing the loss/error landscape. This
visualization shows that the client models surround the global
model within a common basin, and the global model may deviate from the bottom
of the basin even though it performs better than the client models. To further
understand this phenomenon, we decompose the expected prediction error of the
global model into five factors related to client models. Specifically, we find
that the global-model error after early training mainly comes from i) the
error of client models on data present in the global dataset but absent from
their local datasets, and ii) the maximal distance between the global model
and the client models. Inspired by these findings, we propose applying
iterative moving averaging (IMA) to global models to reduce the prediction
error, and limiting client exploration to control the maximal distance
during late training. Our
experiments demonstrate that IMA significantly improves the accuracy and
training speed of existing FL methods on benchmark datasets under various
degrees of data heterogeneity.
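
To make the proposal concrete, the following is a minimal sketch of server-side IMA, assuming a PyTorch setting; it is not the paper's implementation, and the names `run_fl_round`, `ima_start`, and `window` are illustrative assumptions.

```python
import copy
import torch

def average_state_dicts(state_dicts):
    """Uniform element-wise average of a list of model state_dicts."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0)
    return avg

def train_with_ima(global_model, run_fl_round, num_rounds, ima_start, window=5):
    """Run FL rounds; from round `ima_start` onward, replace the latest
    global model with a moving average of the last `window` global models."""
    recent = []  # sliding window of recent global-model state_dicts
    for t in range(num_rounds):
        # `run_fl_round` stands in for one round of local client training
        # followed by standard model averaging; assumed to return a state_dict.
        global_state = run_fl_round(global_model)
        recent.append(copy.deepcopy(global_state))
        if len(recent) > window:
            recent.pop(0)
        if t >= ima_start:
            # IMA step: average the recent global models instead of
            # keeping only the newest one.
            global_state = average_state_dicts(recent)
        global_model.load_state_dict(global_state)
    return global_model
```

Under this reading, the companion idea of limiting client exploration in late training could amount to, e.g., decaying the local learning rate or reducing local epochs inside `run_fl_round`, so that the maximal distance between the global and client models stays small.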