Click here to flash read.
Self-supervised learning (SSL) pipelines differ in many design choices such
as the architecture, augmentations, or pretraining data. Yet SSL is typically
evaluated using a single metric: linear probing on ImageNet. This does not
provide much insight into why or when a model is better, now how to improve it.
To address this, we propose an SSL risk decomposition, which generalizes the
classical supervised approximation-estimation decomposition by considering
errors arising from the representation learning step. Our decomposition
consists of four error components: approximation, representation usability,
probe generalization, and encoder generalization. We provide efficient
estimators for each component and use them to analyze the effect of 30 design
choices on 169 SSL vision models evaluated on ImageNet. Our analysis gives
valuable insights for designing and using SSL models. For example, it
highlights the main sources of error and shows how to improve SSL in specific
settings (full- vs few-shot) by trading off error components. All results and
pretrained models are at https://github.com/YannDubs/SSL-Risk-Decomposition.