In disentangled representation learning, a model is asked to tease apart a
dataset's underlying sources of variation and represent them independently of
one another. Since the model is provided with no ground truth information about
these sources, inductive biases play a paramount role in enabling
disentanglement. In this work, we construct an inductive bias towards encoding
to and decoding from an organized latent space. Concretely, we do this by (i)
quantizing the latent space into discrete code vectors with a separate
learnable scalar codebook per dimension and (ii) applying strong model
regularization via an unusually high weight decay. Intuitively, the latent
space design forces the encoder to combinatorially construct codes from a small
number of distinct scalar values, which in turn enables the decoder to assign a
consistent meaning to each value. Regularization then serves to drive the model
towards this parsimonious strategy. We demonstrate the broad applicability of
this approach by adding it to both basic data-reconstructing (vanilla
autoencoder) and latent-reconstructing (InfoGAN) generative models. For
reliable evaluation, we also propose InfoMEC, a new set of metrics for
disentanglement that is cohesively grounded in information theory and fixes
well-established shortcomings in previous metrics. Together with
regularization, latent quantization dramatically improves the modularity and
explicitness of learned representations on a representative suite of benchmark
datasets. In particular, our quantized-latent autoencoder (QLAE) consistently
outperforms strong methods from prior work in these key disentanglement
properties without compromising data reconstruction.