Modern deep networks are highly complex and their inferential outcomes are very
hard to interpret. This is a serious obstacle to their transparent deployment
in safety-critical or bias-aware applications. This work contributes to
post-hoc interpretability, and specifically Network Dissection. Our goal is to
present a framework that makes it easier to discover the individual
functionality of each neuron in a network trained on a vision task; discovery
is performed by generating textual descriptions. To achieve this
objective, we leverage: (i) recent advances in multimodal vision-text models
and (ii) network layers founded upon the novel concept of stochastic local
competition between linear units. In this setting, only a small subset of layer
neurons is activated for a given input, leading to extremely high activation
sparsity (as few as $\approx 4\%$ of neurons active). Crucially, our proposed method infers
(sparse) neuron activation patterns that enable the neurons to
activate for, and specialize in, inputs with specific characteristics, diversifying their
individual functionality. This capacity of our method greatly enhances the
potential of dissection processes: human understandable descriptions are
generated only for the very few active neurons, thus facilitating the direct
investigation of the network's decision process. As we experimentally show, our
approach: (i) yields vision networks that retain or improve classification
performance, and (ii) realizes a principled framework for text-based
description and examination of the generated neuronal representations.
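
To make the idea of stochastic local competition between linear units concrete, the following is a minimal NumPy sketch and not the implementation used in this work. It assumes that linear units are grouped into blocks of competitors, that one winner per block is sampled from a softmax over the block's linear responses, and that all non-winning units output zero. The function name `stochastic_lwta`, the `temperature` parameter, and the block size of 25 are hypothetical choices made only for illustration; with 25 competitors per block, exactly 1 in 25 units (4%) is active per input.

```python
import numpy as np

def stochastic_lwta(x, weights, num_competitors, rng, temperature=1.0):
    """Illustrative stochastic local winner-takes-all layer (an assumed form).

    x:       (batch, in_features) inputs
    weights: (in_features, num_blocks * num_competitors) linear weights
    Returns the sparse output and the binary winner mask, both of shape
    (batch, num_blocks * num_competitors).
    """
    h = x @ weights                                   # linear responses
    batch = h.shape[0]
    blocks = h.reshape(batch, -1, num_competitors)    # group units into blocks

    # Softmax over the competitors inside each block (numerically stabilised).
    logits = blocks / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Sample one winner per block by inverting the categorical CDF.
    u = rng.random(size=probs.shape[:-1] + (1,))
    winners = (probs.cumsum(axis=-1) < u).sum(axis=-1)
    winners = np.minimum(winners, num_competitors - 1)  # guard against rounding

    # Only the winner passes its linear response; the other units are zeroed.
    mask = np.zeros_like(blocks)
    np.put_along_axis(mask, winners[..., None], 1.0, axis=-1)
    return (blocks * mask).reshape(batch, -1), mask.reshape(batch, -1)

# Hypothetical sizes: 16 input features, 4 blocks of 25 competitors each.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
w = rng.normal(size=(16, 100))
out, mask = stochastic_lwta(x, w, num_competitors=25, rng=rng)
active_fraction = mask.mean()  # fraction of units allowed to fire per input
```

Because the winner is sampled rather than chosen by a hard maximum, different units in a block can win for different inputs, which is one way the specialization described above could arise. Dissection would then only need to describe the units that actually win.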