Most group fairness notions detect unethical biases by computing statistical
parity metrics on a model's output. However, this approach suffers from several
shortcomings, such as philosophical disagreement, mutual incompatibility, and
lack of interpretability. These shortcomings have spurred research on
complementary bias detection methods that offer additional transparency into
the sources of discrimination and are agnostic towards an a priori decision on
the definition of fairness and choice of protected features. A recent proposal
in this direction is LUCID (Locating Unfairness through Canonical Inverse
Design), where canonical sets are generated by performing gradient descent on
the input space, revealing a model's desired input given a preferred output.
This information about the model's mechanisms, i.e., which feature values are
essential to obtain specific outputs, helps expose potential unethical
biases in its internal logic. Here, we present LUCID-GAN, which generates
canonical inputs via a conditional generative model instead of gradient-based
inverse design. LUCID-GAN has several benefits, including that it applies to
non-differentiable models, ensures that canonical sets consist of realistic
inputs, and makes it possible to assess proxy and intersectional discrimination. We
empirically evaluate LUCID-GAN on the UCI Adult and COMPAS data sets and show
that it allows for detecting unethical biases in black-box models without
requiring access to the training data.
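
To make the gradient-based inverse design behind LUCID concrete, here is a minimal sketch of generating a single canonical input for a differentiable classifier. It assumes a PyTorch model with a sigmoid (probability) output; the function name `generate_canonical_input` and the hyperparameters are illustrative, not the authors' implementation.

```python
import torch

def generate_canonical_input(model, preferred_output, input_dim, steps=500, lr=0.05):
    """Gradient-based inverse design sketch: start from a random input and
    adjust it so the model's output moves toward the preferred output,
    revealing which feature values the model 'wants' for that output."""
    x = torch.randn(1, input_dim, requires_grad=True)   # candidate canonical input
    optimizer = torch.optim.Adam([x], lr=lr)
    target = torch.tensor([preferred_output], dtype=torch.float32)
    loss_fn = torch.nn.BCELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        pred = model(x).view(-1)        # model's probability for the current input
        loss = loss_fn(pred, target)    # distance to the preferred output
        loss.backward()                 # gradients w.r.t. the input, not the weights
        optimizer.step()
    return x.detach()
```

LUCID-GAN replaces this per-input optimization with sampling from a conditional generative model, which also covers non-differentiable, black-box classifiers and keeps the canonical inputs realistic. A generic, hedged sketch of that sampling step (a hypothetical `generator` conditioned on the preferred label, not the paper's architecture) might look like:

```python
def sample_canonical_set(generator, preferred_label, n_samples=100, latent_dim=32):
    """Draw a canonical set by conditioning a trained generator on the preferred output."""
    z = torch.randn(n_samples, latent_dim)                     # latent noise
    cond = torch.full((n_samples, 1), float(preferred_label))  # preferred-output condition
    return generator(torch.cat([z, cond], dim=1)).detach()     # generated canonical inputs
```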