We introduce a boosting algorithm to pre-process data for fairness. Starting
from an initial fair but inaccurate distribution, our approach shifts towards
better data fitting while still ensuring a minimal fairness guarantee. To do
so, it learns the sufficient statistics of an exponential family with
boosting-compliant convergence. Importantly, we prove theoretically that the
learned distribution satisfies guarantees on two data fairness measures,
representation rate and statistical rate. Unlike recent optimization-based
pre-processing methods, our approach can be easily adapted to continuous-domain
features.
Furthermore, when the weak learners are specified to be decision trees, the
sufficient statistics of the learned distribution can be examined to provide
clues on sources of (un)fairness. Empirical results are presented to demonstrate
the quality of the results on real-world data.
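
The two fairness measures named above can be illustrated on an empirical sample. The sketch below is not the paper's algorithm. It assumes one common ratio-based definition of each measure, which the abstract itself does not state: representation rate as the smallest ratio between the frequencies of any two sensitive groups, and statistical rate as the smallest ratio between the positive-label rates of any two sensitive groups. The function names and the toy data are hypothetical.

```python
from collections import Counter


def representation_rate(sensitive):
    """Smallest ratio P(S=s) / P(S=s') over all pairs of sensitive groups.

    Assumed definition: a value of 1.0 means all groups are equally
    represented; values near 0 mean some group is nearly absent.
    """
    counts = Counter(sensitive)
    return min(counts.values()) / max(counts.values())


def statistical_rate(sensitive, labels, positive=1):
    """Smallest ratio P(Y=positive | S=s) / P(Y=positive | S=s') over pairs.

    Assumed definition: a value of 1.0 means every group receives the
    positive label at the same rate.
    """
    totals = Counter(sensitive)
    positives = Counter(s for s, y in zip(sensitive, labels) if y == positive)
    rates = [positives[s] / totals[s] for s in totals]
    if max(rates) == 0:
        # No group has any positive label, so all rates are trivially equal.
        return 1.0
    return min(rates) / max(rates)


# Toy sample: three records in group "a", one record in group "b".
sensitive = ["a", "a", "a", "b"]
labels = [1, 1, 0, 1]

rep = representation_rate(sensitive)       # 1 / 3
stat = statistical_rate(sensitive, labels)  # (2/3) / (1/1) = 2/3
print(rep, stat)
```

Under these assumed definitions, a pre-processing method with a guarantee of the kind described would keep both quantities above a chosen threshold for the learned distribution.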