In machine learning, training data often capture the behaviour of multiple subgroups of some underlying human population. This behaviour can frequently be modelled as observations of an unknown dynamical system with an unobserved state. When the training data for the subgroups are not carefully controlled, however, under-representation bias arises. To counter this bias, we introduce two natural notions of fairness in time-series forecasting: subgroup fairness and instantaneous fairness. These notions extend predictive parity to the learning of dynamical systems. We also present globally convergent methods for the resulting fairness-constrained learning problems, based on hierarchies of convexifications of non-commutative polynomial optimisation problems. Further, we show that exploiting sparsity in these convexifications reduces the run time of our methods considerably. Our empirical results on a biased data set motivated by insurance applications and on the well-known COMPAS data set demonstrate the efficacy of our methods.
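The two fairness notions can be made concrete with a small sketch. This is an illustration only: it assumes that subgroup fairness penalises the worst subgroup's time-averaged squared forecast error, and that instantaneous fairness penalises the worst squared error over every subgroup and every time step. Both are one plausible reading and not the exact definitions used in the paper. The sketch only evaluates given forecasts. It does not reproduce the convexification-based learning methods, and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical helper: pointwise squared forecast errors of one subgroup over time.
def squared_errors(y_true: List[float], y_pred: List[float]) -> List[float]:
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

# Assumed reading of subgroup fairness: the largest time-averaged squared error
# across subgroups (a learner would minimise this min-max objective).
def subgroup_fairness_objective(
    obs: Dict[str, List[float]], preds: Dict[str, List[float]]
) -> float:
    averages = []
    for group in obs:
        errs = squared_errors(obs[group], preds[group])
        averages.append(sum(errs) / len(errs))
    return max(averages)

# Assumed reading of instantaneous fairness: the largest squared error over
# every subgroup and every time step, so no single step may be badly served.
def instantaneous_fairness_objective(
    obs: Dict[str, List[float]], preds: Dict[str, List[float]]
) -> float:
    return max(max(squared_errors(obs[g], preds[g])) for g in obs)

if __name__ == "__main__":
    obs = {"a": [1.0, 3.0], "b": [0.0, 0.0]}
    preds = {"a": [1.0, 1.0], "b": [0.5, 0.0]}
    print(subgroup_fairness_objective(obs, preds))       # 2.0
    print(instantaneous_fairness_objective(obs, preds))  # 4.0
```

Under these assumed definitions, the instantaneous objective is never smaller than the subgroup objective, because a maximum over individual time steps bounds any time average from above.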