Crowdsourced machine learning on competition platforms such as Kaggle is a popular and often effective method for generating accurate models. Typically, teams compete to produce the most accurate model, as measured by overall error on a holdout set, and towards the end of such competitions it is common for teams at the top of the leaderboard to ensemble or average their models outside the platform mechanism to obtain the final, best global model. In arXiv:2201.10408, the authors developed an alternative crowdsourcing framework in the context of fair machine learning, in order to integrate community feedback into models when subgroup unfairness is present and identifiable. There, unlike in classical crowdsourced ML, participants deliberately specialize their efforts by working on subproblems, such as demographic subgroups in the service of fairness. Here, we take a broader perspective on this work: we note that within this framework, participants may specialize both in the service of fairness and simply to match their particular expertise (e.g., focusing on identifying bird species in an image classification task). Unlike traditional crowdsourcing, this allows participants' efforts to diversify and may open participation to a larger range of individuals (e.g., a machine learning novice who has insight into a specific fairness concern). We present the first medium-scale experimental evaluation of this framework, with 46 participating teams attempting to generate models to predict income from American Community Survey data. We provide an empirical analysis of teams' approaches and discuss the novel system architecture we developed. Based on this evaluation, we give concrete guidance for how best to deploy such a framework.