Billions of distributed, heterogeneous and resource-constrained IoT devices run on-device machine learning (ML) for private, fast and offline inference on personal data. On-device ML is highly context-dependent and sensitive to user, usage, hardware and environment attributes. This sensitivity, together with the propensity of ML towards bias, makes it important to study bias in on-device settings. Our study is one of the first investigations of bias in this emerging domain and lays important foundations for building fairer on-device ML. We apply a software engineering lens, investigating how bias propagates through design choices in on-device ML workflows. We first identify reliability bias as a source of unfairness and propose a measure to quantify it. We then conduct empirical experiments on a keyword spotting task to show how complex, interacting technical design choices amplify and propagate reliability bias. Our results validate that design choices made during model training, such as the sample rate and input feature type, and choices made to optimize models, such as lightweight architectures, the pruning learning rate and pruning sparsity, can result in disparate predictive performance across male and female groups. Based on our findings, we suggest low-effort strategies for engineers to mitigate bias in on-device ML.
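The abstract does not spell out how reliability bias is measured. As a minimal sketch, assuming predictive performance is measured as accuracy and that disparity is scored as the sum over groups of the absolute log ratio between each group's accuracy and the overall accuracy, disparate performance across male and female speakers in a keyword spotting task could be quantified as follows. The functions `group_accuracy` and `reliability_bias` and the log-ratio formula are hypothetical illustrations, not the paper's definition.

```python
import math
from collections import defaultdict


def group_accuracy(y_true, y_pred, groups):
    """Return (overall accuracy, {group: accuracy}) for labelled predictions."""
    if not (len(y_true) == len(y_pred) == len(groups)) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


def reliability_bias(y_true, y_pred, groups):
    """Hypothetical disparity score (an assumption, not the paper's measure):
    sum over groups of |ln(acc_group / acc_overall)|.
    0.0 means every group matches overall accuracy; larger means more disparity."""
    overall, per_group = group_accuracy(y_true, y_pred, groups)
    if overall == 0 or any(acc == 0 for acc in per_group.values()):
        raise ValueError("log ratio is undefined when an accuracy is zero")
    return sum(abs(math.log(acc / overall)) for acc in per_group.values())


if __name__ == "__main__":
    # Toy keyword spotting results: male speakers 4/4 correct, female speakers 2/4.
    y_true = ["yes", "no", "up", "down", "yes", "no", "up", "down"]
    y_pred = ["yes", "no", "up", "down", "yes", "up", "up", "no"]
    groups = ["male"] * 4 + ["female"] * 4
    print(round(reliability_bias(y_true, y_pred, groups), 4))  # prints 0.6931
```

In the toy example overall accuracy is 0.75, so the score is ln(1.0/0.75) + |ln(0.5/0.75)| = ln 2. A log ratio is used here so that a group performing better than average and a group performing worse contribute on the same multiplicative scale; other choices, such as the absolute accuracy gap, would work just as well for comparing design choices.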