Tiny, always-on and fragile: Bias propagation through design choices in on-device machine learning workflows


March 17, 2023



Wiebke Toussaint, Aaron Yi Ding, Fahim Kawsar, Akhil Mathur

From arXiv; to be published in ACM Transactions on Software Engineering and Methodology.

Billions of distributed, heterogeneous and resource-constrained IoT devices deploy on-device machine learning (ML) for private, fast and offline inference on personal data. On-device ML is highly context-dependent, and sensitive to user, usage, hardware and environment attributes. This sensitivity and the propensity towards bias in ML make it important to study bias in on-device settings. Our study is one of the first investigations of bias in this emerging domain, and lays important foundations for building fairer on-device ML. We apply a software engineering lens, investigating the propagation of bias through design choices in on-device ML workflows. We first identify reliability bias as a source of unfairness and propose a measure to quantify it. We then conduct empirical experiments for a keyword spotting task to show how complex and interacting technical design choices amplify and propagate reliability bias. Our results validate that design choices made during model training, like the sample rate and input feature type, and choices made to optimize models, like lightweight architectures, the pruning learning rate and pruning sparsity, can result in disparate predictive performance across male and female groups. Based on our findings we suggest low-effort strategies for engineers to mitigate bias in on-device ML.
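To make "disparate predictive performance across groups" concrete, the following is a minimal sketch of one way to compare per-group accuracy for a keyword spotting classifier. The abstract does not define the paper's reliability bias measure, so this is not that measure: the helper names `group_accuracy` and `accuracy_gap`, the max-minus-min gap, and the toy labels are all assumptions made purely for illustration.

```python
from collections import defaultdict


def group_accuracy(y_true, y_pred, groups):
    """Return the accuracy of the predictions within each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Illustrative disparity score: best group accuracy minus worst.

    This is a stand-in for illustration only, not the measure proposed
    in the paper.
    """
    return max(per_group.values()) - min(per_group.values())


# Hypothetical keyword spotting outputs for eight utterances.
y_true = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]
y_pred = ["yes", "no", "yes", "no", "yes", "yes", "no", "no"]
groups = ["male"] * 4 + ["female"] * 4

per_group = group_accuracy(y_true, y_pred, groups)
gap = accuracy_gap(per_group)
print(per_group, gap)
```

Under these assumptions, running the same comparison for each candidate configuration (for example, different sample rates or pruning sparsities) would show which design choices widen or narrow the gap between groups.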

