Changing Data Sources in the Age of Machine Learning for Official Statistics


June 7, 2023


Cedric De Boom, Michael Reusens

From arXiv. Presented at the UNECE Machine Learning for Official Statistics Workshop 2023.

Data science has become increasingly essential for the production of official statistics, as it enables the automated collection, processing, and analysis of large amounts of data. Such data science practices enable more timely, more insightful, and more flexible reporting. However, the quality and integrity of data-science-driven statistics rely on the accuracy and reliability of the data sources and of the machine learning techniques that support them. In particular, changes in data sources are inevitable and pose significant risks that must be addressed in the context of machine learning for official statistics. This paper gives an overview of the main risks, liabilities, and uncertainties associated with changing data sources in that context. We provide a checklist of the most prevalent origins and causes of changing data sources, not only on a technical level but also regarding ownership, ethics, regulation, and public perception. Next, we highlight the repercussions of changing data sources on statistical reporting. These include technical effects such as concept drift, bias, availability, validity, accuracy, and completeness, but also the neutrality and potential discontinuation of the statistical offering. We offer a few important precautionary measures, such as enhancing robustness in both data sourcing and statistical techniques, and thorough monitoring. With these measures in place, machine-learning-based official statistics can maintain integrity, reliability, consistency, and relevance in policy-making, decision-making, and public discourse.
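The abstract names thorough monitoring as a precaution against changing data sources but does not prescribe a method. As one possible illustration, the sketch below compares a new batch from a data source against a reference sample using the Population Stability Index (PSI). The choice of PSI, the helper name `population_stability_index`, the bin count, the synthetic data, and the alert threshold are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def population_stability_index(reference, current, n_bins=10):
    """Return the PSI between two numeric samples (hypothetical helper)."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is empty.
    width = (hi - lo) / n_bins or 1.0

    def bin_shares(sample):
        counts = [0] * n_bins
        for x in sample:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((x - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Additive smoothing keeps the logarithm defined for empty bins.
        return [(c + 0.5) / (len(sample) + 0.5 * n_bins) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Synthetic data for illustration only: one unchanged and one shifted source.
rng = random.Random(0)
baseline = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_source = [rng.gauss(0.0, 1.0) for _ in range(5000)]
changed_source = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(baseline, same_source)
psi_changed = population_stability_index(baseline, changed_source)

# Assumed threshold: 0.25 is a commonly quoted rule of thumb, not a value from the paper.
ALERT_THRESHOLD = 0.25
for name, psi in [("same_source", psi_same), ("changed_source", psi_changed)]:
    status = "ALERT" if psi > ALERT_THRESHOLD else "ok"
    print(f"{name}: PSI={psi:.3f} [{status}]")
```

A check of this kind only flags a shift in the distribution of a single input variable. It does not by itself detect concept drift, where the relationship between inputs and the target changes, nor the non-technical causes the abstract lists, such as changes in ownership or regulation.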

