PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences


Analytics · Distributed machine learning · Anonymization techniques · Anonymization · Analytics tools

April 3, 2023



Zeyd Boukhers, Arnim Bleier, Yeliz Ucer Yediel, Mio Hienstorfer-Heitmann, Mehrshad Jaberansary, Adamantios Koumpis, Oya Beyan

From arXiv; accepted for publication at ACM/IEEE JCDL 2023 (Joint Conference on Digital Libraries).

Data privacy and ownership are significant in social data science, raising legal and ethical concerns. Sharing and analyzing data is difficult when different parties own different parts of it. An approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. PADME uses a federated approach where the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables the analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, the results are not provided until the analysis is completed on all data locations to ensure privacy and avoid bias in the results.
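The incremental scheme described in the abstract, in which one model travels from data location to data location and results are released only after every location has been visited, can be illustrated with a minimal sketch. This is not PADME's API or the authors' implementation: the function names (`local_sgd`, `incremental_train`, `make_station`), the one-variable linear model, the learning rate, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import random


def local_sgd(weights, rows, lr=0.05, epochs=20):
    """Update the model on one station's private rows.

    Only the weights enter and leave this function; the rows stay local.
    (Hypothetical helper, not part of PADME.)
    """
    w, b = weights
    for _ in range(epochs):
        for x, y in rows:
            err = (w * x + b) - y  # prediction error on one local record
            w -= lr * err * x      # gradient step for the slope
            b -= lr * err          # gradient step for the intercept
    return (w, b)


def incremental_train(stations, rounds=5):
    """Send the model to each station in turn.

    Nothing is returned until every station has been visited in the
    final round, mirroring the 'no results before completion' rule.
    """
    weights = (0.0, 0.0)
    for _ in range(rounds):
        for rows in stations:
            weights = local_sgd(weights, rows)
    return weights


def make_station(seed, n=30):
    """Build synthetic, noise-free data y = 2x + 1 for one station (an assumption)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]


# Three stations, each holding its own part of the data.
stations = [make_station(seed) for seed in (1, 2, 3)]
w, b = incremental_train(stations)
print(round(w, 2), round(b, 2))
```

Because the synthetic data are noise-free and share one underlying relationship, the learned slope and intercept should approach 2 and 1. With real, heterogeneous data the visiting order and the number of rounds can affect the result, which this sketch does not address.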

