A Human-in-the-Loop Approach for Information Extraction from Privacy Policies under Data Scarcity

May 24, 2023


Michael Gebauer,Faraz Mashhur,Nicola Leschke,Elias Grünewald,Frank Pallas

From arXiv. Accepted for the 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&P).

Machine-readable representations of privacy policies are door openers for a broad variety of novel privacy-enhancing and, in particular, transparency-enhancing technologies (TETs). To generate such representations, transparency information needs to be extracted from written privacy policies. However, the corresponding manual annotation and extraction processes are laborious and require expert knowledge. Approaches for fully automated annotation, in turn, have so far not succeeded because of overly high error rates in the specific domain of privacy policies. As a result, a lack of properly annotated privacy policies and corresponding machine-readable representations persists and continues to hinder the development and establishment of novel technical approaches that foster policy perception and data subject informedness. In this work, we present a prototype system for a "Human-in-the-Loop" approach to privacy policy annotation that integrates ML-generated suggestions with final human annotation decisions. We propose an ML-based suggestion system specifically tailored to the data scarcity prevalent in the domain of privacy policy annotation. On this basis, we provide meaningful predictions to users, thereby streamlining the annotation process. We also evaluate our approach through a prototypical implementation and show that our ML-based extraction approach outperforms other recently used extraction models for legal documents.
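The human-in-the-loop idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the label names, cue words, and the functions `suggest` and `annotate` are all hypothetical, and a simple keyword matcher stands in for the ML suggestion model. The point is only the control flow, in which the machine proposes a label and a human decision determines what is recorded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Suggestion:
    sentence: str
    label: str
    score: float  # crude confidence in [0, 1]


# Hypothetical label set and cue words, invented for this illustration only.
CUES: Dict[str, List[str]] = {
    "data_retention": ["retain", "store", "delete"],
    "third_party_sharing": ["share", "third party", "partners"],
}


def suggest(sentence: str) -> Optional[Suggestion]:
    """Stand-in for an ML suggestion model: pick the label with the most cue hits."""
    text = sentence.lower()
    best_label, best_hits = None, 0
    for label, cues in CUES.items():
        hits = sum(cue in text for cue in cues)
        if hits > best_hits:
            best_label, best_hits = label, hits
    if best_label is None:
        return None  # nothing to suggest for this sentence
    return Suggestion(sentence, best_label, best_hits / len(CUES[best_label]))


def annotate(
    sentences: List[str],
    decide: Callable[[Suggestion], Optional[str]],
) -> List[Tuple[str, str]]:
    """Show each suggestion to a reviewer; only the reviewer's decision is recorded."""
    accepted = []
    for sentence in sentences:
        suggestion = suggest(sentence)
        if suggestion is None:
            continue
        final_label = decide(suggestion)  # confirm, change, or reject (None)
        if final_label is not None:
            accepted.append((sentence, final_label))
    return accepted


policy = [
    "We retain your data for 12 months and then delete it.",
    "We share usage statistics with our partners.",
    "Contact us by email.",
]

# A scripted reviewer that accepts every suggestion, standing in for a real person.
annotations = annotate(policy, lambda s: s.label)
for sentence, label in annotations:
    print(f"{label}: {sentence}")
```

In a real tool the `decide` callback would be a user interface in which an annotator confirms, corrects, or discards each suggestion, and the confirmed annotations could later serve as additional training data.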

