PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs

From arXiv. 24 pages, 15 figures (including subfigures), 9 tables. This is the extended version of the paper of the same title published at USENIX Security '23.

Privacy policies disclose how an organization collects and handles personal information. Recent work has made progress in leveraging natural language processing (NLP) to automate privacy policy analysis and extract data collection statements from individual sentences, each considered in isolation. In this paper, we view and analyze, for the first time, the entire text of a privacy policy in an integrated way. In terms of methodology: (1) we define PoliGraph, a type of knowledge graph that captures statements in a privacy policy as relations between different parts of the text; and (2) we develop an NLP-based tool, PoliGraph-er, to automatically extract PoliGraph from the text. In addition, (3) we revisit the notion of ontologies, previously defined in heuristic ways, to capture subsumption relations between terms. We make a clear distinction between local and global ontologies to capture the context of individual privacy policies, application domains, and privacy laws. Using a public dataset for evaluation, we show that PoliGraph-er identifies 40% more collection statements than the prior state of the art, with 97% precision. In terms of applications, PoliGraph enables automated analysis of a corpus of privacy policies and allows us to: (1) reveal common patterns in the texts across different privacy policies, and (2) assess the correctness of the terms as defined within a privacy policy. We also apply PoliGraph to: (3) detect contradictions in a privacy policy, where we show false alarms by prior work, and (4) analyze the consistency of privacy policies and network traffic, where we identify significantly more clear disclosures than prior work.
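To make the idea of a policy-level knowledge graph more concrete, the following is a minimal sketch of how collection statements and subsumption relations between terms might be stored and queried together. It is not the PoliGraph-er implementation: the class name `PolicyGraph`, the edge labels `COLLECT` and `SUBSUME`, and the example terms are all assumptions chosen for illustration.

```python
from collections import defaultdict


class PolicyGraph:
    """Toy knowledge graph: nodes are terms, edges are labeled relations.

    Hypothetical structure for illustration only; the edge labels
    COLLECT and SUBSUME are assumptions, not taken from the source.
    """

    def __init__(self):
        # Maps (source term, edge label) to the set of target terms.
        self.edges = defaultdict(set)

    def add_edge(self, source, label, target):
        self.edges[(source, label)].add(target)

    def subsumed_terms(self, term):
        """Return every term subsumed by `term`, directly or transitively."""
        seen, stack = set(), [term]
        while stack:
            node = stack.pop()
            for child in self.edges.get((node, "SUBSUME"), set()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def collected_by(self, entity):
        """Return terms the entity collects, expanded through subsumption."""
        direct = self.edges.get((entity, "COLLECT"), set())
        result = set(direct)
        for term in direct:
            result |= self.subsumed_terms(term)
        return result


# Two sentences from an imagined policy, linked through a shared term:
#   "We collect personal information."
#   "Personal information includes contact information such as email address."
graph = PolicyGraph()
graph.add_edge("we", "COLLECT", "personal information")
graph.add_edge("personal information", "SUBSUME", "contact information")
graph.add_edge("contact information", "SUBSUME", "email address")

print(sorted(graph.collected_by("we")))
# ['contact information', 'email address', 'personal information']
```

In this sketch, a sentence-by-sentence reading would find only the first statement, while the graph query also reaches the terms defined elsewhere in the text, which is the kind of integrated view the abstract describes.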

