Machine-readable representations of privacy policies are door openers for a broad variety of novel privacy-enhancing and, in particular, transparency-enhancing technologies (TETs). To generate such representations, transparency information must be extracted from written privacy policies. However, manual annotation and extraction are laborious and require expert knowledge, while approaches for fully automated annotation have so far not succeeded because their error rates are too high in the specific domain of privacy policies. As a result, properly annotated privacy policies and corresponding machine-readable representations remain scarce, which continues to hinder the development and establishment of novel technical approaches that foster policy perception and data subject informedness. In this work, we present a prototype system for a 'Human-in-the-Loop' approach to privacy policy annotation that combines ML-generated suggestions with final human annotation decisions. We propose an ML-based suggestion system specifically tailored to the data scarcity prevalent in the domain of privacy policy annotation. On this basis, we provide meaningful predictions to users, thereby streamlining the annotation process. We also evaluate our approach through a prototypical implementation and show that our ML-based extraction approach outperforms other recently used extraction models for legal documents.