Machine-readable representations of privacy policies open the door to a broad variety of novel privacy-enhancing and, in particular, transparency-enhancing technologies (TETs). To generate such representations, transparency information needs to be extracted from written privacy policies. However, manual annotation and extraction are laborious and require expert knowledge. Approaches for fully automated annotation, in turn, have so far not succeeded due to overly high error rates in the specific domain of privacy policies. As a result, a lack of properly annotated privacy policies and corresponding machine-readable representations persists and continues to hinder the development and establishment of novel technical approaches that foster policy perception and data subject informedness. In this work, we present a prototype system for a `Human-in-the-Loop' approach to privacy policy annotation that integrates ML-generated suggestions with final human annotation decisions. We propose an ML-based suggestion system specifically tailored to the data scarcity prevalent in the domain of privacy policy annotation. On this basis, we provide meaningful predictions to users, thereby streamlining the annotation process. We also evaluate our approach through a prototypical implementation to show that our ML-based extraction approach outperforms other recently used extraction models for legal documents.