PolicyGapper: Automated Detection of Inconsistencies Between Google Play Data Safety Sections and Privacy Policies Using LLMs

Mobile application developers are required to disclose how they collect, use, and share user data in compliance with privacy regulations. To support transparency, major app marketplaces have introduced standardized disclosure mechanisms. In 2022, Google mandated the Data Safety Section (DSS) on Google Play, requiring developers to summarize their data practices. However, compiling an accurate DSS is challenging: its disclosures must remain consistent with the app's privacy policy (PP), and no automated tool currently verifies this alignment. Prior studies indicate that nearly 80% of popular apps contain incomplete or misleading DSS declarations. We present PolicyGapper, an LLM-based methodology for automatically detecting discrepancies between DSS disclosures and privacy policies. PolicyGapper operates in four stages (scraping, pre-processing, analysis, and post-processing) and does not require access to application binaries. We evaluate PolicyGapper on a dataset of 330 top-ranked apps spanning all 33 Google Play categories, collected in Q3 2025. The approach identifies 2,689 omitted disclosures, of which 2,040 relate to data collection and 649 to data sharing. Manual validation on a stratified 10% subset, repeated across three independent runs, yields an average Precision of 0.75, Recall of 0.77, Accuracy of 0.69, and F1-score of 0.76. To support reproducibility, we release a complete replication package, including the dataset, prompts, source code, and results, available at https://github.com/Mobile-IoT-Security-Lab/PolicyGapper and https://doi.org/10.5281/zenodo.19628493.
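To make the notion of an omitted disclosure and the reported F1-score concrete, the following is a minimal sketch, not the PolicyGapper implementation. It assumes that an omission is a data type the privacy policy reports as collected or shared but the DSS does not declare, and that both documents have already been reduced to sets of data-type labels (in PolicyGapper that reduction is done by an LLM). The names `Disclosure`, `omitted_disclosures`, and `f1_score` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disclosure:
    """Data types a document declares as collected or shared (hypothetical model)."""
    collected: frozenset = frozenset()
    shared: frozenset = frozenset()


def omitted_disclosures(dss: Disclosure, pp: Disclosure) -> dict:
    """Return data types stated in the privacy policy but missing from the DSS.

    Assumption: an omission is a plain set difference per practice
    (collection, sharing); the real pipeline may define it differently.
    """
    return {
        "collection": set(pp.collected - dss.collected),
        "sharing": set(pp.shared - dss.shared),
    }


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative labels only; not taken from the paper's dataset.
    dss = Disclosure(collected=frozenset({"Email address"}))
    pp = Disclosure(
        collected=frozenset({"Email address", "Approximate location"}),
        shared=frozenset({"Device or other IDs"}),
    )
    print(omitted_disclosures(dss, pp))
    # Precision and recall as reported in the abstract.
    print(round(f1_score(0.75, 0.77), 2))
```

With the reported Precision of 0.75 and Recall of 0.77, the harmonic mean is 2 × 0.75 × 0.77 / (0.75 + 0.77) ≈ 0.76, which agrees with the F1-score stated above.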
