Modern mobile applications consume large amounts of data to function, raising significant privacy concerns and regulatory challenges. While prior work has focused primarily on detecting compliance gaps through policy analysis, developers still lack actionable guidance for implementing privacy principles at the code level. In this paper, we focus on data minimization as a principle that developers can operationalize, and we investigate how it is realized in Android applications. We conduct a formative study of 1,114 open-source Android apps and identify ten recurring data minimization scenarios across five data-handling stages. Building on this, we perform a large-scale analysis of 9,875 real-world APKs and distill 31 actionable coding guidelines to support privacy-compliant development. We further examine LLM-based code generation for Android development and find that state-of-the-art models consistently reproduce practices that put data minimization at risk, indicating that they inherit and amplify patterns from real-world code. Encouragingly, incorporating our guidelines eliminates these issues across all evaluated models. Our work advocates a shift toward addressing privacy regulatory requirements at their code-level root causes, enabling better compliance in both human and AI-assisted programming.