Auditing the leakage of latent sensitive features in transborder data flows has attracted considerable attention from digital regulators worldwide. However, audit practice still lacks a technical approach, owing to two challenges. First, there is a lack of theory and tools for measuring how much information about sensitive latent features a dataset carries. Second, transborder data flows involve multiple stakeholders with diverse interests, so the audit must be trustless. Despite tremendous efforts to protect data privacy, one important issue has long been neglected: the data transmitted in a data flow can leak other regulated information that is not explicitly contained in it, creating leakage risks that the parties are unaware of. To reveal such risks in a trustworthy manner before the actual data transfer, we propose FIAT, a Fine-grained Information Audit system for Trustless transborder data flow. FIAT uses a learning-based approach to quantify the amount of information leakage, and applies zero-knowledge proofs and smart contracts to provide trustworthy and privacy-preserving audit results. Experiments show that large information leakage boosts the predictability of information not involved in the transfer, even with simple machine-learning models, which underscores the importance of information auditing. Performance benchmarks further validate the efficiency and scalability of the FIAT auditing system.
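The abstract does not say which learning approach FIAT uses, so the following is only a minimal illustrative sketch of the general idea: train a simple model to predict a sensitive attribute from the transmitted data, and treat its accuracy gain over a majority-class baseline as a rough proxy for leakage. The synthetic data, the threshold model, and the names `make_records`, `fit_threshold`, `predict`, and `leakage_gain` are all assumptions made for illustration, not part of FIAT. The zero-knowledge proof and smart contract components are omitted.

```python
import random


def make_records(n, leak, seed=0):
    """Hypothetical synthetic data: each record is (x, s), where s is a hidden
    binary sensitive attribute and x is a transmitted feature whose mean
    shifts by `leak` when s == 1. leak == 0 means x carries no signal about s."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        s = rng.randint(0, 1)
        x = rng.gauss(leak * s, 1.0)
        records.append((x, s))
    return records


def fit_threshold(train):
    """A deliberately simple model: split at the midpoint of the class means."""
    xs0 = [x for x, s in train if s == 0]
    xs1 = [x for x, s in train if s == 1]
    m0 = sum(xs0) / len(xs0)
    m1 = sum(xs1) / len(xs1)
    return (m0 + m1) / 2.0, m1 >= m0


def predict(model, x):
    """Predict the sensitive attribute (0 or 1) from the transmitted feature."""
    threshold, one_is_above = model
    return int((x > threshold) == one_is_above)


def leakage_gain(records):
    """Held-out accuracy of the simple model minus a majority-class baseline.
    This is an assumed proxy for leakage, not the metric defined by FIAT."""
    half = len(records) // 2
    train, test = records[:half], records[half:]
    model = fit_threshold(train)
    acc = sum(predict(model, x) == s for x, s in test) / len(test)
    majority = int(sum(s for _, s in train) * 2 >= len(train))
    baseline = sum(s == majority for _, s in test) / len(test)
    return acc - baseline


if __name__ == "__main__":
    # Larger `leak` should make the sensitive attribute easier to predict.
    for leak in (0.0, 1.0, 3.0):
        print(leak, round(leakage_gain(make_records(2000, leak)), 3))
```

A gain near zero suggests the transmitted feature says little about the sensitive attribute under this simple model, while a large gain suggests that even a trivial learner can recover it. A real audit would need a principled information measure and a stronger family of models.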