Synthetic Data is increasingly important in financial applications. Alongside its benefits, such as improved financial modeling and better testing procedures, it also poses privacy risks. Such data may be derived from client information, business information, or other proprietary sources that must be protected. Although the process by which Synthetic Data is generated obscures the original data to some degree, the extent to which privacy is preserved is hard to assess. Accordingly, we introduce a hierarchy of six ``levels'' of privacy for categorizing Synthetic Data generation methods and the progressively stronger protections they offer. While the six levels were devised in the context of financial applications, they may also be appropriate for other industries. Our paper includes a brief overview of Financial Synthetic Data, how it can be used, how its value can be assessed, privacy risks, and privacy attacks. We close with details of the ``Six Levels'', which include defenses against those attacks.