Financial institutions face a tension between maximizing data utility and mitigating the re-identification risks inherent in traditional anonymization methods. This paper explores differentially private (DP) synthetic data as a "Privacy by Design" framework for resolving this conflict, providing output privacy while satisfying stringent regulatory obligations. We examine two generative paradigms: Direct Tabular Synthesis, which reconstructs a high-fidelity approximation of the joint distribution of the raw data, and DP-Seeded Agent-Based Modeling (ABM), which uses DP-protected aggregates to parameterize complex, stateful simulations. While tabular synthesis excels at reproducing static historical correlations for QA testing and business analytics, DP-Seeded ABM offers a forward-looking "counterfactual laboratory" capable of modeling dynamic market behaviors and black swan events. By decoupling individual identities from data utility, these methodologies reduce traditional data-clearing bottlenecks, enabling cross-institutional research and compliant decision-making in an evolving regulatory landscape.