Learning with relational and network-structured data is increasingly vital in sensitive domains where protecting the privacy of individual entities is paramount. Differential Privacy (DP) offers a principled approach for quantifying privacy risks, with DP-SGD emerging as a standard mechanism for private model training. However, directly applying DP-SGD to relational learning is challenging due to two key factors: (i) entities often participate in multiple relations, resulting in high and difficult-to-control sensitivity; and (ii) relational learning typically involves multi-stage, potentially coupled (interdependent) sampling procedures that make standard privacy amplification analyses inapplicable. This work presents a principled framework for relational learning with formal entity-level DP guarantees. We provide a rigorous sensitivity analysis and introduce an adaptive gradient clipping scheme that modulates clipping thresholds based on entity occurrence frequency. We also extend privacy amplification results to a tractable subclass of coupled sampling, where the dependence arises only through sample sizes. These contributions lead to a tailored DP-SGD variant for relational data with provable privacy guarantees. Experiments on fine-tuning text encoders over text-attributed network-structured relational data demonstrate that our approach achieves strong privacy-utility trade-offs. Our code is available at https://github.com/Graph-COM/Node_DP.
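To make the abstract's key mechanism concrete, the sketch below illustrates one possible form of frequency-adaptive clipping in a DP-SGD aggregation step: each record's clipping threshold shrinks with the occurrence count of its underlying entity, so that the total contribution of any single entity stays bounded by a base threshold before Gaussian noise is added. This is an illustrative sketch under assumed design choices (the inverse-count schedule, the function name, and the noise calibration are hypothetical), not the paper's exact algorithm.

```python
import numpy as np

def adaptive_clip_dpsgd_step(per_record_grads, entity_counts,
                             base_clip=1.0, noise_mult=1.0, rng=None):
    """One noisy-gradient aggregation step with frequency-adaptive clipping.

    per_record_grads: array of shape (n_records, dim), one gradient per record.
    entity_counts:    occurrence count k_i of the entity behind record i.
                      Records from frequent entities get tighter thresholds,
                      keeping each entity's total contribution near base_clip.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g, k in zip(per_record_grads, entity_counts):
        thresh = base_clip / k  # hypothetical inverse-frequency schedule
        norm = np.linalg.norm(g)
        # Standard norm clipping: rescale only if the norm exceeds the threshold
        clipped.append(g * min(1.0, thresh / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # Gaussian noise calibrated to the worst-case per-entity sensitivity (~base_clip)
    return total + rng.normal(0.0, noise_mult * base_clip, size=total.shape)
```

With `noise_mult=0`, a record whose gradient has norm 5 and whose entity appears twice is clipped to norm 0.5, while an infrequent entity's small gradient passes through unchanged; the privacy accounting itself (composition across steps, amplification under the coupled sampling analyzed in the paper) is outside this sketch.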