When distributed agents exchange text across organizational boundaries, privacy leakage arises not only from explicit identifiers but also from distributional signatures such as formatting conventions, vocabulary choices, and syntactic patterns. We propose DiSan (Disentangled Sanitization), a privacy-preserving sanitization framework and a built-in component of Intern-Shannon for multi-agent collaboration. DiSan uses a two-stream encoder to factorize text into a source-invariant role subspace that preserves task semantics and a source-identifying style subspace that remains local. Federated prototype alignment and adversarial regularization enable joint training without centralizing raw text. Experiments show that identifier-level masking is insufficient: masking 19.2% of tokens reduces TF-IDF stylometric attribution by only 18.6%. By contrast, DiSan reduces answer-level PII exposure by a factor of 20 while maintaining 83% answer faithfulness on a distributed multi-agent RAG benchmark, and lowers Enron stylometric attribution by 73.2% under TF-IDF and 70.6% under a neural probe.
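To make the idea of federated prototype alignment more concrete, the following is a minimal sketch under stated assumptions, not DiSan's actual training procedure. It assumes each client already has role-subspace vectors with task labels, shares only per-label sums and counts, and receives count-weighted global prototypes back. The function names (`local_prototype_stats`, `aggregate_prototypes`, `alignment_loss`), the label `"question"`, and the squared-distance loss are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical message format: label -> (sum of role vectors, example count).
LocalStats = Dict[str, Tuple[Vector, int]]


def local_prototype_stats(role_vectors: List[Vector], labels: List[str]) -> LocalStats:
    """Compute per-label sums and counts on one client (raw text stays local)."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in zip(role_vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: (sums[label], counts[label]) for label in sums}


def aggregate_prototypes(client_stats: List[LocalStats]) -> Dict[str, Vector]:
    """Server side: count-weighted mean of the clients' per-label sums."""
    total_sums: Dict[str, Vector] = {}
    total_counts: Dict[str, int] = defaultdict(int)
    for stats in client_stats:
        for label, (vec_sum, count) in stats.items():
            if label not in total_sums:
                total_sums[label] = [0.0] * len(vec_sum)
            total_sums[label] = [t + s for t, s in zip(total_sums[label], vec_sum)]
            total_counts[label] += count
    return {
        label: [t / total_counts[label] for t in total_sums[label]]
        for label in total_sums
    }


def alignment_loss(
    role_vectors: List[Vector], labels: List[str], prototypes: Dict[str, Vector]
) -> float:
    """Mean squared distance from each local role vector to its global prototype."""
    total = 0.0
    for vec, label in zip(role_vectors, labels):
        proto = prototypes[label]
        total += sum((v - p) ** 2 for v, p in zip(vec, proto))
    return total / max(len(role_vectors), 1)


# Two clients with toy 2-dimensional role vectors for a single label.
vectors_a, labels_a = [[1.0, 0.0], [3.0, 0.0]], ["question", "question"]
vectors_b, labels_b = [[2.0, 3.0]], ["question"]

global_prototypes = aggregate_prototypes([
    local_prototype_stats(vectors_a, labels_a),
    local_prototype_stats(vectors_b, labels_b),
])
loss_a = alignment_loss(vectors_a, labels_a, global_prototypes)
loss_b = alignment_loss(vectors_b, labels_b, global_prototypes)
```

In this sketch, each client would add its alignment loss to its local objective so that role vectors from different sources drift toward shared prototypes. Only aggregate statistics cross the organizational boundary, although such aggregates can still carry some information about a client's data, which is one reason an additional adversarial term on the style signal is plausible.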