Differential Privacy (DP) for text has matured from disjointed word-level substitutions to contiguous sentence-level rewriting that leverages the generative capacity of language models. While this form of text privatization is well suited to balancing formal privacy guarantees with grammatical coherence, its impact on the register identity of text remains largely unexplored. Through multidimensional stylistic profiling of differentially private rewriting, we demonstrate that the cost of privacy extends far beyond lexical variation. Specifically, we find that rewriting under privacy constraints induces a systematic functional mutation of the text's communicative signature. This shift is characterized by severe attrition of interactive markers, contextual references, and complex subordination. Comparing autoregressive paraphrasing with bidirectional substitution across a spectrum of privacy budgets, we observe that both architectures force convergence toward a non-involved, non-persuasive register. This register-blind sanitization effectively preserves semantic content but structurally homogenizes the nuanced stylistic markers that define human-authored discourse.