Recent literature has seen a considerable uptick in $\textit{Differentially Private Natural Language Processing}$ (DP NLP). This includes DP text privatization, where potentially sensitive input texts are transformed under DP to produce privatized output texts that ideally mask sensitive information $\textit{and}$ preserve the original semantics. Despite continued work on the open challenges of DP text privatization, there remains a scarcity of work addressing user perceptions of this technology, a crucial aspect that serves as the final barrier to practical adoption. In this work, we conduct a survey study with 721 laypersons around the globe, investigating how the factors of $\textit{scenario}$, $\textit{data sensitivity}$, $\textit{mechanism type}$, and $\textit{reason for data collection}$ impact user preferences for text privatization. We find that while all of these factors play a role in influencing privacy decisions, users are highly sensitive to the utility and coherence of the privatized output texts. Our findings highlight the socio-technical factors that must be considered in the study of DP NLP, opening the door to further user-based investigations going forward.