Information Disguise (ID), a part of computational ethics in Natural Language Processing (NLP), is concerned with best practices for textual paraphrasing to prevent the non-consensual use of authors' posts on the Internet. Research on ID is especially important when authors' online writing pertains to sensitive domains, e.g., mental health. Researchers have used AI-based automated word spinners (e.g., SpinRewriter, WordAI) to paraphrase content. However, these tools fail to serve the purpose of ID because their paraphrased content still leads back to the source when queried on search engines. There is limited prior work on judging the effectiveness of paraphrasing methods for ID on search engines or on their proxies, neural retriever (NeurIR) models. We propose a framework in which, for a given sentence from an author's post, we iteratively perturb the sentence toward a paraphrase so as to confuse the search mechanism of a NeurIR system when the sentence is queried on it. Our experiments use the subreddit 'r/AmItheAsshole' as the source of public content and Dense Passage Retriever as a NeurIR-based proxy for search engines. Our work introduces a novel method of ranking phrase importance using perplexity scores and performs multi-level phrase substitutions via beam search. Our multi-phrase substitution scheme succeeds in disguising sentences 82% of the time and hence takes an essential step towards enabling researchers to disguise sensitive content effectively before making it public. We also release the code of our approach.
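The iterative substitute-and-query loop described in the abstract can be illustrated with a toy sketch. This is not the released implementation, and every name in it (`bow_cosine`, `swap`, `source_rank`, `rank_phrases`, `disguise`, the three-document corpus, and the substitution table) is hypothetical. Two simplifications are assumed: a bag-of-words cosine similarity stands in for Dense Passage Retriever, and phrases are ranked by a leave-one-out score drop rather than by the perplexity-based ranking the abstract names, since a language model would not fit in a self-contained example. A sentence counts as disguised here once the source document falls out of the top-k retrieved results.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional


def bow_cosine(a: str, b: str) -> float:
    """Toy stand-in for a neural retriever score (assumption, not DPR)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def swap(text: str, phrase: str, alt: str) -> str:
    """Replace whole-word occurrences of `phrase` with `alt`."""
    return re.sub(r"\b" + re.escape(phrase) + r"\b", lambda _: alt, text)


def source_rank(query: str, corpus: List[str], source_idx: int) -> int:
    """1-based rank of the source document when `query` is issued."""
    scores = [bow_cosine(query, doc) for doc in corpus]
    order = sorted(range(len(corpus)), key=lambda i: -scores[i])
    return order.index(source_idx) + 1


def rank_phrases(sentence: str, phrases: List[str], source: str) -> List[str]:
    """Leave-one-out importance: phrases whose deletion lowers the source
    score the most come first (a stand-in for perplexity-based ranking)."""
    base = bow_cosine(sentence, source)
    drop = {p: base - bow_cosine(swap(sentence, p, ""), source) for p in phrases}
    return sorted(phrases, key=lambda p: -drop[p])


def disguise(
    sentence: str,
    corpus: List[str],
    source_idx: int,
    substitutes: Dict[str, List[str]],
    beam_width: int = 2,
    top_k: int = 1,
) -> Optional[str]:
    """Beam search over phrase substitutions, one phrase per level.
    Returns the first candidate that pushes the source out of the top-k,
    or None if every phrase has been tried without success."""
    source = corpus[source_idx]
    beam = [sentence]
    for phrase in rank_phrases(sentence, list(substitutes), source):
        pool = set(beam)  # keep unsubstituted candidates in the running
        for cand in beam:
            for alt in substitutes[phrase]:
                pool.add(swap(cand, phrase, alt))
        # Lower similarity to the source is better; the string breaks ties.
        beam = sorted(pool, key=lambda s: (bow_cosine(s, source), s))[:beam_width]
        for cand in beam:
            if source_rank(cand, corpus, source_idx) > top_k:
                return cand
    return None


# Hypothetical data: document 0 is the source post to be protected.
corpus = [
    "my roommate ate my leftover pizza without asking and i got angry",
    "my flatmate finished my saved food and i became upset",
    "our dog chewed my shoes while i was at work",
]
sentence = "my roommate ate my leftover pizza without asking"
substitutes = {
    "roommate": ["flatmate"],
    "ate": ["finished"],
    "leftover pizza": ["saved food"],
}

result = disguise(sentence, corpus, 0, substitutes)
print(result)
```

Keeping more than one candidate per level is what separates this from a greedy search: a substitution that looks weak on its own can still combine with a later one to push the source out of the top results. In a realistic setting the candidate substitutions would come from a paraphrase resource or model rather than a hand-written table.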