Using language models as a remote service entails sending private information to an untrusted provider. In addition, potential eavesdroppers can intercept the messages, thereby exposing the information. In this work, we explore the prospects of avoiding such data exposure at the level of text manipulation. We focus on text classification models, examining various token mapping and contextualized manipulation functions in order to determine whether classifier accuracy can be maintained while keeping the original text unrecoverable. We find that although some token mapping functions are straightforward to implement, they severely degrade performance on the downstream task and can be reconstructed by a sophisticated attacker. In comparison, contextualized manipulation offers improved performance.
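To make the token-mapping idea concrete, the sketch below shows one minimal, hypothetical instantiation (not the paper's exact method): each token id is passed through a fixed secret permutation of the vocabulary before the text is sent to the remote classifier. The vocabulary size, seed, and function names are illustrative assumptions.

```python
# Hypothetical illustration of a naive token-mapping "privatization"
# step: replace every token id with its image under a fixed secret
# permutation of the vocabulary before sending it to a remote service.
import random

VOCAB_SIZE = 30522       # assumed vocabulary size (e.g., BERT-base)
rng = random.Random(42)  # fixed seed so the secret mapping is reproducible

# Build a secret bijective mapping: original token id -> permuted token id.
permutation = list(range(VOCAB_SIZE))
rng.shuffle(permutation)
token_map = {src: dst for src, dst in enumerate(permutation)}

def privatize(token_ids: list[int]) -> list[int]:
    """Apply the secret token mapping to a tokenized input."""
    return [token_map[t] for t in token_ids]

# The remote service only ever sees the mapped ids. Without the secret
# permutation, recovering the original text requires a reconstruction
# attack, e.g., one exploiting token frequency statistics, which is the
# kind of sophisticated attacker the abstract refers to.
print(privatize([101, 2023, 2003, 1037, 3231, 102]))
```

Because a deterministic per-token substitution preserves token frequencies and co-occurrence patterns, such a mapping is exactly the sort that a frequency-analysis attacker can invert, motivating the contextualized manipulations the abstract favors.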