Using language models as a remote service entails sending private information to an untrusted provider. Moreover, potential eavesdroppers can intercept the messages, exposing the information. In this work, we explore whether such data exposure can be avoided at the level of text manipulation. We focus on text classification models, examining various token-mapping and contextualized manipulation functions to determine whether classifier accuracy can be maintained while keeping the original text unrecoverable. We find that although some token-mapping functions are simple and straightforward to implement, they severely degrade performance on the downstream task and can be reconstructed by a sophisticated attacker. In comparison, the contextualized manipulation yields better performance.
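To make the token-mapping idea concrete, the following is a minimal sketch of one simple such function: a fixed pseudorandom permutation of vocabulary ids, applied before sending token ids to the remote model. The names (`VOCAB_SIZE`, `map_tokens`) and the choice of permutation are illustrative assumptions, not the paper's actual method; the round-trip at the end shows why such a bijective mapping remains invertible by an informed attacker.

```python
import random

# Assumed vocabulary size for illustration (e.g., a WordPiece-style vocab).
VOCAB_SIZE = 30522

# Fixed seed makes the mapping deterministic and shared between client runs.
rng = random.Random(0)
permutation = list(range(VOCAB_SIZE))
rng.shuffle(permutation)  # token id i is mapped to permutation[i]

def map_tokens(token_ids):
    """Apply the fixed permutation to a sequence of token ids."""
    return [permutation[t] for t in token_ids]

# Example: map a short token-id sequence before transmission.
original = [101, 2023, 2003, 102]
mapped = map_tokens(original)

# Because the mapping is a bijection, it can be inverted once recovered,
# which is the vulnerability the abstract alludes to.
inverse = [0] * VOCAB_SIZE
for i, p in enumerate(permutation):
    inverse[p] = i
recovered = [inverse[t] for t in mapped]
assert recovered == original
```

A contextualized manipulation, by contrast, would transform each token as a function of its surrounding context rather than through a single global table, which is what makes naive inversion harder.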