We propose PRISM to enable users of machine translation systems to preserve the privacy of their data on their own initiative. There is a growing demand to apply machine translation systems to data that require privacy protection. While several machine translation engines claim to prioritize privacy, the extent and specifics of such protection are largely ambiguous. First, it is often unclear how and to what degree the data are protected. Even if service providers believe they have sufficient safeguards in place, sophisticated adversaries might still extract sensitive information. Second, vulnerabilities may exist outside these protective measures, such as in communication channels, potentially leading to data leakage. As a result, users hesitate to use machine translation engines for data that demand a high level of privacy protection, and so miss out on their benefits. PRISM resolves this problem. Instead of relying on the translation service to keep data safe, PRISM provides the means to protect data on the user's side. This approach ensures that even machine translation engines with inadequate privacy measures can be used securely. For platforms already equipped with privacy safeguards, PRISM acts as an additional protection layer, further reinforcing their security. PRISM adds these privacy features without significantly compromising translation accuracy. Our experiments demonstrate the effectiveness of PRISM using real-world translators, T5 and ChatGPT (GPT-3.5-turbo), and datasets in two languages. PRISM effectively balances privacy protection with translation accuracy.