The availability of powerful open-source large language models (LLMs) opens exciting use cases, such as fine-tuning these models on personal data to imitate a user's unique writing style. Two key requirements for such assistants are personalization, in the sense that the assistant should recognizably reflect the user's own writing style, and privacy, since users may justifiably be wary of uploading extremely personal data, such as their email archive, to a third-party service. In this paper, we present a new design and evaluation for such an automated assistant, which we call Panza, for the specific use case of email generation. Panza's personalization features are based on a combination of fine-tuning using a variant of the Reverse Instructions technique and Retrieval-Augmented Generation (RAG). We demonstrate that this combination allows us to fine-tune an LLM to reflect a user's writing style using limited data, while executing on extremely limited resources, e.g., a free Google Colab instance. Our key methodological contribution is the first detailed study of evaluation metrics for this personalized writing task, and of how different choices of system components (the use of RAG and of different fine-tuning approaches) impact the system's performance. Additionally, we demonstrate that very little data (under 100 email samples) is sufficient to create models that convincingly imitate humans. This finding exposes a previously unknown attack vector for language models: access to a small number of writing samples can allow a bad actor to cheaply create generative models that imitate a target's writing style. We release the full Panza code as well as three new email datasets licensed for research use at https://github.com/IST-DASLab/PanzaMail.