The availability of powerful open-source large language models (LLMs) opens exciting use cases, such as automated personal assistants that adapt to the user's unique data and demands. Two key requirements for such assistants are personalization, in the sense that the assistant should reflect the user's own writing style, and privacy, since users may prefer to always store their personal data locally, on their own computing device. In this application paper, we present Panza, a new design and evaluation for such an automated assistant, for the specific use case of email generation. Specifically, Panza can be trained and deployed locally on commodity hardware, and is personalized to the user's writing style. Panza's personalization features are based on a combination of fine-tuning, using a variant of the Reverse Instructions technique, and Retrieval-Augmented Generation (RAG). We demonstrate that this combination allows us to fine-tune an LLM to better reflect a user's writing style using limited data, while executing on extremely limited resources, e.g., on a free Google Colab instance. Our key methodological contribution is what we believe to be the first detailed study of evaluation metrics for this personalized writing task, and of how different choices of system components (e.g., the use of RAG and of different fine-tuning approaches) impact the system's performance. We also perform an ablation study showing that fewer than 100 emails are generally sufficient to produce a credible Panza model. We are releasing the full Panza code as well as a new "David" personalized email dataset licensed for research use, both available at https://github.com/IST-DASLab/PanzaMail.
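To make the combination of Reverse Instructions and RAG concrete, the following is a minimal sketch under stated assumptions, not the Panza implementation. It assumes that Reverse Instructions pairs each existing user email with an instruction that could have produced it, and that RAG prepends similar past emails to the prompt at generation time. All function names, the prompt layout, the bag-of-words retriever, and the `toy_summarize` stand-in (which replaces the LLM that would normally write the instruction) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a toy stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_summarize(email):
    """Hypothetical placeholder for the LLM that writes an instruction for an email."""
    first_line = email.strip().splitlines()[0]
    return f"Write an email that begins: {first_line}"


def reverse_instruction(email, summarize):
    """Turn one existing email into an (instruction, output) fine-tuning pair."""
    return {"instruction": summarize(email), "output": email}


def retrieve(query, emails, k=2):
    """Return up to k past emails most similar to the query (assumed retriever)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(e))), e) for e in emails]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for score, e in scored[:k] if score > 0]


def build_prompt(instruction, retrieved):
    """Assemble a prompt with retrieved emails as context (assumed layout)."""
    context = "\n---\n".join(retrieved)
    return (
        f"Previous emails by the user:\n{context}\n\n"
        f"Instruction: {instruction}\nEmail:"
    )


if __name__ == "__main__":
    past_emails = [
        "Hi Ana, can we move the budget meeting to Friday?",
        "Thanks for the lunch yesterday, it was great.",
        "Reminder: the budget report is due Monday.",
    ]
    # Fine-tuning data: one (instruction, email) pair per past email.
    train_pairs = [reverse_instruction(e, toy_summarize) for e in past_emails]
    print(train_pairs[0])

    # Generation time: retrieve similar emails and build the prompt.
    request = "Ask to reschedule the budget meeting"
    print(build_prompt(request, retrieve(request, past_emails, k=1)))
```

In this sketch the same collection of past emails serves two roles: it supplies the fine-tuning pairs and it acts as the retrieval corpus. A real system would replace the bag-of-words similarity with a learned embedding model and the placeholder summarizer with an LLM call.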