Authorship Verification (AV), the task of determining whether two documents share the same author, is essential for many sensitive real-life applications. AV is often used in proprietary domains that require a private, offline model, making SOTA online models such as ChatGPT undesirable. Other SOTA systems rely on methods, e.g., Siamese Networks, that are uninterpretable and hence cannot be trusted in high-stakes applications. In this work, we take a first step toward addressing these challenges with our model CAVE (Controllable Authorship Verification Explanations). CAVE generates free-text AV explanations that are controlled to be 1) structured (decomposable into sub-explanations with respect to relevant linguistic features) and 2) easily verified for explanation-label consistency (via intermediate labels in the sub-explanations). Since there is no human-written corpus of AV explanations, we sample silver-standard explanations from GPT-4-Turbo and distill them into a pretrained Llama-3-8B, which we train as CAVE. Results on three difficult AV datasets, IMDB62, Blog-Auth, and FanFiction, show that CAVE generates high-quality explanations (as measured by automatic and human evaluation) as well as competitive task accuracies.
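To make the explanation-label consistency check above concrete, the following is a minimal sketch of how intermediate labels in sub-explanations could be extracted and compared against the final AV label. The bracketed `[same]`/`[different]` tag format and the majority-vote aggregation rule are illustrative assumptions, not the paper's actual specification.

```python
import re


def check_consistency(explanation: str, final_label: str) -> bool:
    """Verify that intermediate labels in sub-explanations agree with the final label.

    Assumes a hypothetical format in which each sub-explanation (one per
    linguistic feature) ends with an intermediate label tag, '[same]' or
    '[different]', and that the majority of intermediate labels should
    match the model's final verdict.
    """
    labels = re.findall(r"\[(same|different)\]", explanation)
    if not labels:
        return False  # no intermediate labels found: cannot verify
    majority = max(set(labels), key=labels.count)
    return majority == final_label


# Illustrative structured explanation with per-feature intermediate labels.
example = (
    "Punctuation: both texts favor long, comma-spliced sentences [same]. "
    "Vocabulary: overlapping rare-word choices [same]. "
    "Tone: one text is formal, the other casual [different]."
)
```

Calling `check_consistency(example, "same")` returns `True`, since two of the three sub-explanations carry the intermediate label `same`; a mismatch between the aggregate and the final label would flag an inconsistent explanation.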