Knowledge editing (KE) aims to efficiently and precisely modify the behavior of large language models (LLMs) to update specific knowledge without negatively influencing other knowledge. Current research primarily focuses on white-box LLM editing, overlooking an important scenario: black-box LLM editing, where LLMs are accessed through interfaces and only textual output is available. To address the limitations of existing evaluations, which are inapplicable to black-box LLM editing and lack comprehensiveness, we propose a multi-perspective evaluation framework that incorporates the assessment of style retention for the first time. To tackle privacy leaks of editing data and style over-editing in current methods, we introduce a novel postEdit framework, which resolves privacy concerns through downstream post-processing and maintains textual style consistency via fine-grained editing of original responses. Experiments and analysis on two benchmarks demonstrate that postEdit outperforms all baselines and achieves strong generalization, with especially large improvements in style retention (average $+20.82\%\uparrow$).