Knowledge editing offers an efficient way to update model knowledge without full retraining, but prior work has concentrated almost exclusively on the textual and visual modalities. We introduce SAKE, the first benchmark specifically designed for editing auditory-attribute knowledge in Large Audio-Language Models (LALMs). Unlike factual updates, SAKE targets abstract auditory attributes, capturing knowledge types that go beyond the conventional textual and visual domains. We benchmark seven editing methods on two LALMs along four dimensions: reliability, generality, audio/text locality, and portability. The results highlight open challenges: preserving intra-attribute knowledge unrelated to the edit, generalizing edits to multimodal reasoning, and maintaining edits under sequential updates. SAKE provides a principled framework for studying how knowledge editing extends to the auditory modality, opening new directions for maintaining and adapting LALMs in diverse real-world scenarios.
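To make the four evaluation dimensions concrete, the sketch below shows one plausible way to score a single edit. This is a minimal illustration, not the benchmark's released code: every name (`EditEvalSuite`, `evaluate_edit`, the exact-match `accuracy` metric, and the probe-set layout) is a hypothetical assumption introduced here for exposition.

```python
# A minimal sketch (not the authors' code) of scoring one knowledge edit
# along the four SAKE-style dimensions. All names and the exact-match
# metric are hypothetical assumptions, not part of any released SAKE API.

from dataclasses import dataclass
from typing import Callable, List, Tuple

Query = Tuple[bytes, str]   # (raw audio, text prompt)
Answer = str
Example = Tuple[Query, Answer]

@dataclass
class EditEvalSuite:
    reliability: List[Example]     # the edited fact itself
    generality: List[Example]      # rephrased probes of the edited fact
    audio_locality: List[Example]  # unrelated audio knowledge that must not change
    text_locality: List[Example]   # unrelated text knowledge that must not change
    portability: List[Example]     # multi-hop reasoning that uses the edited fact

def accuracy(model: Callable[[Query], Answer], examples: List[Example]) -> float:
    """Fraction of probes answered as expected (exact match here; real
    benchmarks typically use softer, judge-based matching)."""
    hits = sum(model(q) == a for q, a in examples)
    return hits / len(examples) if examples else 0.0

def evaluate_edit(edited_model: Callable[[Query], Answer],
                  suite: EditEvalSuite) -> dict:
    # Reliability/generality/portability probes expect the *new* answer;
    # locality probes expect the *original* answer (the edit must not leak).
    return {
        "reliability": accuracy(edited_model, suite.reliability),
        "generality": accuracy(edited_model, suite.generality),
        "audio_locality": accuracy(edited_model, suite.audio_locality),
        "text_locality": accuracy(edited_model, suite.text_locality),
        "portability": accuracy(edited_model, suite.portability),
    }
```

Under this framing, a successful edit scores high on all five numbers at once; the trade-offs the abstract describes (locality loss within the same attribute, weak portability to multimodal reasoning) show up as dimensions that degrade while reliability stays high.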