Editing Arbitrary Propositions in LLMs without Subject Labels

Large Language Model (LLM) editing modifies factual information in LLMs. Locate-and-Edit (L\&E) methods accomplish this by finding where the relevant information is stored within the neural network and editing the weights at that location. The goal of editing is to modify the response of an LLM to a proposition independently of its phrasing, while leaving its responses to other related propositions unchanged. Existing methods are limited to binary propositions, which represent straightforward binary relations between a subject and an object. Furthermore, existing methods rely on semantic subject labels, which may not be available, or even well-defined, in practice. In this paper, we show that both of these issues can be effectively circumvented with a simple and fast localization method called Gradient Tracing (GT). This localization method allows editing arbitrary propositions instead of just binary ones, and does so without the need for subject labels. As propositions always have a truth value, our experiments prompt an LLM as a boolean classifier and edit its T/F response to propositions. Our method applies GT for location tracing, and then edits the model at that location using a mild variant of Rank-One Model Editing (ROME). On datasets of binary propositions derived from the CounterFact dataset, we show that our method, without access to subject labels, performs close to state-of-the-art L\&E methods that have access to subject labels. We then introduce a new dataset, Factual Accuracy Classification Test (FACT), which includes non-binary propositions and for which subject labels are not generally applicable, and which is therefore beyond the scope of existing L\&E methods. Nevertheless, we show that editing is possible on FACT with our method.
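
To make the locate-then-edit idea concrete, the following is a minimal sketch in plain Python and not the method of this paper. It rests on two assumptions that the abstract does not confirm: (1) localization picks the (layer, token) position whose loss gradient has the largest L2 norm, and (2) the edit is the unweighted rank-one update W' = W + (v - Wk) k^T / (k^T k), a simplification of ROME that omits its key-covariance weighting. The names `locate_by_gradient_norm` and `rank_one_edit` are hypothetical.

```python
import math


def locate_by_gradient_norm(grads):
    """Return the (layer, token) index with the largest gradient L2 norm.

    grads[layer][token] is a list of floats standing in for the gradient of
    the T/F classification loss with respect to a hidden state.
    ASSUMPTION: the argmax-of-norm criterion is illustrative only.
    """
    best, best_norm = None, -1.0
    for layer, per_token in enumerate(grads):
        for token, g in enumerate(per_token):
            norm = math.sqrt(sum(x * x for x in g))
            if norm > best_norm:
                best, best_norm = (layer, token), norm
    return best


def rank_one_edit(W, k, v):
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' k == v.

    W is a list of rows, k is the key vector, v is the desired output.
    ASSUMPTION: unweighted update; ROME itself weights by a key covariance.
    """
    kk = sum(x * x for x in k)
    if kk == 0:
        raise ValueError("key vector must be non-zero")
    Wk = [sum(w * x for w, x in zip(row, k)) for row in W]
    residual = [vi - wi for vi, wi in zip(v, Wk)]
    # Each row receives a multiple of k^T, so the total change has rank one.
    return [[w + r * x / kk for w, x in zip(row, k)]
            for row, r in zip(W, residual)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]
```

In this sketch the edited matrix maps the chosen key `k` to the new value `v`, while any key orthogonal to `k` keeps its original output, which is the sense in which a rank-one edit can change one response and leave others intact.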
