Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely being developed in isolation. In practice, multiple interventions must often be applied sequentially to the same model, yet we lack standardized ways to study how interventions interact. We fill this gap by introducing composable interventions, a framework for studying the effects of applying multiple interventions to the same language model, featuring new metrics and a unified codebase. Using our framework, we conduct extensive experiments composing popular methods from three emerging intervention categories -- Knowledge Editing, Model Compression, and Machine Unlearning. Our results from 310 different compositions uncover meaningful interactions: compression hinders editing and unlearning, composing interventions hinges on their order of application, and popular general-purpose metrics are inadequate for assessing composability. Taken together, our findings showcase clear gaps in composability, suggesting a need for new multi-objective interventions. All of our code is public: https://github.com/hartvigsen-group/composable-interventions.