$BT^2$: Backward-compatible Training with Basis Transformation

Modern retrieval systems often require recomputing the representation of every piece of data in the gallery when updating to a better representation model. This process is known as backfilling and can be especially costly in the real world, where the gallery often contains billions of samples. Recently, researchers have proposed Backward Compatible Training (BCT), in which the new representation model is trained with an auxiliary loss to make it backward compatible with the old representation. In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling. However, follow-up work shows that there is an inherent tradeoff: a backward-compatible representation model cannot simultaneously maintain the performance of the new model itself. This paper reports our "not-so-surprising" finding that adding extra dimensions to the representation can help here. However, we also found that naively increasing the dimension of the representation did not work. To deal with this, we propose Backward-compatible Training with a novel Basis Transformation ($BT^2$). A basis transformation (BT) is basically a learnable set of parameters that applies an orthonormal transformation. Such a transformation possesses an important property: the original information contained in its input is retained in its output. We show in this paper how a BT can be utilized to add only the necessary amount of additional dimensions. We empirically verify the advantage of $BT^2$ over other state-of-the-art methods in a wide range of settings. We then further extend $BT^2$ to other challenging yet more practical settings, including a significant change in model architecture (CNN to Transformers), a modality change, and even a series of updates in the model architecture mimicking the evolution of deep learning models.
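The information-retention property of an orthonormal transformation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the learnable BT is replaced here by a fixed Householder reflection, which is one simple way to build an orthonormal matrix, and all names (`householder`, `matvec`, `Q`, `dim`) are hypothetical. The sketch only checks that inner products are preserved and that the input can be recovered exactly by applying the transpose.

```python
import random


def householder(v):
    """Build the orthonormal matrix H = I - 2 v v^T / (v^T v)."""
    n = len(v)
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv
             for j in range(n)] for i in range(n)]


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dim = 6                 # assumed toy embedding size

# Stand-in for a learned basis transformation (assumption: fixed, not trained).
Q = householder([rng.gauss(0, 1) for _ in range(dim)])

# Two toy "embeddings".
x = [rng.gauss(0, 1) for _ in range(dim)]
y = [rng.gauss(0, 1) for _ in range(dim)]

qx, qy = matvec(Q, x), matvec(Q, y)

# Similarity scores are unchanged by the orthonormal map.
inner_product_gap = abs(dot(x, y) - dot(qx, qy))

# The input is fully recoverable, since Q^T Q = I.
recovered = matvec(transpose(Q), qx)
recovery_gap = max(abs(a - b) for a, b in zip(x, recovered))

print(inner_product_gap, recovery_gap)
```

Both gaps should be at the level of floating-point round-off, which is the sense in which no information from the input is lost in the output.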

