Temporal Domain Generalization (TDG) aims to generalize across temporal distribution shifts, e.g., lexical change over time. Prior work often addresses this by predicting future model weights. However, predicting the full set of weights is prohibitively expensive even for reasonably sized models. Thus, recent methods predict only the classifier layer, which limits generalization because the remaining model components are never adjusted. To address this, we propose Temporal Experts Averaging (TEA), a novel and scalable TDG framework that updates the entire model via weight averaging, maximizing generalization potential while minimizing computational cost. Our theoretical analysis motivates two steps that enhance generalization to future domains. First, we create expert models that are functionally diverse yet parameter-similar by fine-tuning a domain-agnostic base model on individual temporal domains while constraining weight changes. Second, we optimize the bias-variance tradeoff through adaptive averaging coefficients derived from modeling temporal weight trajectories in a principal component subspace, so that each expert's contribution is based on its projected proximity to future domains. Extensive experiments across 7 TDG benchmarks, 5 models, and 2 TDG settings show that TEA outperforms prior TDG methods by up to 69% while being up to 60x more efficient.
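The adaptive-averaging step can be illustrated with a minimal sketch. Everything below is a hypothetical reconstruction from the abstract alone: the function name `tea_average`, the flattened-weight representation, the linear extrapolation of the trajectory, and the softmax over distances are all assumptions, not the paper's actual algorithm.

```python
import numpy as np

def tea_average(expert_weights, n_components=2, temperature=1.0):
    """Hypothetical sketch of TEA-style adaptive weight averaging.

    expert_weights: list of 1-D arrays, flattened weights of experts
    fine-tuned on temporal domains t = 0..T-1 (ordered by time).
    Returns a single averaged weight vector aimed at the future domain.
    """
    W = np.stack(expert_weights)                 # (T, D)
    mean = W.mean(axis=0)
    # Project the temporal weight trajectory into a principal
    # component subspace via SVD of the centered weight matrix.
    _, _, Vt = np.linalg.svd(W - mean, full_matrices=False)
    Z = (W - mean) @ Vt[:n_components].T         # (T, k) trajectory
    # Extrapolate each component linearly one step into the future
    # (a simple stand-in for the paper's trajectory model).
    t = np.arange(len(Z))
    coefs = np.polyfit(t, Z, deg=1)              # shape (2, k)
    z_future = coefs[0] * len(Z) + coefs[1]      # predicted future point
    # Adaptive coefficients: experts whose projected weights lie
    # closer to the predicted future domain contribute more.
    d = np.linalg.norm(Z - z_future, axis=1)
    alpha = np.exp(-d / temperature)
    alpha /= alpha.sum()
    return alpha @ W                             # weighted average
```

Note that the averaging keeps inference cost identical to a single model: all experts collapse into one weight vector before deployment, which is consistent with the efficiency claim in the abstract.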