A model's capacity to generalize its knowledge to interpret unseen inputs with different characteristics is crucial for building robust and reliable machine learning systems. Language model evaluation lacks informative metrics of model generalization. A model's applicability to a new setting is instead measured through task- and language-specific downstream performance, and the data needed for such evaluation is unavailable for many languages and tasks. In this paper, we explore a set of efficient and reliable measures that provide additional information about the generalization capability of language models in cross-lingual zero-shot settings. In addition to traditional measures such as the variance of parameters after training and the distance from initialization, we assess how effectively the sharpness of the loss landscape captures success in cross-lingual transfer. We also propose a novel, stable algorithm for reliably computing the sharpness of a model optimum, which correlates with generalization.
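As a rough illustration of two of the measures named above, the sketch below computes the distance of trained parameters from their initialization and a simple Monte Carlo estimate of sharpness: the largest loss increase observed over random perturbations of fixed norm around an optimum. This is a generic random-perturbation estimator, not the stable sharpness algorithm proposed in this paper. The function names, the perturbation radius `rho`, the sample count, and the toy quadratic losses are hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, Sequence

Vector = Sequence[float]


def distance_from_init(theta: Vector, theta0: Vector) -> float:
    """Euclidean (L2) distance between trained parameters and their initialization."""
    if len(theta) != len(theta0):
        raise ValueError("parameter vectors must have the same length")
    return math.sqrt(sum((t - t0) ** 2 for t, t0 in zip(theta, theta0)))


def random_perturbation_sharpness(
    loss: Callable[[Vector], float],
    theta: Vector,
    rho: float = 0.05,
    num_samples: int = 100,
    seed: int = 0,
) -> float:
    """Largest loss increase over random perturbations of L2 norm rho.

    Hypothetical helper: a Monte Carlo lower bound on
    max_{||eps|| <= rho} L(theta + eps) - L(theta).
    """
    rng = random.Random(seed)
    base = loss(theta)
    worst = 0.0
    for _ in range(num_samples):
        # Sample a direction uniformly on the unit sphere via normalized Gaussians.
        direction = [rng.gauss(0.0, 1.0) for _ in theta]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        perturbed = [t + rho * d / norm for t, d in zip(theta, direction)]
        worst = max(worst, loss(perturbed) - base)
    return worst


def quadratic(curvatures: Vector) -> Callable[[Vector], float]:
    """Toy loss 0.5 * sum(c_i * w_i^2) with its minimum at the origin."""
    return lambda w: 0.5 * sum(c * x * x for c, x in zip(curvatures, w))


# Usage: a flat and a sharp minimum at the same point.
flat_loss = quadratic([0.1, 0.1, 0.1])
sharp_loss = quadratic([10.0, 10.0, 10.0])
optimum = [0.0, 0.0, 0.0]

flat_sharpness = random_perturbation_sharpness(flat_loss, optimum)
sharp_sharpness = random_perturbation_sharpness(sharp_loss, optimum)
dist = distance_from_init([1.0, 2.0, 2.0], [0.0, 0.0, 0.0])  # 3.0
```

For an isotropic quadratic with curvature `c`, every perturbation of norm `rho` raises the loss by exactly `0.5 * c * rho**2`, so the sharper minimum yields a proportionally larger estimate. On real networks the random directions rarely hit the worst case, which is one reason a more careful, stable estimator is needed.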