Deep learning models often struggle under natural distribution shifts, a common challenge in real-world deployments. Test-Time Adaptation (TTA) addresses this by adapting models during inference without labeled source data. We present the first evaluation of TTA methods for Facial Expression Recognition (FER) under natural domain shifts, performing cross-dataset experiments on widely used FER datasets. This moves beyond synthetic corruptions to examine real-world shifts caused by differing collection protocols, annotation standards, and demographics. Results show that TTA can boost FER performance under natural shifts by up to 11.34%. Entropy minimization methods such as TENT and SAR perform best when the target distribution is clean. In contrast, prototype adjustment methods such as T3A excel when the distributional distance between source and target is larger. Finally, feature alignment methods such as SHOT deliver the largest gains when the target distribution is noisier than the source. Overall, our cross-dataset analysis shows that TTA effectiveness is governed by the distributional distance between domains and the severity of the natural shift.
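To make the entropy minimization idea behind methods such as TENT concrete, the following is a minimal sketch and not the implementation evaluated here. TENT as published updates the affine parameters of normalization layers by backpropagation; this toy version instead assumes a hypothetical per-class scale and shift applied directly to the logits of an unlabeled target batch, and estimates gradients by finite differences so that it runs with the standard library alone. All function names and the learning rate are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(batch_logits, scale, shift):
    """Mean prediction entropy after a per-class affine transform of the logits.

    The affine transform is a hypothetical stand-in for the normalization
    parameters that entropy-based TTA methods typically adapt.
    """
    total = 0.0
    for logits in batch_logits:
        adapted = [s * z + b for z, s, b in zip(logits, scale, shift)]
        total += entropy(softmax(adapted))
    return total / len(batch_logits)


def adapt_step(batch_logits, scale, shift, lr=0.1, eps=1e-5):
    """One gradient-descent step on the entropy objective.

    Gradients are estimated with central finite differences; no labels are used.
    """
    n = len(scale)
    params = list(scale) + list(shift)

    def loss(p):
        return mean_entropy(batch_logits, p[:n], p[n:])

    grads = []
    for i in range(len(params)):
        up = params[:]
        up[i] += eps
        down = params[:]
        down[i] -= eps
        grads.append((loss(up) - loss(down)) / (2 * eps))

    updated = [p - lr * g for p, g in zip(params, grads)]
    return updated[:n], updated[n:]


if __name__ == "__main__":
    # Made-up logits for three unlabeled target samples and three classes.
    batch = [[2.0, 1.0, 0.1], [0.5, 1.5, 0.3], [0.2, 0.1, 1.0]]
    scale, shift = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    print("entropy before:", mean_entropy(batch, scale, shift))
    for _ in range(5):
        scale, shift = adapt_step(batch, scale, shift)
    print("entropy after: ", mean_entropy(batch, scale, shift))
```

The sketch only illustrates the objective: predictions on the target batch are pushed toward higher confidence without any labels. This is also one plausible reading of why such methods favor clean target distributions, since confident predictions on noisy inputs can reinforce errors.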