Feature fusion plays a crucial role in unconstrained face recognition, where each input (probe) consists of a set of $N$ low-quality images of varying individual quality. Advances in attention and recurrent modules have led to feature fusion methods that can model the relationships among the images in the input set. However, attention mechanisms cannot scale to large $N$ due to their quadratic complexity, and recurrent modules suffer from sensitivity to input order. We propose a two-stage feature fusion paradigm, Cluster and Aggregate, that can both scale to large $N$ and maintain the ability to perform sequential inference with order invariance. Specifically, the Cluster stage is a linear assignment of $N$ inputs to $M$ global cluster centers, and the Aggregation stage is a fusion over the $M$ clustered features. The clustered features play an integral role when the inputs are sequential, as they can serve as a summarization of past features. By leveraging the order invariance of the incremental averaging operation, we design an update rule that achieves batch-order invariance, which guarantees that the contributions of early images in the sequence do not diminish as time steps increase. Experiments on the IJB-B and IJB-S benchmark datasets show the superiority of the proposed two-stage paradigm in unconstrained face recognition. Code and pretrained models are available at https://github.com/mk-minchul/caface
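The batch-order invariance described above follows from a property of incremental (running) weighted averages: as long as each input's assignment weights depend only on that input and on the fixed global centers, the running mean per cluster is the same regardless of how the inputs are split into batches or in what order the batches arrive. The following is a minimal sketch of that idea, not the released implementation. The function names (`soft_assign`, `update_clusters`, `run_sequence`), the softmax-over-dot-products assignment, and the `temperature` parameter are all assumptions made for illustration; the paper's assignment is learned.

```python
import numpy as np


def soft_assign(features, centers, temperature=1.0):
    """Hypothetical assignment: per-input softmax over M fixed centers.

    features: (n, D), centers: (M, D) -> weights: (n, M), each row sums to 1.
    Each row depends only on its own input, which is what makes the
    running average below independent of batch order.
    """
    logits = features @ centers.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def update_clusters(mean, weight, features, centers):
    """Incremental weighted-average update of the M clustered features.

    mean: (M, D) running cluster features, weight: (M,) accumulated weights.
    Assumes `features` is a non-empty batch, so the new weights are positive.
    """
    assign = soft_assign(features, centers)   # (n, M)
    batch_weight = assign.sum(axis=0)         # (M,)
    batch_sum = assign.T @ features           # (M, D)
    new_weight = weight + batch_weight
    new_mean = (weight[:, None] * mean + batch_sum) / new_weight[:, None]
    return new_mean, new_weight


def run_sequence(batches, centers):
    """Feed batches one at a time, keeping only the M-cluster summary."""
    num_clusters, dim = centers.shape
    mean = np.zeros((num_clusters, dim))
    weight = np.zeros(num_clusters)
    for batch in batches:
        mean, weight = update_clusters(mean, weight, batch, centers)
    return mean, weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(4, 8))      # M = 4 hypothetical centers
    features = rng.normal(size=(12, 8))    # N = 12 toy image features
    batches = np.split(features, 3)

    forward, _ = run_sequence(batches, centers)
    backward, _ = run_sequence(batches[::-1], centers)
    one_shot, _ = run_sequence([features], centers)

    # Both comparisons should agree up to floating-point tolerance.
    print(np.allclose(forward, backward), np.allclose(forward, one_shot))
```

Because only the `(M, D)` running means and the `(M,)` accumulated weights are stored, memory does not grow with $N$, and the cost of each update is linear in the batch size. A fusion step over the $M$ clustered features (the Aggregation stage) would then operate on `mean`; it is omitted here because its form is not specified in the abstract.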