Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers

Multi-Channel Imaging (MCI) poses an array of challenges for encoding useful feature representations that are not present in traditional images. For example, images from two different satellites may both contain RGB channels, but the remaining channels can differ between imaging sources. Thus, MCI models must support a variety of channel configurations at test time. Recent work has extended traditional visual encoders, such as Vision Transformers (ViT), to MCI by supplementing pixel information with an encoding that represents the channel configuration. However, these methods treat each channel equally, i.e., they do not consider the unique properties of each channel type, which can result in needless and potentially harmful redundancies in the learned features. For example, if RGB channels are always present, the other channels can focus on extracting information that the RGB channels cannot capture. To this end, we propose DiChaViT, which aims to enhance the diversity of the features learned by MCI-ViT models. This is achieved through a novel channel sampling strategy that encourages the selection of more distinct channel sets for training. Additionally, we employ regularization and initialization techniques to increase the likelihood that new information is learned from each channel. Many of our improvements are architecture agnostic and can be incorporated into new architectures as they are developed. Experiments on both satellite and cell microscopy datasets (CHAMMI, JUMP-CP, and So2Sat) show that DiChaViT yields a 1.5-5.0% gain over the state-of-the-art. Our code is publicly available at https://github.com/chaudatascience/diverse_channel_vit.
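The abstract does not spell out the channel sampling procedure, so the following is only an illustrative sketch of one way such a strategy could look, not the DiChaViT implementation. It assumes each channel has a nonzero learned embedding vector, picks a random anchor channel, and then draws the remaining channels without replacement with probability proportional to `exp(-cosine_similarity / temperature)`, so channels that look unlike the anchor are favoured. The function name `sample_diverse_channels`, the `temperature` parameter, and the anchor-based weighting are all assumptions made for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two nonzero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sample_diverse_channels(channel_embeddings, k, temperature=0.1, rng=None):
    """Hypothetical sampler: return k distinct channel indices (sorted),
    favouring channels whose embeddings are dissimilar to a random anchor."""
    rng = rng or random.Random()
    n = len(channel_embeddings)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of channels")

    # Step 1: choose the anchor channel uniformly at random.
    anchor = rng.randrange(n)
    chosen = [anchor]

    # Step 2: weight every other channel by exp(-similarity / temperature),
    # so a lower similarity to the anchor gives a higher sampling weight.
    candidates = [i for i in range(n) if i != anchor]
    weights = [
        math.exp(-cosine(channel_embeddings[anchor], channel_embeddings[i]) / temperature)
        for i in candidates
    ]

    # Step 3: draw without replacement until k channels are selected.
    while len(chosen) < k:
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)

    return sorted(chosen)
```

In a training loop, a sampler like this would be called once per batch to decide which channels are fed to the encoder. A smaller `temperature` concentrates the draw on the least similar channels, while a large one approaches uniform sampling.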
