Including information from additional spectral bands (e.g., near-infrared) can improve deep learning model performance for many vision-oriented tasks. There are many ways to incorporate this additional information into a deep learning model, but the optimal fusion strategy has not yet been determined and may vary between applications. At one extreme, known as "early fusion," additional bands are stacked as extra channels to obtain an input image with more than three channels. At the other extreme, known as "late fusion," RGB and non-RGB bands are passed through separate branches of a deep learning model and merged immediately before a final classification or segmentation layer. In this work, we characterize the performance of a suite of multispectral deep learning models with different fusion approaches, quantify their relative reliance on different input bands, and evaluate their robustness to naturalistic image corruptions affecting one or more input channels.
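To make the two extremes concrete, the following is a minimal NumPy sketch of early and late fusion. It is an illustration only, not the models studied in this work: the per-pixel linear layers stand in for real convolutional backbones, and the function names (`pointwise_layer`, `early_fusion`, `late_fusion`), the random weights, and the sizes (`HIDDEN`, `N_CLASSES`, a single near-infrared band) are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
H, W = 8, 8          # image height and width
HIDDEN = 16          # feature width of each stand-in branch
N_CLASSES = 5        # number of per-pixel output classes


def pointwise_layer(x, weight):
    """Per-pixel linear map (a 1x1 convolution) followed by ReLU.

    Stand-in for a real backbone; x has shape (H, W, C_in).
    """
    return np.maximum(x @ weight, 0.0)


def early_fusion(rgb, extra, w_backbone, w_head):
    """Stack all bands as input channels and run a single branch."""
    stacked = np.concatenate([rgb, extra], axis=-1)  # (H, W, 3 + C_extra)
    features = pointwise_layer(stacked, w_backbone)
    return features @ w_head                         # (H, W, N_CLASSES)


def late_fusion(rgb, extra, w_rgb, w_extra, w_head):
    """Run RGB and non-RGB bands through separate branches,
    then merge the features just before the final layer."""
    rgb_features = pointwise_layer(rgb, w_rgb)
    extra_features = pointwise_layer(extra, w_extra)
    merged = np.concatenate([rgb_features, extra_features], axis=-1)
    return merged @ w_head                           # (H, W, N_CLASSES)


# Synthetic inputs: an RGB image and one additional (e.g., near-infrared) band.
rgb = rng.random((H, W, 3))
nir = rng.random((H, W, 1))

early_scores = early_fusion(
    rgb, nir,
    w_backbone=rng.normal(size=(4, HIDDEN)),
    w_head=rng.normal(size=(HIDDEN, N_CLASSES)),
)
late_scores = late_fusion(
    rgb, nir,
    w_rgb=rng.normal(size=(3, HIDDEN)),
    w_extra=rng.normal(size=(1, HIDDEN)),
    w_head=rng.normal(size=(2 * HIDDEN, N_CLASSES)),
)

print(early_scores.shape, late_scores.shape)
```

In this sketch the only structural difference is where the concatenation happens: on the raw input channels for early fusion, and on the branch features immediately before the final layer for late fusion. Intermediate strategies would merge the branches at some depth in between.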