Including information from additional spectral bands (e.g., near-infrared) can improve deep learning model performance for many vision-oriented tasks. There are many possible ways to incorporate this additional information into a deep learning model, but no single fusion strategy is known to be optimal, and the best choice may vary between applications. At one extreme, known as "early fusion," additional bands are stacked as extra channels to obtain an input image with more than three channels. At the other extreme, known as "late fusion," RGB and non-RGB bands are passed through separate branches of a deep learning model and merged immediately before a final classification or segmentation layer. In this work, we characterize the performance of a suite of multispectral deep learning models with different fusion approaches, quantify their relative reliance on different input bands, and evaluate their robustness to naturalistic image corruptions affecting one or more input channels.
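The two extremes of the fusion spectrum can be illustrated with a toy NumPy sketch. It is not the architecture evaluated in this work: the `branch` feature extractor (global average pooling followed by a linear map and ReLU), the random weights, the feature width, and the single near-infrared band are all assumptions chosen only to make the data flow visible.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, weights):
    """Hypothetical stand-in for a CNN backbone.
    x: (C, H, W) image; weights: (C, F) matrix. Returns an (F,) feature vector."""
    pooled = x.mean(axis=(1, 2))               # global average pool -> (C,)
    return np.maximum(pooled @ weights, 0.0)   # linear map + ReLU -> (F,)

def early_fusion(rgb, nir, w_branch, w_head):
    """Stack all bands as channels, then run a single branch."""
    stacked = np.concatenate([rgb, nir], axis=0)   # (3 + 1, H, W)
    return branch(stacked, w_branch) @ w_head      # class logits

def late_fusion(rgb, nir, w_rgb, w_nir, w_head):
    """Run separate branches, merge features just before the final layer."""
    feats = np.concatenate([branch(rgb, w_rgb), branch(nir, w_nir)])
    return feats @ w_head                          # class logits

# Assumed toy dimensions: 32x32 images, 8 features per branch, 5 classes.
H, W, F, n_classes = 32, 32, 8, 5
rgb = rng.random((3, H, W))
nir = rng.random((1, H, W))

early_logits = early_fusion(
    rgb, nir, rng.normal(size=(4, F)), rng.normal(size=(F, n_classes))
)

w_rgb, w_nir = rng.normal(size=(3, F)), rng.normal(size=(1, F))
w_late_head = rng.normal(size=(2 * F, n_classes))
late_logits = late_fusion(rgb, nir, w_rgb, w_nir, w_late_head)

# Corrupting only the NIR band leaves the RGB branch features untouched
# in the late-fusion model; only the NIR half of the merged vector changes.
noisy_nir = nir + rng.normal(scale=0.5, size=nir.shape)
rgb_feats_clean = branch(rgb, w_rgb)
late_logits_noisy = late_fusion(rgb, noisy_nir, w_rgb, w_nir, w_late_head)
```

In the early-fusion path, every learned feature can mix all bands from the first layer onward, whereas in the late-fusion path each band group keeps its own features until the merge. That structural difference is one reason the two designs might respond differently to corruption of a single input band.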