Hyperspectral Benchmark: Bridging the Gap between HSI Applications through Comprehensive Dataset and Pretraining

Hyperspectral Imaging (HSI) serves as a non-destructive spatial spectroscopy technique with a multitude of potential applications. However, a recurring challenge lies in the limited size of the target datasets, impeding exhaustive architecture search. Consequently, when venturing into novel applications, reliance on established methodologies becomes commonplace, in the hope that they exhibit favorable generalization characteristics. Regrettably, this optimism is often unfounded due to the fine-tuned nature of models tailored to specific HSI contexts. To address this predicament, this study introduces an innovative benchmark dataset encompassing three markedly distinct HSI applications: food inspection, remote sensing, and recycling. This comprehensive dataset affords a finer assessment of hyperspectral model capabilities. Moreover, this benchmark facilitates an incisive examination of prevailing state-of-the-art techniques, consequently fostering the evolution of superior methodologies. Furthermore, the enhanced diversity inherent in the benchmark dataset underpins the establishment of a pretraining pipeline for HSI. This pretraining regimen serves to enhance the stability of training processes for larger models. Additionally, a procedural framework is delineated, offering insights into the handling of applications afflicted by limited target dataset sizes.

翻译：高光谱成像（HSI）作为一种无损空间光谱技术，具有众多潜在应用。然而，一个反复出现的挑战在于目标数据集规模有限，阻碍了全面的架构搜索。因此，在探索新型应用时，人们通常依赖已有方法，期望它们展现出良好的泛化特性。遗憾的是，由于模型针对特定高光谱场景的微调特性，这种乐观预期往往落空。为解决这一困境，本研究引入了一个创新的基准数据集，涵盖三个截然不同的高光谱成像应用：食品检测、遥感与回收利用。这一综合数据集能够更精细地评估高光谱模型能力。此外，该基准有助于深入剖析现有先进技术，从而推动更优方法论的演进。更进一步，基准数据集本身所具备的增强多样性，为建立高光谱成像的预训练流程奠定了基础。这一预训练机制有助于提升大型模型训练过程的稳定性。同时，本文还勾勒出一个流程框架，为处理受限于目标数据集规模的应用提供了见解。

相关内容

数据集

关注 88

数据集，又称为资料集、数据集合或资料集合，是一种由数据所组成的集合。
Data set（或dataset）是一个数据的集合，通常以表格形式出现。每一列代表一个特定变量。每一行都对应于某一成员的数据集的问题。它列出的价值观为每一个变量，如身高和体重的一个物体或价值的随机数。每个数值被称为数据资料。对应于行数，该数据集的数据可能包括一个或多个成员。

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日