The use of hyperspectral imaging to investigate food samples has grown due to the improved performance and lower cost of spectroscopy instrumentation. Food engineers use hyperspectral images to determine the type and quality of a food sample, typically by means of classification methods. To train these methods, every pixel in each training image needs to be labelled. Typically, computationally cheap threshold-based approaches are used to label the pixels, and classification methods are then trained on those labels. However, threshold-based approaches are subjective and do not generalize across hyperspectral images taken under different conditions or of different foods. Here, a consensus-constrained parsimonious Gaussian mixture model (ccPGMM) is proposed to label pixels in hyperspectral images using a model-based clustering approach. The ccPGMM uses the available labels of a small number of pixels, together with the relationship between those pixels and their neighbouring pixels, as constraints when clustering the remaining pixels in the image. A latent variable model is used to represent the high-dimensional data in terms of a small number of underlying latent factors. To ensure computational feasibility, a consensus clustering approach is employed: the data are divided into multiple randomly selected subsets of variables, constrained clustering is applied to each data subset, and the clustering results are then consolidated across all data subsets to provide a consensus clustering solution. The ccPGMM approach is applied to simulated datasets and to real hyperspectral images of three types of puffed cereal: corn, rice, and wheat. Improved clustering performance and computational efficiency are demonstrated in comparison with current state-of-the-art approaches.
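The consensus step described above (cluster on random subsets of variables, then consolidate) can be illustrated with a minimal sketch. This is not the ccPGMM itself: plain k-means stands in for the constrained parsimonious Gaussian mixture fit, the label and neighbourhood constraints are omitted, and consolidation uses a co-association matrix thresholded at 0.5, which is one common choice and not necessarily the authors'. All function names and parameters (`kmeans`, `coassociation`, `consensus_labels`, `n_subsets`, `subset_size`, `threshold`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    # Plain k-means: a stand-in (assumption) for the constrained PGMM fit.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distances from every row to every centre.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a centre untouched if its cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def coassociation(X, k, n_subsets=20, subset_size=8, seed=0):
    # Fraction of variable subsets in which each pair of rows shares a cluster.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    C = np.zeros((n, n))
    for _ in range(n_subsets):
        cols = rng.choice(p, size=subset_size, replace=False)  # random subset of variables
        labels = kmeans(X[:, cols], k, rng)
        C += labels[:, None] == labels[None, :]
    return C / n_subsets

def consensus_labels(C, threshold=0.5):
    # Connected components of the graph linking pairs with co-association > threshold.
    n = len(C)
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero((C[u] > threshold) & (labels < 0)):
                labels[v] = current
                stack.append(v)
        current += 1
    return labels
```

For a hyperspectral image, each row of `X` would be one pixel and each column one spectral band. Because every fit sees only `subset_size` bands, the per-fit cost stays small even when the number of bands is large. The number of consensus clusters returned by the thresholding step is not forced to equal `k`.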