SparseEB-gMCR: A Generative Solver for Extreme Sparse Components with Application to Contamination Removal in GC-MS

Analytical chemistry instruments provide physically meaningful signals for elucidating analyte composition and play important roles in material, biological, and food analysis. These instruments are valued for their strong alignment with physical principles, which enables compound identification through pattern matching against chemical libraries. The more reliable instruments produce signals sparse enough to be interpreted directly. Generative multivariate curve resolution (gMCR) and its energy-based solver (EB-gMCR) offer powerful tools for decomposing mixed signals in chemical data analysis. However, the extreme signal sparsity produced by instruments such as GC-MS or 1H-NMR can impair the decomposability of EB-gMCR. To address this, a fixed EB-select module that inherits EB-gMCR's design was introduced to handle extremely sparse components. Combined with minor adjustments to the energy optimization, this yields SparseEB-gMCR. On synthetic datasets, SparseEB-gMCR exhibited decomposability comparable to that of dense-component EB-gMCR and scaled gracefully. The sparse variant was then applied to real GC-MS chromatograms for unsupervised contamination removal. Analysis showed that siloxane-related contamination signals were effectively eliminated, improving the reliability of compound identification. The results demonstrate that SparseEB-gMCR preserves the decomposability of EB-gMCR and its ability to self-determine components while extending its adaptability to sparse and irregular chemical data. With this sparse extension, the EB-gMCR family becomes applicable to a wider range of real-world chemical datasets, providing a general mathematical framework for signal unmixing and contamination elimination in analytical chemistry.
