This paper presents an application of artificial intelligence to mass spectrometry data for assessing the habitability potential of ancient Mars. Although the data were collected on Mars, the same approach can be replicated for any terrestrial body in our solar system. Furthermore, the proposed methodology can be adapted to any domain that uses mass spectrometry. This research focuses on the analysis of data from two mass spectrometry techniques, evolved gas analysis (EGA-MS) and gas chromatography (GC-MS), which are used to identify specific chemical compounds in samples of geological material. The study demonstrates the applicability of EGA-MS and GC-MS data to the analysis of extraterrestrial material. The most important features of the proposed methodology include a square root transformation of mass spectrometry values, conversion of raw data to 2D spectrograms, and the use of specific machine learning models and techniques to avoid overfitting on relatively small datasets. Both the EGA-MS and GC-MS datasets come from NASA and were provided through two machine learning competitions in which the author participated. Complete running code for the GC-MS dataset/competition is available on GitHub.¹ The raw training mass spectrometry data include binary [0, 1] labels for specific chemical compounds, selected to provide valuable insights and contribute to our understanding of the potential past habitability of Mars.
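The two preprocessing steps named above (conversion of raw readings to a 2D spectrogram and a square root transformation of the values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's published code: the helper names `to_spectrogram` and `sqrt_scale`, rounding m/z to integers, the default of 100 m/z rows and 64 equal-width time bins, clipping negative (background-subtracted) abundances to zero, and max-normalization are all hypothetical choices. For EGA-MS the horizontal axis would be time (or temperature); for GC-MS it would be retention time.

```python
import numpy as np


def to_spectrogram(time, mz, abundance, max_mz=100, n_time_bins=64):
    """Hypothetical helper: bin (time, m/z, abundance) samples into a 2D grid.

    Rows index integer m/z values 0..max_mz-1; columns are equal-width time bins.
    Assumes at least one sample has an m/z inside the grid.
    """
    time = np.asarray(time, dtype=float)
    mz = np.rint(np.asarray(mz, dtype=float)).astype(int)  # round m/z to integers
    abundance = np.asarray(abundance, dtype=float)

    # Keep only m/z values that fit in the grid.
    keep = (mz >= 0) & (mz < max_mz)
    time, mz, abundance = time[keep], mz[keep], abundance[keep]

    # Map each time stamp to one of n_time_bins equal-width bins.
    t_min, t_max = time.min(), time.max()
    span = t_max - t_min if t_max > t_min else 1.0
    t_idx = ((time - t_min) / span * n_time_bins).astype(int)
    t_idx = np.minimum(t_idx, n_time_bins - 1)  # the last time stamp goes in the final bin

    grid = np.zeros((max_mz, n_time_bins))
    np.add.at(grid, (mz, t_idx), abundance)  # sum abundances that land in the same cell
    return grid


def sqrt_scale(grid):
    """Square-root transform (negatives clipped to 0), then scale to [0, 1]."""
    g = np.sqrt(np.clip(grid, 0.0, None))
    peak = g.max()
    return g / peak if peak > 0 else g
```

For example, `sqrt_scale(to_spectrogram([0, 0, 10, 10], [18, 44, 18, 44.2], [4, 16, 0, 100], max_mz=50, n_time_bins=2))` yields a 50 × 2 array whose largest cell (m/z 44, second time bin) is 1.0. One common motivation for the square root is that it compresses very large peaks more than small ones, so minor ions stay visible to the model; the resulting 2D array can then be treated like a single-channel image.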