Thin-layer chromatography (TLC) is a crucial technique in molecular polarity analysis. Despite its importance, predictive models for TLC, especially those driven by artificial intelligence, remain difficult to interpret. Current approaches rely on either high-dimensional molecular fingerprints or domain-knowledge-driven feature engineering, and often face a trade-off between expressiveness and interpretability. To bridge this gap, we introduce Unsupervised Hierarchical Symbolic Regression (UHiSR), which combines hierarchical neural networks with symbolic regression. UHiSR automatically distills chemically intuitive polarity indices and discovers interpretable equations that link molecular structure to chromatographic behavior.