The critical micelle concentration (CMC) of surfactant molecules is an essential property for industrial surfactant applications. Recently, classical quantitative structure-property relationship (QSPR) models and Graph Neural Networks (GNNs), a deep learning technique, have been successfully applied to predict the CMC of surfactants at room temperature. However, these models do not yet account for the temperature dependence of the CMC, which is highly relevant for practical applications. We herein develop a GNN model for temperature-dependent CMC prediction of surfactants. We collect about 1400 data points from public sources covering all surfactant classes, i.e., ionic, nonionic, and zwitterionic, at multiple temperatures. We test the predictive quality of the model in two scenarios: i) CMC data for a test surfactant are present in the training set at one or more other temperatures, and ii) no CMC data for the test surfactant are present in the training set, i.e., the model must generalize to unseen surfactants. In both scenarios, the model achieves high predictive performance on the test data, with R$^2 \geq 0.94$. We also find that model performance varies by surfactant class. Finally, we evaluate the model on sugar-based surfactants with complex molecular structures, as these represent a more sustainable alternative to synthetic surfactants and are therefore of great interest for future applications in the personal care and home care industries.
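The two test scenarios i) and ii) differ only in how the data are split, so a short sketch may help make the distinction concrete. The Python below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes a hypothetical record layout of (surfactant identifier, temperature in K, log CMC). The function names `split_by_datapoint`, `split_by_surfactant`, and `r2_score` and the toy values are all hypothetical. The GNN itself is not shown.

```python
import random
from collections import defaultdict

# Hypothetical record layout: (surfactant_id, temperature_K, log_cmc).
# Any values used with these functions here are placeholders, not data from the paper.


def split_by_datapoint(records, test_frac=0.2, seed=0):
    """Scenario i): shuffle individual (surfactant, T) points, so a test
    surfactant may still appear in training at other temperatures."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, round(test_frac * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def split_by_surfactant(records, test_frac=0.2, seed=0):
    """Scenario ii): hold out whole surfactants, so none of their
    temperatures are seen during training."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    names = sorted(groups)
    rng = random.Random(seed)
    rng.shuffle(names)
    n_test = max(1, round(test_frac * len(names)))
    test_names = set(names[:n_test])
    train = [r for r in records if r[0] not in test_names]
    test = [r for r in records if r[0] in test_names]
    return train, test


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Grouping by surfactant identifier in scenario ii) guarantees that every temperature of a held-out surfactant lands in the test set. This is what makes the scenario a test of generalization to unseen molecules rather than interpolation across temperatures.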