Surfactants are key ingredients in foaming and cleansing products across industries such as personal and home care and industrial cleaning, and their critical micelle concentration (CMC) is a property of major interest. Predictive models for the CMC of pure surfactants have been developed based on recent machine learning (ML) methods; in practice, however, surfactant mixtures are typically used for performance, environmental, and cost reasons. This requires accounting for synergistic or antagonistic interactions between surfactants, yet predictive ML models covering a wide spectrum of mixtures are still missing. Herein, we develop a graph neural network (GNN) framework for surfactant mixtures to predict the temperature-dependent CMC. We collect data for 108 binary surfactant mixtures, to which we add data for pure species from our previous work [Brozos et al. (2024), J. Chem. Theory Comput.]. We then develop and train GNNs and evaluate their accuracy across different test scenarios for binary mixtures that are relevant to practical applications. The final GNN models demonstrate very high predictive performance when interpolating between different mixture compositions and when predicting new binary mixtures of known species. Extrapolation to binary surfactant mixtures in which one or both surfactant species have not been seen before yields accurate results for the majority of surfactant systems. We further find that the GNN is more accurate than a semi-empirical model based on activity coefficients, which has been widely used to date. We then explore whether GNN models trained solely on binary-mixture and pure-species data can also accurately predict the CMCs of ternary mixtures. Finally, we experimentally measure the CMC of four commercial surfactants that contain up to four species and of industrially relevant mixtures, and find very good agreement between measured and predicted CMC values.
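For orientation, the kind of activity-coefficient baseline mentioned in the abstract can be sketched in a few lines. The abstract does not name the specific semi-empirical model, so the sketch below assumes regular solution theory for a binary mixture, one widely used model of this type, together with the ideal-mixing rule it reduces to when the interaction parameter is zero. The function names, the interaction parameter `beta`, and the bisection solver are illustrative choices, not taken from the paper.

```python
import math


def ideal_mixture_cmc(alphas, cmcs):
    """Ideal mixing: 1 / CMC_mix = sum_i alpha_i / CMC_i.

    alphas: bulk mole fractions of the surfactants (should sum to 1).
    cmcs:   CMCs of the pure surfactants, all in the same unit.
    """
    return 1.0 / sum(a / c for a, c in zip(alphas, cmcs))


def regular_solution_cmc(alpha1, cmc1, cmc2, beta, iterations=100):
    """Binary-mixture CMC from regular solution theory (assumed baseline).

    beta < 0 models synergistic, beta > 0 antagonistic interactions.
    Activity coefficients in the mixed micelle:
        f1 = exp(beta * (1 - x1)**2),  f2 = exp(beta * x1**2)
    where x1 is the micellar mole fraction of surfactant 1.
    """
    alpha2 = 1.0 - alpha1

    def f1(x1):
        return math.exp(beta * (1.0 - x1) ** 2)

    def f2(x1):
        return math.exp(beta * x1 ** 2)

    def cmc_from_x(x1):
        # 1 / CMC_mix = alpha1 / (f1 * CMC1) + alpha2 / (f2 * CMC2)
        return 1.0 / (alpha1 / (f1(x1) * cmc1) + alpha2 / (f2(x1) * cmc2))

    def residual(x1):
        # Micellar equilibrium for surfactant 1: alpha1 * CMC_mix = x1 * f1 * CMC1
        return alpha1 * cmc_from_x(x1) / (f1(x1) * cmc1) - x1

    # The residual is positive near x1 = 0 and negative near x1 = 1,
    # so plain bisection on (0, 1) finds the micellar composition.
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return cmc_from_x(0.5 * (lo + hi))


if __name__ == "__main__":
    # Hypothetical pure-species CMCs (arbitrary units), equimolar mixture.
    print(ideal_mixture_cmc([0.5, 0.5], [2.0, 4.0]))
    print(regular_solution_cmc(0.5, 2.0, 4.0, beta=-3.0))
```

In this sketch, a negative `beta` lowers the mixture CMC below the ideal-mixing value, which is the synergistic behavior referred to in the abstract. In practice, `beta` has to be fitted to measurements for each surfactant pair.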