Copper nanoparticles (Cu NPs) have broad applicability, yet their synthesis is sensitive to subtle changes in reaction parameters. This sensitivity, combined with the time- and resource-intensive nature of experimental optimization, makes reproducible, size-controlled synthesis a major challenge. While machine learning (ML) shows promise in materials research, its application is often limited by the scarcity of large, high-quality experimental data sets. This study explores ML to predict the size of Cu NPs from microwave-assisted polyol synthesis using a small data set of 25 syntheses performed in-house. Latin Hypercube Sampling is used to cover the parameter space efficiently while creating the experimental data set. Ensemble regression models predict particle sizes with high accuracy ($R^2 = 0.74$), outperforming classical statistical approaches ($R^2 = 0.60$). Additionally, classification models based on both random forests and Large Language Models (LLMs) are evaluated for distinguishing large from small particles. While random forests show moderate performance, LLMs offer no significant advantage under data-scarce conditions. Overall, this study demonstrates that carefully curated small data sets, paired with robust classical ML, can effectively predict the outcome of Cu NP synthesis, and it highlights that for lab-scale studies, complex models such as LLMs may offer limited benefit over simpler techniques.