The application of machine learning to solar-thermal desalination is limited by a shortage of data and inconsistent analysis methods. This study develops an optimized dataset collection and analysis process for the solar still, a representative solar-thermal desalination device. By applying an ultra-hydrophilic treatment to the condensation cover, the collection process reduces the dataset collection time by 83.3%. More than 1,000 datasets are collected, nearly an order of magnitude more than in the most recent studies. A new interdisciplinary process flow is then proposed, which yields several findings not addressed by previous studies. First, Random Forest may be the better choice when more than 1,000 datasets are available, because it combines high accuracy with fast computation. Second, the range of the dataset significantly affects the quantified importance (weighted value) of the influencing factors, with increments of up to 115%. Third, machine learning achieves high accuracy when extrapolating productivity predictions, with a minimum mean relative prediction error of only about 4%. These results not only demonstrate the necessity of accounting for the effects of dataset characteristics but also provide a standard process for studying solar-thermal desalination with machine learning, which could pave the way for further interdisciplinary research.
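The abstract reports its results as percentages but does not define the metrics. A common definition of the mean relative prediction error, assumed here rather than taken from the study, is MRE = (1/n) Σ |ŷ_i - y_i| / |y_i| × 100%, and the quoted 83.3% reduction and 115% increment are ordinary percent changes. The sketch below computes both in plain Python; the function names and sample values are hypothetical illustrations, not the authors' code or data.

```python
def mean_relative_error(y_true, y_pred):
    """Mean relative prediction error in percent (assumed standard definition)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 0:
            # Relative error is undefined when the measured value is zero.
            raise ValueError("relative error is undefined for a zero target")
        total += abs(predicted - actual) / abs(actual)
    return 100.0 * total / len(y_true)


def relative_change(before, after):
    """Percent change from `before` to `after`; a negative value is a reduction."""
    return 100.0 * (after - before) / before


# Hypothetical productivity measurements and predictions (not from the study):
# each prediction is off by 4% of its target, so the MRE is about 4%.
print(mean_relative_error([1.0, 2.0, 4.0], [1.04, 1.92, 4.16]))
# A hypothetical collection time falling from 6 h to 1 h is a reduction of about 83.3%.
print(relative_change(6.0, 1.0))
```

Note that an 83.3% reduction corresponds to cutting the collection time to roughly one sixth of its original value, whatever the absolute times were.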