Most machine learning models require many iterations of hyperparameter tuning, feature engineering, and debugging to produce effective results. As models become more complex, this iterative pipeline becomes harder to manage. In the physical sciences, the scientific research cycle generates an ever-growing pool of metadata. Tracking this metadata can reduce redundant work, improve reproducibility, and aid feature engineering and training-dataset construction. In this case study, we present a tool for machine learning metadata management in dynamic radiography. We evaluate the tool's efficacy against the initial research workflow and discuss extensions to general machine learning pipelines in the physical sciences.