Quality classification of wood boards is an essential task in the sawmill industry, and in small and medium-sized companies in developing countries it is still usually performed by human operators. Machine learning algorithms have been successfully applied to this problem, offering a more affordable alternative to other solutions. However, such approaches usually depend on a proper selection of their hyperparameters. Moreover, the models are sensitive to the features extracted from wood board images, which influence the induction of the model and, consequently, its generalization power. Therefore, in this paper, we investigate the problem of simultaneously tuning the hyperparameters of an artificial neural network (ANN) and selecting a subset of features that better describes wood board quality. Experiments were conducted on a private dataset composed of images obtained from a sawmill and described using different feature descriptors. The predictive performance of the model was compared against five baseline methods as well as a random search, performing either ANN hyperparameter tuning or feature selection. Experimental results suggest that hyperparameters should be adjusted according to the feature set, or that features should be selected considering the hyperparameter values. In summary, the best predictive performance, i.e., a balanced accuracy of $0.80$, was achieved in two distinct scenarios: (i) performing only feature selection, and (ii) performing both tasks simultaneously. Thus, we suggest that at least one of the two approaches should be considered in industrial applications.
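For readers unfamiliar with the metric reported above, balanced accuracy is the mean of the per-class recalls, so each quality class contributes equally regardless of how many boards it contains. The following is a minimal sketch, not the authors' evaluation code: the function name `balanced_accuracy`, the class labels `"good"` and `"defective"`, and the sample predictions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of each class, then an unweighted average across classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced example: 8 "good" boards and 2 "defective" ones.
y_true = ["good"] * 8 + ["defective"] * 2
y_pred = ["good"] * 8 + ["good", "defective"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, score)
```

In this toy case, 9 of 10 boards are classified correctly, yet only half of the minority class is recovered. Balanced accuracy averages the two recalls (1.0 and 0.5), which is why it is generally preferred over plain accuracy when quality classes are unevenly represented.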