Neural network models have many hyperparameters that must be chosen along with the architecture itself. Choosing an architecture and assigning values to its parameters can be a heavy burden on a novice user, and in most cases default hyperparameters and architectures are simply adopted. Significant improvements in model accuracy can, however, be achieved by evaluating multiple architectures, and Neural Architecture Search (NAS) may be applied to evaluate a large number of such architectures automatically. As part of this research, a system integrating open source tools for Neural Architecture Search (OpenNAS) has been developed for image classification. OpenNAS takes any dataset of grayscale or RGB images and generates Convolutional Neural Network (CNN) architectures based on a range of metaheuristics, using an AutoKeras, a transfer learning, or a Swarm Intelligence (SI) approach; Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) serve as the SI algorithms. Furthermore, models developed through such metaheuristics may be combined using stacking ensembles. In this paper, we focus on training and optimizing CNNs using the SI components of OpenNAS, comparing the two SI algorithms, PSO and ACO, to determine which generates higher model accuracies. Under our experimental design, PSO is shown to perform better than ACO, and its performance improvement is most notable on a more complex dataset. As a baseline, the performance of fine-tuned pre-trained models is also evaluated.
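To make the SI search concrete, the following is a minimal sketch of the kind of PSO loop that could drive an architecture search. It is not the OpenNAS implementation: the hyperparameters searched (filter count and network depth), the bounds, and the surrogate objective standing in for validation error are all illustrative assumptions.

```python
import random

def pso(objective, bounds, n_particles=10, n_iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box given by `bounds` with a basic PSO loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialise particle positions uniformly within bounds; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical smooth surrogate for validation error over (filters, depth);
# in practice each evaluation would train and validate a candidate CNN.
def surrogate_error(x):
    filters, depth = x
    return ((filters - 64) / 64) ** 2 + ((depth - 4) / 4) ** 2

best, err = pso(surrogate_error, bounds=[(16, 128), (1, 8)])
```

In a real NAS setting, `objective` is far more expensive (each call trains a CNN), so the particle count and iteration budget dominate the overall search cost.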