The purpose of this study is to investigate the development process for Artificial Intelligence (AI) and machine learning (ML) applications in order to provide the best possible support environment for them. The main stages of ML development are problem understanding, data management, model building, model deployment, and maintenance. This project focuses on the data management stage and its obstacles, because it is the most important stage of ML development: the accuracy of the final model depends on the data fed into it. The biggest obstacle found at this stage was the lack of sufficient data for model training, especially in fields where data is confidential. This project aimed to develop a framework that helps researchers and developers overcome this shortage of data during the data management stage. The framework applies several data augmentation techniques to generate new data from the original dataset. This increases the quantity and quality of the data available to the model and thereby improves the overall performance of ML applications. The framework was built in Python and performs data augmentation using deep learning techniques.
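The abstract does not say which augmentation techniques the framework implements. As an illustration only, the sketch below shows one simple way to generate new samples from an existing labelled tabular dataset. Each synthetic row is an interpolation between two random rows of the same class (a SMOTE-like step), with small Gaussian noise added to every feature (jittering). The function `augment_dataset` and its parameters `n_new`, `noise_std` and `seed` are hypothetical. They are not the framework's API, and the deep-learning-based methods the framework relies on are not reproduced here.

```python
import random
from collections import defaultdict


def augment_dataset(X, y, n_new, noise_std=0.05, seed=0):
    """Hypothetical helper: return (X, y) extended with n_new synthetic rows.

    Each new row interpolates between two random rows of the same class and
    then adds Gaussian noise to every feature. Original rows are kept first.
    """
    rng = random.Random(seed)

    # Group the original rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    labels = sorted(by_class)

    # Start from copies of the original data so the inputs are not mutated.
    X_aug = [list(row) for row in X]
    y_aug = list(y)

    for i in range(n_new):
        # Cycle through the classes so the new samples stay balanced.
        label = labels[i % len(labels)]
        a = rng.choice(by_class[label])
        b = rng.choice(by_class[label])
        lam = rng.random()  # interpolation weight in [0, 1)
        new_row = [
            ai + lam * (bi - ai) + rng.gauss(0.0, noise_std)
            for ai, bi in zip(a, b)
        ]
        X_aug.append(new_row)
        y_aug.append(label)

    return X_aug, y_aug


# Usage: a tiny two-class dataset with two features per row.
X = [[1.0, 2.0], [1.2, 1.9], [5.0, 6.0], [5.1, 6.2]]
y = [0, 0, 1, 1]
X_aug, y_aug = augment_dataset(X, y, n_new=6)
print(len(X_aug), y_aug.count(0), y_aug.count(1))  # 10 5 5
```

Interpolating only within a class keeps each synthetic row close to real examples of its own label. The added noise prevents exact duplicates of the original rows. Image, audio, or text data would need modality-specific transformations instead.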