Alongside the continuous effort to improve AI performance through ever more sophisticated models, researchers have turned their attention to the emerging concept of data-centric AI, which emphasizes the central role of data in a systematic machine learning training process. Nonetheless, model development has continued apace. One result of this progress is the Transformer architecture, which has demonstrated strong capabilities across domains such as Natural Language Processing (NLP), Computer Vision (CV), and Time Series Forecasting (TSF). Its performance, however, depends heavily on input data preprocessing and output data evaluation, which justifies a data-centric approach to future research. We argue that data-centric AI is essential for efficiently training AI models, particularly transformer-based TSF models. However, a gap remains in the integration of transformer-based TSF and data-centric AI. This survey aims to address this gap through an extensive literature review organized by a proposed taxonomy. We review prior research from a data-centric AI perspective and intend to lay the groundwork for the future development of transformer-based architectures and data-centric AI.