The rapid growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. Motivated by the work of Helal (2023), this paper presents a comprehensive overview of tensorization, a transformative approach that bridges the gap between the inherently multidimensional nature of data and the simplified two-dimensional matrices commonly used in linear algebra-based machine learning algorithms. The paper explores the steps involved in tensorization, sources of multidimensional data, the multiway analysis methods employed, and the benefits of these approaches. A small Blind Source Separation (BSS) example, implemented in Python, compares two-dimensional algorithms with a multiway algorithm. The results indicate that multiway analysis is more expressive. Contrary to the intuition of the curse of dimensionality, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among the dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of multiway analysis methods and their integration with various deep neural network models is presented through case studies in different domains.