Tabular data is often considered the last unconquered castle of deep learning, while data stream classification is an equally important and demanding research area. Owing to its temporal constraints, deep learning methods are commonly assumed to be suboptimal for this field. However, excluding this entire -- and prevalent -- family of methods seems rash given the progress made in its development in recent years. For this reason, this paper is the first to present an approach to natural language data stream classification using the sentence space method, which encodes text as a discrete digital signal. This makes it possible to apply convolutional deep networks designed for image classification to the task of fake news recognition from text data. Using the real-world Fakeddit dataset, the proposed approach was compared with state-of-the-art data stream classification algorithms in terms of generalization ability and time complexity.
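The core idea can be illustrated with a minimal sketch. The encoding below is an assumption for illustration only, not the paper's exact sentence space implementation: each word is mapped to a deterministic pseudo-random vector (a stand-in for trained word embeddings), and the rows are stacked and zero-padded to a fixed height, yielding an image-like 2D signal that a standard convolutional network could consume.

```python
import zlib

import numpy as np


def sentence_to_signal(sentence, embed_dim=8, max_words=16):
    """Encode a sentence as a fixed-size 2D discrete signal.

    Hypothetical stand-in for the sentence space encoding: one row per
    word, derived from a CRC32-seeded random vector instead of a trained
    embedding, padded/truncated to `max_words` rows.
    """
    words = sentence.lower().split()[:max_words]
    rows = []
    for w in words:
        # Seed the generator with a stable hash so the same word always
        # maps to the same row vector across runs.
        rng = np.random.default_rng(zlib.crc32(w.encode("utf-8")))
        rows.append(rng.uniform(-1.0, 1.0, embed_dim))
    signal = np.zeros((max_words, embed_dim))
    if rows:
        signal[: len(rows)] = np.stack(rows)
    return signal


img = sentence_to_signal("breaking news scientists discover talking cats")
print(img.shape)  # (16, 8) -- an "image" ready for a CNN classifier
```

In a streaming setting, each arriving document would be encoded this way and fed to the convolutional classifier chunk by chunk, which is what allows image-oriented architectures to be benchmarked against classical stream learners.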