The Tsetlin Machine (TM) has gained significant attention in Machine Learning (ML). By employing logical fundamentals, it facilitates pattern learning and representation, offering an alternative approach to developing comprehensible Artificial Intelligence (AI), with a specific focus on pattern classification in the form of conjunctive clauses. In the domain of Natural Language Processing (NLP), the TM is utilised to construct word embeddings and describe target words using clauses. To enhance the descriptive capacity of these clauses, we study the concept of Reasoning by Elimination (RbE) in clause formulation, which involves incorporating feature negations to provide a more comprehensive representation. In more detail, this paper employs the Tsetlin Machine Auto-Encoder (TM-AE) architecture to generate dense word vectors, aiming to capture contextual information by extracting feature-dense vectors for a given vocabulary. Thereafter, the principle of RbE is explored to improve descriptiveness and optimise the performance of the TM. Specifically, the specificity parameter s and the voting margin parameter T are leveraged to regulate feature distribution in the state space, resulting in a dense representation of information for each clause. In addition, we investigate the state spaces of the TM-AE, especially for the forgotten/excluded features. Empirical investigations on artificially generated data, the IMDB dataset, and the 20 Newsgroups dataset showcase the robustness of the TM, with accuracy reaching 90.62\% on the IMDB dataset.