Technology is advancing rapidly: bots now write comments, articles, and reviews. It is therefore crucial to be able to tell whether a text was written by a human or by a bot. This paper compares the structures of coarse-grained partitions of semantic paths in human-written and bot-generated texts. We compare clusterings of n-gram datasets built from literary texts with clusterings of n-gram datasets built from texts generated by several bots. Our hypothesis is that these structures and clusterings differ, and our results support it. Because semantic structure may vary across languages, we study Russian, English, German, and Vietnamese.