The growing popularity of neural machine translation (NMT) and large language models (LLMs) such as ChatGPT underscores the need for a deeper understanding of their distinct characteristics and of how they relate to one another. Such understanding is crucial for language professionals and researchers who must make informed decisions about, and judicious use of, these cutting-edge translation technologies, yet it remains underexplored. This study aims to fill this gap by investigating three key questions: (1) whether ChatGPT-generated translations can be distinguished from NMT and human translation (HT), (2) what linguistic characteristics mark each translation type, and (3) how closely ChatGPT-produced translations resemble HT or NMT. To answer these questions, we apply statistical testing, machine learning algorithms, and multidimensional analysis (MDA) to Spokesperson's Remarks and their translations. Using a wide range of extracted linguistic features, supervised classifiers distinguish the three translation types with high accuracy, whereas unsupervised clustering techniques do not yield satisfactory results. Another major finding is that ChatGPT-produced translations are more similar to NMT than to HT on most MDA dimensions, a result further corroborated by distance computation and visualization. These insights shed light on the interrelationships among the three translation types and have implications for the future development of NMT and generative AI.