Existing proposition classification often relies on logical constants. Compared with Western languages such as English, which lean towards hypotaxis, Chinese often relies on semantic or logical understanding rather than logical connectives in everyday expressions, exhibiting the characteristics of parataxis. However, existing research has rarely addressed this issue, even though accurately classifying such propositions is crucial for natural language understanding and reasoning. In this paper, we introduce the concepts of explicit and implicit propositions and propose a comprehensive multi-level proposition classification system based on linguistics and logic. Correspondingly, we create PEACE, a large-scale Chinese proposition dataset drawn from multiple domains and covering all categories related to propositions. To evaluate the Chinese proposition classification ability of existing models and explore their limitations, we conduct evaluations on PEACE using several methods, including a rule-based method, SVM, BERT, RoBERTa, and ChatGPT. Results show the importance of properly modeling the semantic features of propositions. BERT has relatively good proposition classification capability but lacks cross-domain transferability. ChatGPT performs poorly, but its classification ability can be improved by providing more proposition information. Many issues remain far from resolved and require further study.