Metaphor is a prominent linguistic device in human language and literature, adding color, imagery, and emphasis that make communication more effective. This paper introduces a large-scale, high-quality annotated Chinese Metaphor Corpus comprising around 28K sentences drawn from a diverse range of Chinese literary sources, such as poems, prose, and song lyrics. To ensure the accuracy and consistency of our annotations, we introduce a comprehensive set of guidelines. These guidelines cover the main facets of metaphor annotation, from identifying tenors, vehicles, and grounds to handling the complexities of similes, personifications, juxtapositions, and hyperboles. Breaking with tradition, our approach to metaphor generation emphasizes grounds and their distinct features rather than the conventional combination of tenors and vehicles. By integrating the "ground" as a Chain-of-Thought (CoT) input, we are able to generate metaphors that resonate more with real-world intuition. We test generative models such as Belle, Baichuan, and Chinese-alpaca-33B using our annotated corpus. When prompted with selected samples from our dataset, these models generate creative and fluent metaphor sentences more frequently, demonstrating the value of our corpus for Chinese metaphor research. The code is available at https://github.com/JasonShao55/Chinese_Metaphor_Explanation.
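As a minimal sketch of how a ground might be supplied as a Chain-of-Thought input, the snippet below assembles a few-shot prompt in which each example lists the tenor, then the ground, then the vehicle and the final sentence, so that the ground serves as the intermediate reasoning step. The class `MetaphorExample`, its field names, the function `build_ground_cot_prompt`, the prompt wording, and the sample sentence are all assumptions made for illustration; they are not taken from the corpus or the released code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetaphorExample:
    """One annotated sentence. Field names are hypothetical, not the corpus schema."""
    tenor: str
    ground: str
    vehicle: str
    sentence: str


def format_example(ex: MetaphorExample) -> str:
    # The ground is placed before the vehicle so it acts as the reasoning step.
    return (
        f"Tenor: {ex.tenor}\n"
        f"Ground: {ex.ground}\n"
        f"Vehicle: {ex.vehicle}\n"
        f"Metaphor: {ex.sentence}"
    )


def build_ground_cot_prompt(
    tenor: str, ground: str, examples: List[MetaphorExample]
) -> str:
    """Build a few-shot prompt that ends where the model should name a vehicle."""
    instruction = (
        "Write a metaphor. Use the ground (the property shared by tenor and "
        "vehicle) to choose a vehicle, then write the sentence."
    )
    shots = "\n\n".join(format_example(ex) for ex in examples)
    query = f"Tenor: {tenor}\nGround: {ground}\nVehicle:"
    return f"{instruction}\n\n{shots}\n\n{query}"


if __name__ == "__main__":
    # Invented sample for illustration only; not drawn from the corpus.
    shots = [
        MetaphorExample(
            tenor="time",
            ground="passes and cannot be recovered",
            vehicle="flowing water",
            sentence="Time is flowing water, gone before you can hold it.",
        )
    ]
    print(build_ground_cot_prompt("memory", "fades gradually", shots))
```

The resulting string could then be passed to any text-generation model; which decoding settings or model interfaces the authors used is not specified here.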