This study introduces KPoEM (Korean Poetry Emotion Mapping), a novel dataset that serves as a foundation for both emotion-centered analysis and generative applications in modern Korean poetry. Despite advances in NLP, poetry remains underexplored because of its complex figurative language and cultural specificity. We constructed a multi-label dataset of 7,662 entries (7,007 line-level and 615 work-level), drawn from the works of five influential Korean poets and annotated with 44 fine-grained emotion categories. The KPoEM emotion classification model was fine-tuned sequentially, first on general-purpose corpora and then on the specialized KPoEM dataset, and achieved an F1-micro score of 0.60, significantly outperforming previous models (0.43). The model shows an enhanced ability to identify temporally and culturally specific emotional expressions while preserving core poetic sentiments. Furthermore, applying the structured emotion dataset to a RAG-based poetry generation model demonstrates the empirical feasibility of generating texts that reflect the emotional and cultural sensibilities of Korean literature. This integrated approach strengthens the connection between computational techniques and literary analysis, opening new pathways for quantitative emotion research and generative poetics. Overall, this study provides a foundation for advancing emotion-centered analysis and creation in modern Korean poetry.