Rhetorical figures play an important role in communication. They convey subtle, implicit meaning or emphasize statements, and they appear in hate speech, fake news, and propaganda. Improving the computational detection of rhetorical figures can therefore also improve tasks such as hate speech and fake news detection, sentiment analysis, opinion mining, and argument mining. Unfortunately, there is a lack of both annotated data and qualified annotators who could help build large corpora for training machine learning models to detect rhetorical figures. The situation is particularly difficult for languages other than English and for rhetorical figures other than metaphor, sarcasm, and irony. To address this issue, we develop a web application called "Find your Figure" that facilitates the identification and annotation of German rhetorical figures. The application is based on the German rhetorical ontology GRhOOT, which we have adapted specifically for this purpose. In addition, we improve the user experience with Retrieval-Augmented Generation (RAG). In this paper, we present the restructuring of the ontology, the development of the web application, and the built-in RAG pipeline. We also identify the optimal RAG settings for our application. Our approach is one of the first to use rhetorical ontologies in practice in combination with RAG, and it shows promising results.