Poetry holds a significant place in the culture and traditions of any nation. It gives poets a vehicle to express their emotions, preserve customs, and convey the essence of their culture. Arabic poetry is no exception: it has played a cherished role in the heritage of the Arab community throughout history and remains relevant today. Understanding Arabic poetry typically requires a linguist who can analyze its content and assess its quality. This paper introduces \textit{Ashaar} (https://github.com/ARBML/Ashaar), a framework comprising a collection of datasets and pre-trained models designed specifically for the analysis and generation of Arabic poetry. The proposed pipeline covers several aspects of poetry, including meter, theme, and era classification. It also incorporates automatic poetry diacritization, which enables finer-grained analyses such as automatic extraction of the \textit{Arudi} style. In addition, we explore the feasibility of conditional poetry generation by pre-training a character-based GPT model. As part of this work, we provide four datasets: one for poetry generation, one for diacritization, and two for Arudi-style prediction. These datasets are intended to support research and development on Arabic poetry by giving researchers and enthusiasts a way to explore the nuances of this rich literary tradition.