Large Language Models (LLMs), despite their remarkable capabilities across NLP tasks, struggle with phonologically grounded phenomena such as rhyme detection and generation. The problem is more pronounced in lower-resource languages such as Modern Greek. In this paper, we present a hybrid system that combines LLMs with deterministic phonological algorithms to achieve accurate rhyme identification, analysis, and generation. Our approach implements a comprehensive taxonomy of Greek rhyme types, including Pure, Rich, Imperfect, Mosaic, and Identical Pre-rhyme Vowel (IDV) patterns, and employs an agentic generation pipeline with phonological verification. We evaluate multiple prompting strategies (zero-shot, few-shot, Chain-of-Thought, and RAG-augmented) across several LLMs, including Claude 3.7 and 4.5, GPT-4o, Gemini 2.0, and open-weight models such as Llama 3.1 8B and 70B and Mistral Large. Results reveal a significant "Reasoning Gap": while native-like models (Claude 3.7) perform intuitively (40\% accuracy in identification), reasoning-heavy models (Claude 4.5) achieve state-of-the-art performance (54\%) only when prompted with Chain-of-Thought. Most critically, pure LLM generation fails catastrophically (under 4\% valid poems), while our hybrid verification loop restores performance to 73.1\%. We release our system and a corpus of 40,000+ rhymes, derived from the Anemoskala and Interwar Poetry corpora, to support future research.
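The abstract does not specify how the hybrid verification loop is built, so the following is only a minimal sketch of the general idea: an LLM proposes a draft, a deterministic checker accepts or rejects it, and rejected drafts trigger a retry. Every name here (`rhyme_part`, `lines_rhyme`, `generate_verified_couplet`, the `generate` callback) is hypothetical, and the rhyme check is a crude orthographic stand-in for real Greek phonology. It collapses only a few same-sounding spellings and ignores cases such as αυ/ευ, diaeresis, and the rhyme taxonomy itself.

```python
import unicodedata
from typing import Callable, Optional

# Toy normalisation (an assumption, not the paper's algorithm): collapse a few
# spellings that sound alike in Modern Greek. Not a full grapheme-to-phoneme map.
_DIGRAPHS = [("ού", "U"), ("ου", "u"), ("αί", "έ"), ("αι", "ε"),
             ("εί", "ί"), ("ει", "ι"), ("οί", "ί"), ("οι", "ι")]
_SINGLE = str.maketrans("ήύώηυω", "ίίόιιο")
_STRESSED = set("άέίόU")  # stressed vowels after normalisation


def rhyme_part(word: str) -> str:
    """Return the normalised ending from the last stressed vowel onward."""
    w = unicodedata.normalize("NFC", word).lower().strip(".,;·!?«»")
    for src, dst in _DIGRAPHS:
        w = w.replace(src, dst)
    w = w.translate(_SINGLE)
    for i in range(len(w) - 1, -1, -1):
        if w[i] in _STRESSED:
            return w[i:]
    return w  # no accent mark found: fall back to the whole word


def lines_rhyme(a: str, b: str) -> bool:
    """Compare the rhyme parts of the last word of each line."""
    return rhyme_part(a.split()[-1]) == rhyme_part(b.split()[-1])


def generate_verified_couplet(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 5
) -> Optional[str]:
    """Ask `generate` (a hypothetical LLM call) for a couplet until it verifies."""
    feedback = ""
    for _ in range(max_attempts):
        draft = generate(prompt + feedback)
        lines = [ln for ln in draft.splitlines() if ln.strip()]
        if len(lines) == 2 and lines_rhyme(lines[0], lines[1]):
            return draft
        feedback = "\nThe previous draft did not rhyme; please try again."
    return None  # give up after max_attempts rejected drafts
```

In this sketch the checker, not the model, has the final say on whether a draft is accepted. A stub can stand in for the model when trying it out, for example `generate_verified_couplet(lambda p: "ήρθε η μέρα\nκοιτάω πέρα", "...")`.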