Large Language Models (LLMs), despite their remarkable capabilities across NLP tasks, struggle with phonologically grounded phenomena such as rhyme detection and generation. The difficulty is even more pronounced in lower-resource languages such as Modern Greek. In this paper, we present a hybrid system that combines LLMs with deterministic phonological algorithms to achieve accurate rhyme identification, analysis, and generation. Our approach implements a comprehensive taxonomy of Greek rhyme types, including Pure, Rich, Imperfect, Mosaic, and Identical Pre-rhyme Vowel (IDV) patterns, and employs an agentic generation pipeline with phonological verification. We evaluate multiple prompting strategies (zero-shot, few-shot, Chain-of-Thought, and RAG-augmented) across several LLMs, including Claude 3.7 and 4.5, GPT-4o, Gemini 2.0, and open-weight models such as Llama 3.1 (8B and 70B) and Mistral Large. Results reveal a significant "Reasoning Gap": while native-like models (Claude 3.7) perform intuitively, reaching 40\% accuracy in identification, reasoning-heavy models (Claude 4.5) achieve state-of-the-art performance (54\%) only when prompted with Chain-of-Thought. Most critically, pure LLM generation fails catastrophically (under 4\% valid poems), while our hybrid verification loop restores performance to 73.1\%. We release our system and a rigorously cleaned corpus of 40,000+ rhymes, derived from the Anemoskala and Interwar Poetry corpora, to support future research.