Code-switching in contact varieties like Singaporean English (Singlish) challenges natural language generation due to limited parallel data and rapid lexical evolution. We propose a retrieval-augmented generation (RAG) framework that externalizes dialectal knowledge into a curated lexicon, enabling controlled lexical code-switching without fine-tuning. Our approach retrieves candidate Singlish expressions and guides generation through sparse lexical substitution. Human evaluation with 164 Singaporean participants found RAG and zero-shot prompting equally natural and appropriate. Automatic analyses reveal different transformation regimes: zero-shot prompting induces extensive paraphrasing (median 23 token edits), whereas RAG performs minimal substitutions (median 1 edit) with higher semantic preservation (mean cosine similarity 0.978 vs. 0.926). Our results demonstrate that externalizing code-switching into lexical resources enables control and auditability without sacrificing perceived quality, offering practical advantages for rapidly evolving contact varieties.
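As a rough illustration of the retrieve-then-substitute idea and of the token-edit measure reported above, the following minimal sketch may help. It is not the authors' implementation. The two-entry `LEXICON` is a toy stand-in for the curated lexicon, and the function names (`retrieve_candidates`, `sparse_substitute`, `token_edit_distance`) are hypothetical. A plain regex replacement stands in for the model-guided generation step, and the edit count is assumed to be a Levenshtein distance over whitespace tokens.

```python
import re

# Toy lexicon for illustration only; NOT the paper's curated resource.
# Keys are Standard English expressions, values are Singlish candidates.
LEXICON = {
    "eat": "makan",
    "delicious": "shiok",
}


def retrieve_candidates(sentence, lexicon):
    """Return (english, singlish) pairs whose English side occurs in the sentence."""
    hits = []
    for english, singlish in lexicon.items():
        if re.search(rf"\b{re.escape(english)}\b", sentence, flags=re.IGNORECASE):
            hits.append((english, singlish))
    return hits


def sparse_substitute(sentence, lexicon, max_edits=1):
    """Apply at most `max_edits` lexical substitutions (assumption: regex
    replacement stands in for the LLM-guided generation step)."""
    output = sentence
    applied = []
    for english, singlish in retrieve_candidates(sentence, lexicon)[:max_edits]:
        pattern = rf"\b{re.escape(english)}\b"
        output = re.sub(pattern, singlish, output, count=1, flags=re.IGNORECASE)
        applied.append((english, singlish))
    return output, applied


def token_edit_distance(a, b):
    """Levenshtein distance computed over whitespace-separated tokens."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, x_tok in enumerate(x, 1):
        cur = [i]
        for j, y_tok in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                     # deletion
                           cur[j - 1] + 1,                  # insertion
                           prev[j - 1] + (x_tok != y_tok))) # substitution
        prev = cur
    return prev[-1]


source = "Let us eat at the hawker centre, the food is delicious."
output, applied = sparse_substitute(source, LEXICON, max_edits=1)
print(output)                               # Let us makan at the hawker centre, the food is delicious.
print(applied)                              # [('eat', 'makan')]
print(token_edit_distance(source, output))  # 1
```

Because every change is traceable to a lexicon entry (the `applied` list), the output can be audited, and updating the variety's vocabulary only requires editing the lexicon rather than retraining a model.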