We use language modeling to investigate the factors that influence code-switching. Code-switching occurs when a speaker alternates between one language variety (the primary language) and another (the secondary language), and it is widely observed in multilingual contexts. Recent work has shown that code-switching is often correlated with regions of high information load in the primary language. It remains unclear, however, whether high primary-language load merely makes the secondary language relatively easier to produce at code-switching points (speaker-driven code-switching), or whether speakers additionally use code-switching to signal that listeners need to pay greater attention (audience-driven code-switching). Using bilingual Chinese-English online forum posts and transcripts of spontaneous Chinese-English speech, we replicate prior findings that high information load in the primary language (Chinese) is correlated with switches to the secondary language (English). We then demonstrate that the information load of the English productions is even higher than that of meaning-equivalent Chinese alternatives, so the English productions are not easier to produce. This provides evidence of audience-driven influences on code-switching at the level of the communication channel, not just at the sociolinguistic level, in both writing and speech.
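To make "information load" concrete: in language-modeling work it is commonly operationalized as surprisal, the negative log probability of a token given its context. The abstract does not state the exact measure or model used, so the following is only a minimal sketch under that assumption. The add-one-smoothed bigram model, the function names `train_bigram` and `surprisal`, and the toy corpus are all hypothetical and stand in for whatever language model the study actually uses.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect context and bigram counts from tokenized sentences (toy model)."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens  # sentence-start marker as the first context
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def surprisal(prev, cur, context_counts, bigram_counts, vocab):
    """Surprisal in bits, -log2 P(cur | prev), using add-one smoothing."""
    p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))
    return -math.log2(p)


# Hypothetical toy corpus; real experiments would use a large trained model.
corpus = [["a", "b"], ["a", "c"], ["a", "b"]]
model = train_bigram(corpus)

# The rarer continuation carries the higher information load.
print(surprisal("a", "b", *model), surprisal("a", "c", *model))
```

Under this assumption, the comparison described in the abstract amounts to checking whether the surprisal of an English production exceeds that of a meaning-equivalent Chinese alternative in the same context.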