Large Language Models (LLMs) are increasingly used to ``professionalize'' workplace communication, often at the cost of linguistic identity. We introduce ``Cultural Ghosting'', the systematic erasure of linguistic markers unique to non-native English varieties during text processing. Through analysis of 22,350 LLM outputs generated from 1,490 culturally marked texts (Indian, Singaporean, and Nigerian English) processed by five models under three prompt conditions, we quantify this phenomenon using two novel metrics: Identity Erasure Rate (IER) and Semantic Preservation Score (SPS). Across all prompts, we find an overall IER of 10.26%, with model-level variation from 3.5% to 20.5% (a 5.9x range). Crucially, we identify a Semantic Preservation Paradox: models maintain high semantic similarity (mean SPS = 0.748) while systematically erasing cultural markers. Pragmatic markers (politeness conventions) are 1.9x more vulnerable than lexical markers (71.5% vs. 37.1% erasure). Our experiments demonstrate that explicit cultural-preservation prompts reduce erasure by 29% without sacrificing semantic quality.