Large Language Models (LLMs) are increasingly deployed in diverse cultural contexts, yet their mastery of aesthetic stylistics, i.e., the strategic use of language to evoke cultural resonance, remains underexplored. We curate C4STYLI, a benchmark of highly stylized translated movie titles and advertising slogans from Hong Kong and the Chinese Mainland, and use it to evaluate LLMs through two lenses: behavioral recognition and productive competence. Extensive evaluations show that LLMs differ from humans in stylistic recognition and that this recognition ability varies across text domains. Moreover, LLMs' stylistic recognition and generation performance are not consistently aligned. To examine whether LLMs genuinely capture stylistic information during stylistic recognition, we conduct structural ablations with logistic regression probes. We find that, in the Hong Kong setting, LLMs' stylistic recognition relies primarily on surface-level linguistic information rather than on stylistic structure, which suggests limited sensitivity to Hong Kong-specific stylistic structure.
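The abstract does not detail the probing protocol, so the following is only a minimal, hypothetical sketch of how a logistic regression probe can be paired with a structural ablation. It assumes each text has already been mapped to a feature vector, and that some dimensions (listed in `dims`) stand in for "structure" information. One probe is trained on intact features and another on ablated features, and their held-out accuracies are compared. A small gap would suggest that the probe's signal comes from the surviving, surface-level features. The names `train_probe`, `ablate`, and `ablation_gap`, the zeroing-out form of ablation, and the toy data are all illustrative assumptions, not the C4STYLI method. The actual ablation may, for example, operate on the input text rather than on feature vectors.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X, y, lr=0.5, epochs=500):
    """Fit a binary logistic regression probe by full-batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(w, b, X, y):
    # Predict class 1 when the probe's probability is at least 0.5.
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def ablate(X, dims):
    """Hypothetical structural ablation: zero out the feature dimensions listed in `dims`."""
    drop = set(dims)
    return [[0.0 if j in drop else v for j, v in enumerate(xi)] for xi in X]


def ablation_gap(X_tr, y_tr, X_te, y_te, dims):
    """Return (intact accuracy, ablated accuracy, gap) for probes trained on each feature version."""
    intact = accuracy(*train_probe(X_tr, y_tr), X_te, y_te)
    ab_tr, ab_te = ablate(X_tr, dims), ablate(X_te, dims)
    ablated = accuracy(*train_probe(ab_tr, y_tr), ab_te, y_te)
    return intact, ablated, intact - ablated


if __name__ == "__main__":
    # Toy data, NOT from C4STYLI: dim 0 is an informative "surface" cue,
    # dim 1 is pure noise standing in for "structure" features the probe does not use.
    rng = random.Random(0)

    def sample(n):
        X, y = [], []
        for _ in range(n):
            label = rng.randint(0, 1)
            X.append([label + rng.gauss(0, 0.5), rng.gauss(0, 1.0)])
            y.append(label)
        return X, y

    X_tr, y_tr = sample(200)
    X_te, y_te = sample(100)
    print(ablation_gap(X_tr, y_tr, X_te, y_te, dims=[1]))
```

In this toy setup, ablating the noise dimension should leave held-out accuracy roughly unchanged, which mirrors the kind of pattern the abstract interprets as reliance on surface-level information. Training a fresh probe on each feature version asks whether the information is still linearly decodable after ablation, rather than whether one fixed probe breaks.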