Predicting nationality from personal names has practical value in marketing, demographic research, and genealogical studies. Conventional neural models learn statistical correspondences between names and nationalities from task-specific training data, which poses challenges for generalizing to low-frequency nationalities and for distinguishing similar nationalities within the same region. Large language models (LLMs) have the potential to address these challenges by leveraging world knowledge acquired during pre-training. In this study, we comprehensively compare neural models and LLMs on nationality prediction, evaluating six neural models and six LLM prompting strategies across three granularity levels (nationality, region, and continent), with frequency-based stratified analysis and error analysis. Results show that LLMs outperform neural models at all granularity levels, with the gap narrowing as granularity becomes coarser. Simple machine learning methods exhibit the greatest robustness to class frequency, while pre-trained models and LLMs show degraded performance on low-frequency nationalities. Error analysis reveals that LLMs tend to make ``near-miss'' errors, predicting the correct region even when the nationality is incorrect, whereas neural models exhibit more cross-regional errors and a stronger bias toward high-frequency classes. These findings indicate that the superiority of LLMs stems from world knowledge, that model selection should consider the required granularity, and that evaluation should account for error quality beyond accuracy.