Employees often struggle to identify ``who knows what,'' which costs organizations productivity. We investigate whether Large Language Models (LLMs) can infer individual domain knowledge directly from long-term Slack logs. Analyzing 27,188 messages from 43 users, we evaluated seven models (including models from the Gemini, Claude, and GPT families) by comparing their zero-shot estimates against self-reported skill ratings from 27 participants. Gemini 2.5 Flash achieved the lowest mean absolute error (MAE) of 21.13%, while GPT models showed significantly larger discrepancies. Notably, estimation accuracy depended only weakly on message volume, indicating that more text alone does not guarantee better inference. These findings demonstrate both the feasibility and the current limits of automated expertise mapping, and they highlight the need for privacy-preserving deployments and richer, structure-aware representations of human knowledge.