Large language models (LLMs) have been applied to a range of mundane and repetitive tasks, including Human Resource (HR) support. We worked with domain experts at SAP SE to develop an HR support chatbot for addressing employee inquiries. We incorporated a human-in-the-loop at several stages of the development cycle, such as dataset collection, prompt optimization, and evaluation of generated output. By improving the response quality of the LLM-driven chatbot and exploring alternative retrieval methods, we have created an efficient, scalable, and flexible tool that helps HR professionals address employee inquiries effectively. Our experiments and evaluation show that GPT-4 outperforms other models and can overcome inconsistencies in the data through its internal reasoning capabilities. Additionally, expert analysis suggests that reference-free evaluation metrics such as G-Eval and Prometheus are reliable, aligning closely with human evaluation.