This study applies BERTopic, a transformer-based topic modeling technique, to the lmsys-chat-1m dataset, a multilingual conversational corpus built from head-to-head evaluations of large language models (LLMs). Each user prompt is paired with two anonymized LLM responses and a human preference label indicating which response the user judged better. The main objective is to uncover thematic patterns in these conversations and to examine how they relate to user preferences, in particular whether certain LLMs are consistently preferred within specific topics. A robust preprocessing pipeline was designed to handle multilingual variation, balance dialogue turns, and clean noisy or redacted data. BERTopic extracted more than 29 coherent topics, including artificial intelligence, programming, ethics, and cloud infrastructure. We analysed the relationships between topics and model preferences to identify trends in model-topic alignment, using visualizations such as inter-topic distance maps, topic probability distributions, and model-versus-topic matrices. Our findings can inform domain-specific fine-tuning and optimization strategies for improving real-world LLM performance and user satisfaction.
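The model-versus-topic analysis described above can be sketched as a simple cross-tabulation: each conversation contributes one (topic, winning model) pair, and normalizing the counts per topic yields per-topic win rates. The records and topic labels below are hypothetical placeholders, not values from the study:

```python
from collections import defaultdict

# Hypothetical records: (topic assigned by BERTopic, preferred model) per conversation.
records = [
    ("programming", "model_a"),
    ("programming", "model_a"),
    ("programming", "model_b"),
    ("ethics", "model_b"),
    ("ethics", "model_b"),
    ("ethics", "model_a"),
]

def win_rates(records):
    """Count wins per (topic, model), then normalize within each topic."""
    counts = defaultdict(lambda: defaultdict(int))
    for topic, winner in records:
        counts[topic][winner] += 1
    rates = {}
    for topic, wins in counts.items():
        total = sum(wins.values())
        rates[topic] = {model: n / total for model, n in wins.items()}
    return rates

matrix = win_rates(records)
print(matrix["programming"]["model_a"])  # model_a wins 2 of 3 programming comparisons
```

A matrix of this shape, with one row per extracted topic and one column per model, is what a model-versus-topic visualization renders as a heatmap.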