Personality detection aims to identify an individual's personality traits from their social media posts. Advances in Large Language Models (LLMs) offer new perspectives on this task. Existing approaches leverage LLMs to extract semantic information from textual posts as prompts and then train classifiers for categorization. However, accurately classifying personality traits remains challenging due to the inherent complexity of human personality and the subtle distinctions between traits. Moreover, prompt-based methods often depend heavily on expert-crafted knowledge and lack autonomous pattern-learning capacity. To address these limitations, we frame personality detection as a ranking task rather than a classification task and propose a corresponding reinforcement learning training paradigm. First, we apply supervised fine-tuning (SFT) to establish personality trait ranking capabilities and enforce a standardized output format, yielding a robust initialization. We then introduce Group Relative Policy Optimization (GRPO) with a specialized ranking-based reward function. Unlike verification tasks with definitive solutions, personality assessment involves subjective interpretation and blurred boundaries between trait categories; our reward function addresses this by training the LLM to learn optimal answer rankings. Comprehensive experiments demonstrate that our method achieves state-of-the-art performance across multiple personality detection benchmarks.
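To make the two key ingredients concrete, the sketch below illustrates one plausible form of a ranking-based reward combined with GRPO's group-relative advantage normalization. The abstract does not specify the exact reward, so the reciprocal-rank scoring, the format bonus, and all function names here are illustrative assumptions, not the paper's actual formulation.

```python
# A minimal sketch, assuming the model emits an ordered list of candidate
# trait labels (most to least likely). Reciprocal rank and the format bonus
# are hypothetical choices for illustration only.

from typing import List

def ranking_reward(ranked_labels: List[str], gold_label: str,
                   format_ok: bool, format_bonus: float = 0.1) -> float:
    """Reward = reciprocal rank of the gold label, plus a small bonus when
    the output follows the standardized format enforced during SFT."""
    reward = 0.0
    if gold_label in ranked_labels:
        rank = ranked_labels.index(gold_label) + 1  # 1-based rank
        # Softer than 0/1 accuracy: near-misses earn partial credit,
        # which suits blurred boundaries between trait categories.
        reward += 1.0 / rank
    if format_ok:
        reward += format_bonus
    return reward

def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO normalizes rewards within a group of sampled completions:
    each advantage is the reward minus the group mean, over the std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # guard against identical rewards
    return [(r - mean) / std for r in rewards]

# Example: four sampled completions ranking MBTI-style type labels.
group = [
    ranking_reward(["INTJ", "INTP", "ENTJ"], "INTP", format_ok=True),
    ranking_reward(["INTP", "INTJ", "ENTJ"], "INTP", format_ok=True),
    ranking_reward(["ESFP", "ENFP"], "INTP", format_ok=False),
    ranking_reward(["ENTJ", "INTP"], "INTP", format_ok=True),
]
print(group_relative_advantages(group))
```

Under this assumed scheme, a completion that ranks the gold label second still receives positive reward, whereas a hard 0/1 accuracy signal would treat it the same as a completely wrong answer; this is one way a ranking-based reward can reflect the subjective, graded nature of personality assessment.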