Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs

Unsupervised methods are widely used to induce latent semantic structure from large text collections, yet their outputs often contain incoherent, redundant, or poorly grounded clusters that are difficult to validate without labeled data. We propose a reasoning-based refinement framework that uses large language models (LLMs) not as embedding generators but as semantic judges that validate and restructure the outputs of arbitrary unsupervised clustering algorithms. The framework introduces three reasoning stages: (i) coherence verification, where LLMs assess whether each cluster summary is supported by its member texts; (ii) redundancy adjudication, where candidate clusters are merged or rejected based on semantic overlap; and (iii) label grounding, where clusters are assigned interpretable labels through a two-stage process that first generates labels and then consolidates semantically similar ones, in a fully unsupervised manner. This design decouples representation learning from structural validation and mitigates common failure modes of embedding-only approaches. We evaluate the framework on real-world social media corpora from two platforms with distinct interaction models, demonstrating consistent improvements in cluster coherence and human-aligned labeling quality over classical topic models and recent representation-based baselines. Human evaluation shows strong agreement with the LLM-generated labels, despite the absence of gold-standard annotations. We further conduct a robustness analysis under matched temporal and volume conditions to assess cross-platform stability. Beyond empirical gains, our results suggest that LLM-based reasoning can serve as a general mechanism for validating and refining unsupervised semantic structure, enabling more reliable and interpretable analysis of large text collections without supervision.
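
To make the three stages concrete, the following is a minimal Python sketch of how such a refinement pass might be organized. It is not the authors' implementation. The names `Cluster`, `Judge`, `ToyJudge`, and `refine` are hypothetical, and `ToyJudge` replaces the LLM with simple token overlap so that the example runs without any model. The abstract does not say what happens to clusters that fail coherence verification; the sketch assumes they are dropped. It also assumes redundant clusters are merged greedily into the first earlier cluster they overlap.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Cluster:
    summary: str
    texts: List[str]
    label: str = ""


class Judge(Protocol):
    """Hypothetical judge interface; an LLM-backed class would implement it."""

    def supports(self, summary: str, texts: List[str]) -> bool: ...
    def overlaps(self, a: str, b: str) -> bool: ...
    def propose_label(self, cluster: Cluster) -> str: ...


def _tokens(s: str) -> set:
    return set(s.lower().split())


class ToyJudge:
    """Token-overlap stand-in for an LLM judge (illustration only)."""

    def supports(self, summary: str, texts: List[str]) -> bool:
        # Coherent if a majority of member texts share a token with the summary.
        hits = sum(bool(_tokens(summary) & _tokens(t)) for t in texts)
        return hits * 2 > len(texts)

    def overlaps(self, a: str, b: str) -> bool:
        # Jaccard similarity of the two token sets, with an assumed 0.5 cutoff.
        ta, tb = _tokens(a), _tokens(b)
        return len(ta & tb) / max(len(ta | tb), 1) >= 0.5

    def propose_label(self, cluster: Cluster) -> str:
        # A real judge would generate a label; here the summary is reused.
        return cluster.summary.lower()


def refine(clusters: List[Cluster], judge: Judge) -> List[Cluster]:
    # Stage (i): coherence verification (assumption: failing clusters are dropped).
    kept = [c for c in clusters if judge.supports(c.summary, c.texts)]

    # Stage (ii): redundancy adjudication via greedy merging.
    merged: List[Cluster] = []
    for c in kept:
        target = next(
            (m for m in merged if judge.overlaps(m.summary, c.summary)), None
        )
        if target is None:
            merged.append(Cluster(c.summary, list(c.texts)))
        else:
            target.texts.extend(c.texts)

    # Stage (iii): label grounding: generate labels, then consolidate similar ones.
    canonical: List[str] = []
    for c in merged:
        label = judge.propose_label(c)
        match = next((k for k in canonical if judge.overlaps(k, label)), None)
        if match is None:
            canonical.append(label)
            match = label
        c.label = match
    return merged


clusters = [
    Cluster("vaccine side effects",
            ["mild side effects after the vaccine", "vaccine made my arm sore"]),
    Cluster("side effects of vaccine",
            ["any side effects from the second vaccine dose"]),
    Cluster("football transfer rumors",
            ["best pasta recipe ever", "how to fix a flat tire"]),
]
refined = refine(clusters, ToyJudge())
for c in refined:
    print(c.label, len(c.texts))
```

In this toy run the third cluster fails coherence verification, and the first two are merged as redundant. Because the judge is passed in as an argument, the clustering algorithm that produced the input stays independent of the validation step, which mirrors the decoupling the abstract describes.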
