Hierarchical text classification (HTC) depends on taxonomies that organize labels into structured hierarchies. However, many real-world taxonomies introduce ambiguities, such as identical leaf names appearing under semantically similar parent nodes, which prevent language models (LMs) from learning clear decision boundaries. In this paper, we present TaxMorph, a framework that uses large language models (LLMs) to transform entire taxonomies through operations such as renaming, merging, splitting, and reordering. Unlike prior work, our method revises the full hierarchy to better match the semantics encoded by LMs. Experiments on three HTC benchmarks show that LLM-refined taxonomies consistently outperform human-curated ones across various settings, with gains of up to +2.9 pp. in F1. To better understand these improvements, we compare how well LMs can assign leaf nodes to parent nodes, and vice versa, in human-curated versus LLM-refined taxonomies. We find that human-curated taxonomies yield more easily separable clusters in embedding space, whereas LLM-refined taxonomies align more closely with the model's actual confusion patterns during classification. In other words, although they are harder to separate, they better reflect the model's inductive biases. These findings suggest that LLM-guided refinement produces taxonomies that are more compatible with how models learn, improving HTC performance.