Hierarchical multi-label text classification assigns multiple labels to an input text, where the labels are organized in a hierarchy. It is a vital task in many real-world applications, e.g., scientific literature archiving. In this paper, we survey recent progress in hierarchical multi-label text classification, covering open-source datasets, the main methods, evaluation metrics, learning strategies, and current challenges. We also list several future research directions to help the community advance the field.