As the volume of unstructured text continues to grow across domains, there is an urgent need for scalable methods that enable interpretable organization, summarization, and retrieval of information. This work presents a unified framework for interpretable topic modeling, zero-shot topic labeling, and topic-guided semantic retrieval over large agricultural text corpora. Leveraging BERTopic, we extract semantically coherent topics. Each topic is converted into a structured prompt, enabling a language model to generate meaningful topic labels and summaries in a zero-shot manner. Querying and document exploration are supported via dense embeddings and vector search, while a dedicated evaluation module assesses topical coherence and bias. This framework supports scalable and interpretable information access in specialized domains where labeled data is limited.
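As a minimal sketch of two steps named above, the following shows how one topic's keywords might be turned into a structured zero-shot labeling prompt, and how documents might be ranked against a query by cosine similarity over dense embeddings. The function names, the prompt wording, and the toy vectors are illustrative assumptions and are not taken from the framework itself; in practice the keywords would come from BERTopic and the vectors from an embedding model.

```python
import math


def build_topic_prompt(keywords, sample_docs, max_docs=3):
    """Format one topic's keywords and sample documents as a labeling prompt.

    Hypothetical helper: the prompt template is an assumption, not the paper's.
    """
    kw_line = ", ".join(keywords)
    doc_lines = "\n".join(f"- {d}" for d in sample_docs[:max_docs])
    return (
        "You are labeling topics from an agricultural text corpus.\n"
        f"Keywords: {kw_line}\n"
        f"Representative documents:\n{doc_lines}\n"
        "Return a short topic label and a one-sentence summary."
    )


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy inputs standing in for BERTopic keywords and embedding-model vectors.
    print(build_topic_prompt(["soil", "nitrogen", "fertilizer"],
                             ["Nitrogen uptake in maize.", "Soil pH and yield."]))
    print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]], k=2))
```

A brute-force scan like `top_k` is only reasonable for small collections; a large corpus would normally use an approximate nearest-neighbor index instead.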