We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions, including the absence of structure among identified topics, non-standard aspect names, and the scarcity of training data. The proposed solution builds a multi-level taxonomy from raw reviews in a semi-supervised manner, uses a semantic-similarity heuristic to generate labelled data, and fine-tunes an LLM within a multi-task insight extraction architecture. InsightNet identifies granular, actionable topics, together with the customer sentiment and verbatim text for each topic. Evaluations on real-world customer review data show that InsightNet performs better than existing solutions in terms of structure, hierarchy and completeness. We empirically demonstrate that InsightNet outperforms the current state-of-the-art methods in multi-label topic classification, achieving an F1 score of 0.85, an improvement of 11% in F1 score over the previous best results. Additionally, InsightNet generalises well to unseen aspects and suggests new topics to be added to the taxonomy.
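The abstract does not spell out the semantic-similarity heuristic used to generate labelled data, so the following is only a minimal sketch of the general idea, under stated assumptions. It splits a review into sentences and assigns each one to the closest taxonomy topic when the similarity clears a threshold. Bag-of-words cosine similarity stands in for whatever semantic representation the authors use, and the names `weak_labels`, `taxonomy`, and `threshold`, along with the example topics and keywords, are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def weak_labels(review, taxonomy, threshold=0.3):
    """Assign each sentence of a review to its most similar topic.

    `taxonomy` maps a topic name to a string of descriptive keywords
    (an assumption made for this sketch, not the paper's format).
    Sentences whose best score falls below `threshold` get no label.
    """
    topic_vectors = {name: bag_of_words(kw) for name, kw in taxonomy.items()}
    labels = []
    for sentence in re.split(r"[.!?]+", review):
        sentence = sentence.strip()
        if not sentence:
            continue
        sentence_vector = bag_of_words(sentence)
        best_topic, best_score = None, 0.0
        for name, topic_vector in topic_vectors.items():
            score = cosine(sentence_vector, topic_vector)
            if score > best_score:
                best_topic, best_score = name, score
        if best_topic is not None and best_score >= threshold:
            # Keep the sentence as the verbatim text for the topic.
            labels.append((best_topic, sentence))
    return labels


# Hypothetical two-topic taxonomy, for illustration only.
taxonomy = {
    "battery life": "battery charge drains quickly life",
    "delivery": "delivery shipping arrived late package",
}
print(weak_labels("The battery drains quickly. Delivery was late.", taxonomy))
```

Each returned pair couples a topic with the sentence that triggered it, which loosely mirrors the topic-plus-verbatim output described above. In practice, a sentence-embedding model would likely replace the bag-of-words vectors.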