Tax code prediction is a crucial yet underexplored task in automating invoicing and compliance management for large-scale e-commerce platforms. Each product must be mapped accurately to a node in a multi-level taxonomic hierarchy defined by national standards, and mapping errors lead to financial inconsistencies and regulatory risk. This paper presents Taxon, a semantically aligned and expert-guided framework for hierarchical tax code prediction. Taxon integrates (i) a feature-gating mixture-of-experts architecture that adaptively routes multi-modal features across taxonomy levels, and (ii) a semantic consistency model, distilled from large language models acting as domain experts, that verifies the alignment between product titles and official tax definitions. To address noisy supervision in real business records, we design a multi-source training pipeline that combines curated tax databases, invoice validation logs, and merchant registration data to provide both structural and semantic supervision. Extensive experiments on the proprietary TaxCode dataset and on public benchmarks demonstrate that Taxon achieves state-of-the-art performance, outperforming strong baselines. In addition, a procedure that reconstructs full hierarchical paths significantly improves structural consistency and yields the highest overall F1 scores. Taxon has been deployed in production within Alibaba's tax service system, where it handles an average of over 500,000 tax code queries per day and peak volumes above five million requests during business events, with improved accuracy, interpretability, and robustness.
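The abstract does not spell out how full hierarchical paths are reconstructed, so the following is only a minimal sketch of one plausible approach, not the authors' procedure. It assumes that a model emits a probability for each code at every taxonomy level and that a parent map describes the hierarchy; the function name `reconstruct_path`, the parent map, and the tax codes are all hypothetical. Instead of taking the most likely code at each level independently, which can produce a leaf that does not belong to the chosen parent, it scores every root-to-leaf path by the product of its per-level probabilities and keeps the best one.

```python
from typing import Dict, List, Optional, Tuple


def reconstruct_path(
    level_probs: List[Dict[str, float]],
    parent: Dict[str, Optional[str]],
) -> Tuple[List[str], float]:
    """Return the valid root-to-leaf path with the highest joint score.

    level_probs[d] maps each candidate code at depth d to its probability.
    parent maps each code to its parent code (None for top-level codes).
    """
    best_path: List[str] = []
    best_score = -1.0
    for leaf in level_probs[-1]:
        # Walk upward from the leaf to recover its unique ancestor chain.
        path = [leaf]
        node = leaf
        while parent.get(node) is not None:
            node = parent[node]
            path.append(node)
        path.reverse()
        if len(path) != len(level_probs):
            continue  # skip leaves whose depth does not match the predictions
        # Joint score: product of the per-level probabilities along the path.
        score = 1.0
        for depth, code in enumerate(path):
            score *= level_probs[depth].get(code, 0.0)
        if score > best_score:
            best_path, best_score = path, score
    return best_path, best_score


# Hypothetical three-level taxonomy (codes are illustrative only).
parent = {
    "1": None, "2": None,
    "101": "1", "201": "2",
    "10101": "101", "10102": "101", "20101": "201",
}

# Hypothetical per-level predictions for one product.
level_probs = [
    {"1": 0.6, "2": 0.4},
    {"101": 0.55, "201": 0.45},
    {"10101": 0.2, "10102": 0.25, "20101": 0.55},
]

# Independent per-level argmax yields "1", "101", "20101", which is not a
# valid path because "20101" is not a descendant of "101".
independent = [max(p, key=p.get) for p in level_probs]

path, score = reconstruct_path(level_probs, parent)
print(independent, path)
```

In this toy example the independent per-level choices are structurally inconsistent, while the path-level search returns a chain in which every code is the child of the one before it. Exhaustive enumeration over leaves is only practical for small taxonomies; a beam search over levels would be a natural substitute at scale.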