Large language models (LLMs) are increasingly used to create content in regulated domains such as pharmaceuticals, where outputs must be scientifically accurate and legally compliant. Manual quality control (QC) is slow, error-prone, and can become a publication bottleneck. We introduce LRBTC, a modular LLM- and vision-language-model (VLM)-driven QC architecture covering Language, Regulatory, Brand, Technical, and Content Structure checks. LRBTC combines a student-teacher dual-model architecture and a human-in-the-loop (HITL) workflow with waterfall rule filtering to enable scalable, verifiable content validation and optimization. On AIReg-Bench, our approach achieves 83.0% F1 and 97.5% recall, reducing missed violations by 5x compared with Gemini 2.5 Pro. On CSpelling, it improves mean accuracy by 26.7%. Error analysis further reveals that while current models are strong at detecting misspellings (92.5% recall), they fail to identify complex medical grammatical (25.0% recall) and punctuation (41.7% recall) errors, highlighting a key area for future work. This work provides a practical, plug-and-play solution for reliable, transparent quality control of content in high-stakes, compliance-critical industries. We also provide access to our demo under the MIT License.
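To make the waterfall structure concrete, the following is a minimal, purely illustrative sketch of the escalation pattern the abstract describes: deterministic rules filter first, a fast "student" model screens what remains, and only low-confidence cases escalate to a stronger "teacher" model and finally to a human reviewer. All function names, scoring stubs, and thresholds are hypothetical assumptions for illustration, not the paper's actual API or models.

```python
# Illustrative sketch of waterfall rule filtering with a student-teacher
# escalation path and an HITL fallback. Names and thresholds are assumed.
from dataclasses import dataclass

@dataclass
class Verdict:
    passed: bool
    stage: str    # which stage produced the decision
    score: float  # confidence in [0, 1]

def rule_check(text: str) -> bool:
    """Cheap deterministic rules run first (e.g. banned-claim lookup)."""
    banned = {"guaranteed cure", "no side effects"}
    return not any(term in text.lower() for term in banned)

def student_score(text: str) -> float:
    """Stand-in for a small, fast LLM's compliance confidence."""
    return 0.9 if "indicated for" in text.lower() else 0.5

def teacher_score(text: str) -> float:
    """Stand-in for a larger model that reviews escalated cases."""
    return 0.95 if "see prescribing information" in text.lower() else 0.3

def waterfall_qc(text: str, escalate_below: float = 0.8) -> Verdict:
    # Stage 1: hard rules reject immediately, with full confidence.
    if not rule_check(text):
        return Verdict(False, "rules", 1.0)
    # Stage 2: student model clears high-confidence passes cheaply.
    s = student_score(text)
    if s >= escalate_below:
        return Verdict(True, "student", s)
    # Stage 3: teacher model reviews what the student was unsure about.
    t = teacher_score(text)
    if t >= escalate_below:
        return Verdict(True, "teacher", t)
    # Stage 4: remaining uncertain content is flagged for human review.
    return Verdict(False, "human-review", t)

v = waterfall_qc("Guaranteed cure with no side effects!")
print(v.stage, v.passed)  # rules False
```

The design point is cost layering: each stage only sees the traffic the cheaper stage above it could not confidently resolve, so expensive model calls and human attention are reserved for genuinely ambiguous content.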