The increasing frequency and complexity of regulatory updates place a significant burden on multinational pharmaceutical companies. Compliance teams must interpret evolving rules across jurisdictions, formats, and agencies, often manually, at high cost and with a real risk of error. We introduce RegGuard, an industrial-scale AI assistant designed to automate the interpretation of heterogeneous regulatory texts and align them with internal corporate policies. The system ingests diverse document sources through a secure pipeline and improves retrieval and generation quality with two novel components. HiSACC (Hierarchical Semantic Aggregation for Contextual Chunking) segments long documents into semantically coherent units while maintaining consistency across non-contiguous sections. ReLACE (Regulatory Listwise Adaptive Cross-Encoder for Reranking), a domain-adapted cross-encoder built on an open-source model, jointly models user queries and retrieved candidates to improve ranking relevance. Evaluations in enterprise settings show that RegGuard improves answer quality in terms of relevance, groundedness, and contextual focus, while significantly mitigating hallucination risk. The architecture is built for auditability and traceability, with provenance tracking, access control, and incremental indexing, which keep it responsive to evolving document sources and make it relevant to any domain with stringent compliance demands.
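The chunk-then-rerank flow described above can be illustrated with a toy sketch. This is not the authors' HiSACC or ReLACE: word-set overlap stands in for learned embeddings and for the cross-encoder, adjacent sentences are merged greedily rather than aggregated hierarchically, and candidates are scored one pair at a time rather than listwise. The function names (`tokens`, `jaccard`, `chunk`, `rerank`) and the `threshold` value are hypothetical, chosen only for illustration.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for a learned embedding (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Set overlap in [0, 1]; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def chunk(sentences, threshold=0.2):
    """Greedily merge each sentence into the previous chunk when their
    word overlap meets the (hypothetical) threshold; otherwise start a new chunk."""
    chunks = []
    for sentence in sentences:
        if chunks and jaccard(tokens(chunks[-1]), tokens(sentence)) >= threshold:
            chunks[-1] = chunks[-1] + " " + sentence
        else:
            chunks.append(sentence)
    return chunks


def rerank(query, candidates):
    """Score every (query, candidate) pair and return candidates best-first.
    A real cross-encoder would replace the overlap score used here."""
    q = tokens(query)
    return sorted(candidates, key=lambda c: jaccard(q, tokens(c)), reverse=True)


# Illustrative input only; these sentences are made up for the example.
sentences = [
    "Batch records must be retained for five years.",
    "Retained batch records must be available for inspection.",
    "Adverse events are reported within fifteen days.",
]
chunks = chunk(sentences)
ranked = rerank("When are adverse events reported?", chunks)
print(ranked[0])
```

In this toy run the two sentences about batch records share enough vocabulary to merge into one chunk, and the adverse-event chunk ranks first for the query. Swapping `jaccard` for an embedding similarity in `chunk` and for a cross-encoder score in `rerank` would move the sketch closer to the kind of pipeline the abstract describes.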