Generative models can produce synthetic patient records for analytical tasks when real data is unavailable or limited. However, current methods struggle to adhere to domain-specific knowledge and to remove invalid data. We present ConSequence, an effective approach to integrating domain knowledge into the outputs of sequential generative neural networks. Our rule-based formulation includes temporal aggregation and antecedent evaluation modules, implemented through an efficient matrix multiplication formulation, to satisfy hard and soft logical constraints across time steps. Existing constraint methods often fail to guarantee constraint satisfaction, cannot handle temporal constraints, and hinder the model's learning and computational efficiency. In contrast, our approach efficiently handles all types of constraints with guaranteed logical coherence. We demonstrate ConSequence's effectiveness in generating electronic health records, where it outperforms competitors by achieving complete temporal and spatial constraint satisfaction without compromising runtime performance or generative quality. Specifically, ConSequence prevents all rule violations while improving model quality, reducing test perplexity by 5%, and incurring less than a 13% slowdown in generation speed compared to an unconstrained model.
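To make the idea of evaluating rule antecedents through a matrix product more concrete, here is a minimal sketch in plain Python. It is not the ConSequence implementation: the function names, the +1/-1/0 rule encoding, the "seen in any earlier visit" aggregation, and the three-code toy vocabulary are all assumptions chosen for illustration. Each visit is a binary vector over medical codes, history is aggregated into one vector, and a rule fires when the dot product of its antecedent row with the concatenated [history, current] vector equals the number of codes it requires to be present.

```python
# Hypothetical sketch: all names, the rule encoding, and the toy data are
# assumptions for illustration, not the ConSequence implementation.

def aggregate_history(past_visits, n_codes):
    """Temporal aggregation (assumed form): flag codes seen in any earlier visit."""
    seen = [0] * n_codes
    for visit in past_visits:
        for i, value in enumerate(visit):
            seen[i] = max(seen[i], value)
    return seen


def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def apply_rules(history, current, antecedents, required, consequents):
    """Evaluate all antecedents with one matrix product, then enforce consequents.

    antecedents: one row per rule over [history, current];
                 +1 = code must be present, -1 = must be absent, 0 = ignored.
    required:    number of +1 entries in each row.
    consequents: (code_index, forced_value) applied to the current visit.
    All rules are evaluated against the original, unmodified current visit.
    """
    features = history + current
    scores = matvec(antecedents, features)
    corrected = list(current)
    for score, need, (code, value) in zip(scores, required, consequents):
        # With binary features, score == need only if every required code is
        # present and every forbidden code is absent.
        if score == need:
            corrected[code] = value
    return corrected


N_CODES = 3
# Feature layout: [h0, h1, h2, c0, c1, c2]
ANTECEDENTS = [
    [1, 0, 0, 0, 0, 0],   # rule A: code 0 appeared in the past
    [-1, 0, 0, 0, 0, 1],  # rule B: code 2 now, and code 0 never in the past
]
REQUIRED = [1, 1]
CONSEQUENTS = [
    (1, 0),  # rule A: code 1 must be absent in the current visit
    (0, 1),  # rule B: code 0 must be present in the current visit
]

history = aggregate_history([[1, 0, 0], [0, 0, 1]], N_CODES)  # -> [1, 0, 1]
fixed = apply_rules(history, [0, 1, 1], ANTECEDENTS, REQUIRED, CONSEQUENTS)
print(fixed)  # -> [0, 0, 1]
```

In this toy run, rule A fires because code 0 occurs in an earlier visit, so code 1 is cleared from the generated visit; rule B does not fire because its "absent from history" literal is violated. This only shows hard constraints on binary outputs; soft constraints would presumably adjust probabilities rather than overwrite values, which the sketch does not attempt.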