Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link for applying flow-based generative models to multimodal problems involving both continuous and discrete data. Our key insight is that the discrete equivalent of continuous-space flow matching can be realized using Continuous Time Markov Chains (CTMCs). DFMs have a simple derivation that includes discrete diffusion models as a specific instance while enabling improved performance over existing diffusion-based approaches. We use DFMs to build a multimodal flow-based modeling framework and apply it to protein co-design, where we learn a model that jointly generates protein structure and sequence. Our approach achieves state-of-the-art co-design performance while allowing the same multimodal model to be used for flexible generation of either the sequence or the structure.
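To make the CTMC idea concrete, the following is a minimal sketch of Euler simulation of a CTMC sampler for discrete data. It assumes a masking interpolation, where each dimension is in a mask state with probability 1 - t and equals its clean value otherwise. Under that assumption, the rate of leaving the mask state is 1/(1 - t), and the destination token is drawn from a denoiser estimate of p(x_1 | x_t). The names `MASK`, `toy_denoiser`, and `sample_masked_dfm` are hypothetical. The toy denoiser is a fixed distribution standing in for a learned network. This is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np

MASK = -1  # hypothetical sentinel index for the mask state


def toy_denoiser(x_t, vocab_size):
    """Hypothetical stand-in for a learned model of p(x_1 | x_t).

    Returns a (D, S) array of per-dimension categorical probabilities.
    It ignores x_t and returns a fixed distribution favoring low indices.
    """
    logits = -np.arange(vocab_size, dtype=float)
    probs = np.exp(logits) / np.exp(logits).sum()
    return np.tile(probs, (x_t.shape[0], 1))


def sample_masked_dfm(num_dims, vocab_size, num_steps, rng):
    """Euler simulation of a masking CTMC from t = 0 (all masked) to t = 1."""
    x = np.full(num_dims, MASK, dtype=int)
    for k in range(num_steps):
        # With t = k/num_steps, the probability of still being masked is 1 - t,
        # so the hazard (unmasking) rate is 1/(1 - t). An Euler step of size
        # dt = 1/num_steps gives jump probability dt/(1 - t) = 1/(num_steps - k),
        # which is exactly 1.0 on the final step, so every dimension ends unmasked.
        p_jump = 1.0 / (num_steps - k)
        probs = toy_denoiser(x, vocab_size)
        masked = x == MASK
        jump = masked & (rng.random(num_dims) < p_jump)
        for d in np.flatnonzero(jump):
            # Destination is sampled from the denoiser's estimate of p(x_1 | x_t).
            x[d] = rng.choice(vocab_size, p=probs[d])
    return x


rng = np.random.default_rng(0)
sample = sample_masked_dfm(num_dims=8, vocab_size=5, num_steps=10, rng=rng)
print(sample)
```

Writing the jump probability as 1/(num_steps - k) rather than dt/(1 - t) avoids floating-point error at the last step. Unmasked dimensions never change under this particular path, so only masked dimensions consult the denoiser.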