Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link needed to apply flow-based generative models to multimodal problems involving both continuous and discrete data. Our key insight is that the discrete equivalent of continuous-space flow matching can be realized using Continuous Time Markov Chains. DFMs benefit from a simple derivation that includes discrete diffusion models as a special case while allowing improved performance over existing diffusion-based approaches. We use DFMs to build a multimodal flow-based modeling framework and apply it to the task of protein co-design, in which we learn a model that jointly generates protein structure and sequence. Our approach achieves state-of-the-art co-design performance while allowing the same multimodal model to be used for flexible generation of either the sequence or the structure.