Molecular datasets are often small, because gathering data is difficult given the complexity of the experiments or simulations involved. Here, we address this scarcity by leveraging mutual information across different tasks in molecular data. We extend the Geometrically Aligned Transfer Encoder (GATE), an algorithm that exploits the geometric characteristics of the encoding space, to a multi-task setup. Specifically, we connect multiple molecular tasks by aligning their curved coordinates onto locally flat coordinates, so that information flows from the source tasks to support performance on the target data.