A novel framework for designing the molecular structure of chemical compounds with a desired chemical property has recently been proposed. The framework infers a desired chemical graph by solving a mixed integer linear program (MILP) that simulates the computation process of a feature function defined by a two-layered model on chemical graphs and of a prediction function constructed by a machine learning method. To improve the learning performance of prediction functions in the framework, we design a method that splits a given data set $\mathcal{C}$ into two subsets $\mathcal{C}^{(i)}$, $i=1,2$, by a hyperplane in a chemical space so that most compounds in the first (resp., second) subset have observed values lower (resp., higher) than a threshold $\theta$. We construct a prediction function $\psi$ for the data set $\mathcal{C}$ by combining prediction functions $\psi_i$, $i=1,2$, each of which is constructed on $\mathcal{C}^{(i)}$ independently. The results of our computational experiments suggest that the proposed method improves the learning performance for several chemical properties for which a good prediction function has been difficult to construct.
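The split-and-combine idea can be illustrated with a minimal sketch. It is not the authors' implementation: the hyperplane $(w, b)$ is assumed to be given (here it is hand-picked rather than computed), the per-subset predictors are plain one-feature least-squares lines standing in for whatever learning method is actually used, and all function names (`side`, `split`, `fit_line`, `combine`) and the toy data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Predictor = Callable[[Vector], float]


def side(w: Vector, b: float, x: Vector) -> int:
    """Return 1 if x satisfies w.x <= b, otherwise 2."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) <= b else 2


def split(X: List[Vector], y: List[float], w: Vector, b: float
          ) -> Dict[int, Tuple[List[Vector], List[float]]]:
    """Split the data set C into C^(1) and C^(2) by the hyperplane w.x = b."""
    parts: Dict[int, Tuple[List[Vector], List[float]]] = {1: ([], []), 2: ([], [])}
    for x, v in zip(X, y):
        xs, vs = parts[side(w, b, x)]
        xs.append(x)
        vs.append(v)
    return parts


def fit_line(X: List[Vector], y: List[float], j: int = 0) -> Predictor:
    """Stand-in learner (assumption): least-squares line on feature j only."""
    n = len(y)
    mx = sum(x[j] for x in X) / n
    my = sum(y) / n
    var = sum((x[j] - mx) ** 2 for x in X)
    cov = sum((x[j] - mx) * (v - my) for x, v in zip(X, y))
    slope = cov / var if var else 0.0
    return lambda x: my + slope * (x[j] - mx)


def combine(w: Vector, b: float, psi1: Predictor, psi2: Predictor) -> Predictor:
    """psi(x) = psi1(x) on the first side of the hyperplane, psi2(x) on the other."""
    return lambda x: psi1(x) if side(w, b, x) == 1 else psi2(x)


# Toy data (made up): one descriptor; observed values jump across theta = 10.
X = [[float(t)] for t in range(10)]
y = [x[0] if x[0] < 5 else 3.0 * x[0] for x in X]
w, b = [1.0], 4.5  # hand-picked hyperplane, an assumption of this sketch

parts = split(X, y, w, b)
psi = combine(w, b, fit_line(*parts[1]), fit_line(*parts[2]))
```

In this toy setting each subset follows a single linear trend, so the two independently fitted predictors can capture what one global line cannot. How the hyperplane is chosen, and which learner is used on each subset, is left open here.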