We propose using a conjecturing machine to suggest relationships among features, expressed as bounds involving nonlinear terms for numerical features and as Boolean expressions for categorical features. On data with known nonlinear and Boolean relationships, the proposed Conjecturing framework recovers the true underlying relationships in both settings. We then compare the method with a previously proposed symbolic regression framework on their ability to recover equations satisfied by the features of a dataset. Finally, we apply the framework to patient-level COVID-19 outcome data, where it suggests possible risk factors that are confirmed in the medical literature.
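To make the two kinds of output concrete, here is a minimal Python sketch of a conjecturing loop. It assumes a small, hypothetical grammar of candidate terms (identity, square, square root, pairwise sums and products) and of Boolean conditions (a feature, its negation, pairwise conjunctions). It also assumes a simplified significance filter in the spirit of Fajtlowicz's Dalmatian heuristic: a candidate is kept only if it holds on every row and says something that no previously kept conjecture already covered. The function names, grammar, and toy data are illustrative assumptions, not the Conjecturing framework's actual implementation.

```python
import math
from itertools import combinations

# Hypothetical term grammar (an assumption): unary transforms of each feature
# plus pairwise products and sums. The real framework's operators may differ.
def candidate_terms(features):
    unary = [
        ("{}", lambda v: v),
        ("{}^2", lambda v: v * v),
        ("sqrt({})", lambda v: math.sqrt(v) if v >= 0 else math.nan),
    ]
    terms = []
    for f in features:
        for fmt, op in unary:
            terms.append((fmt.format(f), lambda r, f=f, op=op: op(r[f])))
    for a, b in combinations(features, 2):
        terms.append((f"{a}*{b}", lambda r, a=a, b=b: r[a] * r[b]))
        terms.append((f"{a}+{b}", lambda r, a=a, b=b: r[a] + r[b]))
    return terms

def conjecture_upper_bounds(rows, target, features):
    """Keep 'target <= term' if it holds on every row and is strictly tighter
    than all previously kept bounds on at least one row."""
    accepted = []
    best = [math.inf] * len(rows)  # tightest kept bound value for each row
    for name, term in candidate_terms(features):
        values = [term(r) for r in rows]
        # NaN comparisons are False, so undefined terms are rejected here.
        if not all(r[target] <= v for r, v in zip(rows, values)):
            continue  # falsified by some row
        if any(v < b for v, b in zip(values, best)):
            accepted.append(f"{target} <= {name}")
            best = [min(v, b) for v, b in zip(values, best)]
    return accepted

def conjecture_sufficient_conditions(rows, target, features):
    """Keep '(q) -> target' if every row satisfying q also satisfies target
    and q covers at least one row not covered by earlier kept conditions."""
    candidates = [(f, lambda r, f=f: r[f]) for f in features]
    candidates += [(f"not {f}", lambda r, f=f: not r[f]) for f in features]
    candidates += [(f"{a} and {b}", lambda r, a=a, b=b: r[a] and r[b])
                   for a, b in combinations(features, 2)]
    accepted, covered = [], set()
    for name, cond in candidates:
        hits = {i for i, r in enumerate(rows) if cond(r)}
        if not hits or any(not rows[i][target] for i in hits):
            continue  # vacuous, or falsified by some row
        if hits - covered:
            accepted.append(f"({name}) -> {target}")
            covered |= hits
    return accepted

# Hypothetical toy data with known relationships: y = x^2 and t = a and b.
numeric_rows = [{"x": 1, "z": 5, "y": 1}, {"x": 2, "z": 1, "y": 4},
                {"x": 3, "z": 2, "y": 9}, {"x": 4, "z": 0, "y": 16}]
bool_rows = [{"a": True, "b": True, "t": True}, {"a": True, "b": False, "t": False},
             {"a": False, "b": True, "t": False}, {"a": False, "b": False, "t": False}]

print(conjecture_upper_bounds(numeric_rows, "y", ["x", "z"]))  # ['y <= x^2']
print(conjecture_sufficient_conditions(bool_rows, "t", ["a", "b"]))  # ['(a and b) -> t']
```

On these toy rows, the only bound that survives is the planted nonlinear relationship, and the only sufficient condition is the planted conjunction. The significance check is what keeps the output short: a candidate that is true everywhere but never improves on what was already conjectured is discarded.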