Deep neural networks are powerful tools for modelling non-linear patterns and are very effective when the input data is homogeneous, such as images and text. In recent years, there have been attempts to apply neural networks to heterogeneous data, such as tabular and multimodal data with mixed categories. Transformation methods, specialised architectures such as hybrid models, and regularisation models are three approaches to applying neural networks to this type of data. In this study, we first apply the K-modes clustering algorithm to define different levels of disability based on responses related to mobility impairments and difficulty in performing Activities of Daily Living (ADLs) and Instrumental Activities of Daily Living (IADLs). We consider three cases, namely binary, 3-level, and 4-level disability. We then use the Wide & Deep, TabTransformer, and TabNet models to predict these levels from socio-demographic, health, and lifestyle factors. We show that all models predict the different levels of disability reasonably well, with TabNet outperforming the other models in the case of binary disability and in terms of four metrics. We also find that factors such as urinary incontinence, ever smoking, exercise, and education are important features selected by TabNet that affect disability.
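
As an illustration of the clustering step, the following is a minimal sketch of K-modes on categorical responses, written in plain Python. It is not the study's implementation: the toy rows, the yes/no coding of the three response columns (mobility, ADL, IADL), the choice of initial modes, and the helper names `hamming`, `column_modes`, and `k_modes` are all assumptions made for this example. K-modes differs from K-means in that it measures dissimilarity by counting mismatched categories and updates each cluster centre to the per-column mode rather than the mean.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: count of positions whose categories differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each column (ties go to the first one seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, init_modes, max_iter=20):
    """Alternate between assigning rows to the nearest mode and recomputing modes."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under the matching dissimilarity.
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change either
        labels = new_labels
        # Update step: each mode becomes the per-column mode of its members.
        for j in range(len(modes)):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old mode if a cluster ends up empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy responses: (mobility difficulty, ADL difficulty, IADL difficulty).
rows = [
    ("no", "no", "no"),
    ("no", "no", "yes"),
    ("no", "no", "no"),
    ("yes", "yes", "yes"),
    ("yes", "yes", "no"),
    ("yes", "yes", "yes"),
]

# Binary case (two clusters); the initial modes are picked by hand for this sketch.
init_modes = [rows[0], rows[3]]
labels, modes = k_modes(rows, init_modes)
print(labels, modes)
```

In the binary case the two resulting clusters can be read as "not disabled" and "disabled"; the 3-level and 4-level cases would correspond to running the same procedure with three or four initial modes. The resulting labels would then serve as the prediction target for the Wide & Deep, TabTransformer, and TabNet models.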