Contemporary empirical applications frequently require flexible regression models for complex response types and for large tabular or non-tabular (e.g., image or text) data. Classical regression models either break down under the computational load of processing such data or require additional manual feature extraction to make these problems tractable. Here, we present deeptrafo, a package for fitting flexible regression models for conditional distributions using a TensorFlow backend with numerous additional processors, such as neural networks, penalties, and smoothing splines. Package deeptrafo implements deep conditional transformation models (DCTMs) for binary, ordinal, count, survival, continuous, and time series responses, potentially with uninformative censoring. Unlike other available methods, DCTMs do not assume a parametric family of distributions for the response. Further, the data analyst may trade off interpretability and flexibility by supplying custom neural network architectures and smoothers for each term in an intuitive formula interface. We demonstrate how to set up, fit, and work with DCTMs for several response types. We further showcase how to construct ensembles of these models, evaluate models using built-in cross-validation, and use other convenience functions for DCTMs in several applications. Lastly, we discuss DCTMs in light of other approaches to regression with non-tabular data.