Time series of counts arising in many applications are often overdispersed, meaning that their variance is much larger than their mean. This paper proposes a novel variable selection approach for such data. Our approach models them with sparse negative binomial GLARMA models. It combines estimation of the autoregressive moving average (ARMA) coefficients of the GLARMA model and of the overdispersion parameter with variable selection on the regression coefficients of generalized linear models (GLMs) using regularised methods. We describe our three-step estimation procedure, which is implemented in the NBtsVarSel package. We evaluate the performance of the approach on synthetic data and compare it with other methods. Additionally, we apply our approach to RNA sequencing data. Our approach is computationally efficient and outperforms the other methods considered at selecting variables, i.e. at recovering the non-null regression coefficients.
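To make the notion of overdispersion concrete, the following is a minimal sketch and not the NBtsVarSel implementation. It assumes the common negative binomial parameterisation with mean `mu` and overdispersion parameter `alpha`, for which the variance is `mu + mu**2 / alpha`, and it draws independent counts through a Gamma-Poisson mixture. The function names and the values of `mu` and `alpha` are hypothetical, and the sketch ignores both the covariates and the ARMA-type dependence of a GLARMA model.

```python
import math
import random
from statistics import mean, variance


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_negative_binomial(mu, alpha, rng):
    """Draw one negative binomial count with mean mu and variance mu + mu**2 / alpha."""
    # A Gamma rate with shape alpha and scale mu / alpha has mean mu and variance mu**2 / alpha;
    # mixing a Poisson over that rate yields the negative binomial distribution.
    rate = rng.gammavariate(alpha, mu / alpha)
    return sample_poisson(rate, rng)


rng = random.Random(0)        # fixed seed so the run is reproducible
mu, alpha = 5.0, 2.0          # illustrative values, not taken from the paper
counts = [sample_negative_binomial(mu, alpha, rng) for _ in range(20000)]

sample_mean = mean(counts)
sample_var = variance(counts)
theoretical_var = mu + mu**2 / alpha

print(f"sample mean     = {sample_mean:.2f}")
print(f"sample variance = {sample_var:.2f}")
print(f"theoretical var = {theoretical_var:.2f}")
```

Under this parameterisation, smaller values of `alpha` give stronger overdispersion, while letting `alpha` grow large brings the variance back towards the mean, as in the Poisson case.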