Bayesian Feature Selection in Joint Quantile Time Series Analysis

Quantile feature selection over correlated multivariate time series data has long been a methodological challenge and remains an open problem. In this paper, we propose a general Bayesian dimension-reduction methodology for feature selection in high-dimensional joint quantile time series analysis, which we call the quantile feature selection time series (QFSTS) model. The QFSTS model is a general structural time series model in which each component makes an additive, directly interpretable contribution to the time series model. Its flexibility is twofold: users can add or remove components for each time series, and each time series can have its own specific valued components of different sizes. Feature selection is conducted in the quantile regression component, where each time series has its own pool of contemporaneous external predictors, which allows nowcasting. We develop the Bayesian methodology for extending feature selection to quantile time series analysis using the multivariate asymmetric Laplace distribution, a spike-and-slab prior setup, the Metropolis-Hastings algorithm, and Bayesian model averaging, all implemented consistently within the Bayesian paradigm. The QFSTS model requires only small datasets to train and converges quickly. Extensive experiments confirm that the QFSTS model has superior performance in feature selection, parameter estimation, and forecasting.
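The abstract names the main ingredients (an asymmetric Laplace working likelihood, a spike-and-slab prior, Metropolis-Hastings sampling, and Bayesian model averaging) without showing how they fit together. The following is a deliberately simplified, hypothetical sketch of that combination for a single series at a single quantile τ. It is not the QFSTS model, which uses a multivariate asymmetric Laplace distribution and additional structural components. The following assumptions are made only for this illustration: the ALD scale σ is fixed at 1, the prior inclusion probability is π = 0.5, the slab is N(0, 1), there is no intercept, and inclusion indicators are toggled by a flip move that draws a newly included coefficient from the slab. The names `check_loss`, `ald_loglik`, and `spike_slab_qr` are illustrative.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))


def ald_loglik(y, mu, tau, sigma=1.0):
    """Log-likelihood under a univariate asymmetric Laplace distribution.

    Maximising it in mu is equivalent to minimising the check loss, which is
    why the ALD serves as a working likelihood for quantile regression.
    """
    n = y.shape[0]
    return n * np.log(tau * (1 - tau) / sigma) - np.sum(check_loss((y - mu) / sigma, tau))


def spike_slab_qr(y, X, tau=0.5, pi=0.5, slab_sd=1.0, n_iter=4000, burn=1000,
                  step=0.1, seed=0):
    """Toy spike-and-slab Bayesian quantile regression via Metropolis-Hastings.

    Assumptions (illustration only): sigma fixed at 1, no intercept,
    beta_j ~ N(0, slab_sd^2) if included, beta_j = 0 otherwise.
    Returns posterior inclusion probabilities and model-averaged coefficients.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gamma = np.zeros(p, dtype=bool)   # inclusion indicators
    beta = np.zeros(p)                # zero wherever gamma is False
    loglik = ald_loglik(y, X @ beta, tau)
    log_prior_odds = np.log(pi) - np.log(1 - pi)
    gammas, betas = [], []
    for it in range(n_iter):
        # Flip move: toggle one indicator. A newly included coefficient is
        # drawn from its slab prior, so the slab density cancels and only the
        # likelihood ratio and prior odds remain in the acceptance ratio.
        j = rng.integers(p)
        prop = beta.copy()
        if gamma[j]:
            prop[j] = 0.0
            log_prior_term = -log_prior_odds
        else:
            prop[j] = rng.normal(0.0, slab_sd)
            log_prior_term = log_prior_odds
        prop_ll = ald_loglik(y, X @ prop, tau)
        if np.log(rng.uniform()) < prop_ll - loglik + log_prior_term:
            gamma[j] = not gamma[j]
            beta, loglik = prop, prop_ll
        # Random-walk Metropolis update for each included coefficient.
        for k in np.flatnonzero(gamma):
            prop = beta.copy()
            prop[k] += rng.normal(0.0, step)
            prop_ll = ald_loglik(y, X @ prop, tau)
            log_prior_diff = (beta[k] ** 2 - prop[k] ** 2) / (2 * slab_sd ** 2)
            if np.log(rng.uniform()) < prop_ll - loglik + log_prior_diff:
                beta, loglik = prop, prop_ll
        if it >= burn:
            gammas.append(gamma.copy())
            betas.append(beta.copy())
    pip = np.mean(gammas, axis=0)        # posterior inclusion probabilities
    bma_beta = np.mean(betas, axis=0)    # Bayesian model averaging estimate
    return pip, bma_beta


# Simulated example: only predictors 0 and 3 matter.
rng = np.random.default_rng(42)
n, p = 200, 5
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ true_beta + rng.normal(0.0, 0.5, size=n)
pip, bma_beta = spike_slab_qr(y, X, tau=0.5)
print(np.round(pip, 2))
print(np.round(bma_beta, 2))
```

Posterior inclusion probabilities are the sample means of the indicators after burn-in, and the model-averaged coefficients are the sample means of β with excluded coefficients counted as zero. Replacing the univariate ALD with a multivariate one and adding trend or seasonal components is where a full treatment would depart from this sketch.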
