This paper describes an approach that simultaneously identifies clusters in the data and estimates cluster-specific regression parameters. Such an approach is useful for learning the relationship between inputs and output when the regression parameters that best predict the output differ across regions of the input space. Variational Inference (VI), a machine learning technique that approximates posterior probability densities by solving an optimization problem, is used to identify clusters of explanatory variables and the regression parameters of each cluster. From these results, one can obtain both the expected value and the full distribution of the predicted output. Further advantages of the proposed approach are its elegant theoretical solution and the clear interpretability of its results. The approach is well suited to financial forecasting, where markets pass through different regimes (or clusters), each with its own patterns of, and correlations between, market changes. In financial applications, knowledge of such clusters can provide useful insights into portfolio performance and reveal the relative importance of variables in different market regimes. An example of predicting the one-day change in the S&P index illustrates the approach and compares its performance with that of standard regression without clusters. Given the broad applicability of the problem, its elegant theoretical solution, and the computational efficiency of the proposed algorithm, the approach may also be useful in many areas beyond finance.
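The abstract does not spell out the generative model, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes each cluster k has Gaussian inputs x ~ N(m_k, s2_x I) and a linear regression y ~ N(phi(x)^T w_k, s2_y) with phi(x) = [1, x]. It runs coordinate-ascent mean-field VI over the cluster assignments z, the cluster centres m_k and the regression weights w_k. The noise variances s2_x and s2_y are assumed known, and the mixing weights pi are point estimates (a variational-EM shortcut). The function names `fit_vi_clustered_regression`, `predict` and `features`, the farthest-first initialisation, and the synthetic two-regime data are all hypothetical illustrations.

```python
import numpy as np


def features(X):
    """Design matrix phi(x) = [1, x]: an intercept column plus the inputs."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_vi_clustered_regression(X, y, K, s2_x=1.0, s2_y=0.25, s0=100.0,
                                lam=1e-2, n_iter=100, seed=0):
    """Coordinate-ascent mean-field VI for a hypothetical model:
    z_n ~ Cat(pi), x_n | z_n=k ~ N(m_k, s2_x I),
    y_n | x_n, z_n=k ~ N(phi_n^T w_k, s2_y),
    priors m_k ~ N(0, s0 I) and w_k ~ N(0, I / lam).
    Factors: q(m_k) = N(mu_k, v_k I), q(w_k) = N(b_k, S_k), q(z_n) = Cat(r_n).
    s2_x and s2_y are assumed known; pi is a point estimate (variational EM).
    """
    N, D = X.shape
    Phi = features(X)
    P = Phi.shape[1]
    rng = np.random.default_rng(seed)

    # Farthest-first initialisation: K spread-out seed points in input space,
    # then a hard assignment of every point to its nearest seed.
    centers = [X[rng.integers(N)]]
    for _ in range(1, K):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    d2 = np.stack([((X - c) ** 2).sum(axis=1) for c in centers], axis=1)
    R = np.eye(K)[np.argmin(d2, axis=1)]          # N x K responsibilities

    for _ in range(n_iter):
        Nk = R.sum(axis=0) + 1e-10
        pi = Nk / N
        # q(m_k): conjugate Gaussian update for each cluster centre.
        v = 1.0 / (1.0 / s0 + Nk / s2_x)
        mu = v[:, None] * (R.T @ X) / s2_x
        # q(w_k): responsibility-weighted Bayesian linear regression.
        S = np.empty((K, P, P))
        b = np.empty((K, P))
        for k in range(K):
            prec = lam * np.eye(P) + (Phi.T * R[:, k]) @ Phi / s2_y
            S[k] = np.linalg.inv(prec)
            b[k] = S[k] @ (Phi.T @ (R[:, k] * y)) / s2_y
        # q(z_n): expected log joint under q(m) and q(w); shared constants dropped.
        sq_x = ((X[:, None, :] - mu[None]) ** 2).sum(axis=2) + D * v
        sq_y = (y[:, None] - Phi @ b.T) ** 2 + np.einsum("np,kpq,nq->nk", Phi, S, Phi)
        logr = np.log(pi) - 0.5 * sq_x / s2_x - 0.5 * sq_y / s2_y
        logr -= logr.max(axis=1, keepdims=True)   # stabilise the softmax
        R = np.exp(logr)
        R /= R.sum(axis=1, keepdims=True)
    return {"pi": pi, "mu": mu, "v": v, "b": b, "S": S, "s2_x": s2_x, "s2_y": s2_y}


def predict(model, Xnew):
    """Cluster weights p(z=k | x) from the inputs alone, plus each cluster's
    Gaussian predictive for y. (W, means, variances) describe the full
    predictive mixture; `expected` is its mean."""
    Phi = features(Xnew)
    D = Xnew.shape[1]
    var_x = model["s2_x"] + model["v"]            # x-marginal after integrating out m_k
    sq = ((Xnew[:, None, :] - model["mu"][None]) ** 2).sum(axis=2)
    logw = np.log(model["pi"]) - 0.5 * D * np.log(var_x) - 0.5 * sq / var_x
    W = np.exp(logw - logw.max(axis=1, keepdims=True))
    W /= W.sum(axis=1, keepdims=True)
    means = Phi @ model["b"].T
    variances = model["s2_y"] + np.einsum("np,kpq,nq->nk", Phi, model["S"], Phi)
    expected = (W * means).sum(axis=1)
    return W, means, variances, expected


# Synthetic two-regime data (illustrative only, not the paper's S&P data).
rng = np.random.default_rng(1)
xa = rng.normal(-3.0, 0.5, size=(200, 1))
xb = rng.normal(3.0, 0.5, size=(200, 1))
X = np.vstack([xa, xb])
y = np.concatenate([1.0 + 2.0 * xa[:, 0], 4.0 - 1.0 * xb[:, 0]]) + rng.normal(0.0, 0.3, 400)

model = fit_vi_clustered_regression(X, y, K=2)
Xtest = np.array([[-2.0], [2.0]])
truth = np.array([-3.0, 2.0])
W, means, variances, expected = predict(model, Xtest)

# Baseline: a single pooled least-squares fit with no clusters.
coef, *_ = np.linalg.lstsq(features(X), y, rcond=None)
baseline = features(Xtest) @ coef
print("clustered:", expected.round(2), "pooled:", baseline.round(2), "truth:", truth)
```

The prediction step mirrors the two outputs the abstract mentions. `expected` is the expected output, while the weights `W`, per-cluster `means` and `variances` together define a Gaussian mixture over the output. Fixing s2_x and s2_y keeps every update in closed form; a fuller treatment would also place priors on them and on pi.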