Change point detection is a commonly used technique in time series analysis that captures the dynamic nature of many real-world processes. With ever-increasing volumes of multivariate high-dimensional time series data, especially in neuroimaging and finance, there is a clear need for scalable, data-driven change point detection methods. Currently, change point detection methods for multivariate high-dimensional data are scarce, and even fewer are available in high-level, easily accessible software packages. To this end, we introduce the R package fabisearch, available on the Comprehensive R Archive Network (CRAN), which implements the factorized binary search (FaBiSearch) methodology. FaBiSearch is a novel statistical method for detecting change points in the network structure of multivariate high-dimensional time series; it employs non-negative matrix factorization (NMF), an unsupervised dimension reduction and clustering technique. Given the high computational cost of NMF, we implement the method in C++ and use parallelization to reduce computation time. We also use a new binary search algorithm to efficiently identify multiple change points, and we provide a new method for estimating the network structure of the data between change points. We demonstrate the functionality of the package and the practicality of the method by applying it to a neuroimaging data set and a finance data set. Lastly, we provide an interactive, 3-dimensional, brain-specific network visualization capability in a flexible, stand-alone function. This function can be used with any node coordinate atlas, and nodes can be color coded according to community membership (if applicable). The output is an elegantly displayed network laid over a cortical surface, which can be rotated in 3-dimensional space.