Change point detection is a commonly used technique in time series analysis that captures the dynamic nature of many real-world processes. With the ever-increasing volume of multivariate high-dimensional time series data, especially in neuroimaging and finance, there is a clear need for scalable, data-driven change point detection methods. Currently, change point detection methods for multivariate high-dimensional data are scarce, and even fewer are available in high-level, easily accessible software packages. To this end, we introduce the R package fabisearch, available on the Comprehensive R Archive Network (CRAN), which implements the factorized binary search (FaBiSearch) methodology. FaBiSearch is a novel statistical method for detecting change points in the network structure of multivariate high-dimensional time series. It employs non-negative matrix factorization (NMF), an unsupervised dimension reduction and clustering technique. Given the high computational cost of NMF, we implement the method in C++ and use parallelization to reduce computation time. We also use a new binary search algorithm to efficiently identify multiple change points, and we provide a new method for estimating networks from the data between change points. We demonstrate the functionality of the package and the practicality of the method by applying it to a neuroimaging data set and a finance data set. Lastly, we provide an interactive, 3-dimensional, brain-specific network visualization capability in a flexible, stand-alone function. This function can be used with any node coordinate atlas, and nodes can be color coded according to community membership (if applicable). The output is a network laid over a cortical surface that can be rotated in 3-dimensional space.