This paper introduces a novel changepoint detection framework that combines ensemble statistical methods with Large Language Models (LLMs) to improve both detection accuracy and the interpretability of regime changes in time series data. It addresses two critical limitations in the field. First, individual detection methods have complementary strengths and weaknesses that depend on data characteristics, which makes method selection non-trivial and prone to suboptimal results. Second, automated, contextual explanations for detected changes are largely absent. The proposed ensemble method aggregates results from ten distinct changepoint detection algorithms, achieving better performance and robustness than any individual method. In addition, an LLM-powered explanation pipeline automatically generates contextual narratives that link detected changepoints to potential real-world historical events. For private or domain-specific data, a Retrieval-Augmented Generation (RAG) solution enables explanations grounded in user-provided documents. The open-source Python framework demonstrates practical utility in diverse domains, including finance, political science, and environmental science, transforming raw statistical output into actionable insights for analysts and decision-makers.
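The abstract does not state how the outputs of the individual algorithms are combined, so the following is only a minimal sketch of one plausible aggregation rule, not the framework's actual method. It assumes each detector returns a list of changepoint indices, groups candidates that fall within a tolerance window of each other, and keeps a group only when enough distinct detectors support it. The function name `aggregate_changepoints`, the parameters `tolerance` and `min_votes`, and the detector names are all hypothetical.

```python
from statistics import median


def aggregate_changepoints(detections, tolerance=5, min_votes=2):
    """Combine changepoints from several detectors by tolerance-window voting.

    detections: dict mapping a detector name to a list of changepoint indices.
    tolerance:  maximum gap between neighbouring candidates in one group
                (hypothetical parameter, chosen for illustration).
    min_votes:  minimum number of distinct detectors required to keep a group.
    Returns a list of (location, votes) tuples sorted by location.
    """
    # Flatten all detections into (index, detector) pairs sorted by index.
    candidates = sorted(
        (idx, name) for name, idxs in detections.items() for idx in idxs
    )

    # Group neighbouring candidates. This is single-linkage grouping, so a
    # long chain of closely spaced candidates ends up in one group.
    clusters = []
    for idx, name in candidates:
        if clusters and idx - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append((idx, name))
        else:
            clusters.append([(idx, name)])

    # Keep groups supported by enough distinct detectors and report the
    # median index of each group as the consensus location.
    results = []
    for cluster in clusters:
        voters = {name for _, name in cluster}
        if len(voters) >= min_votes:
            location = int(median(i for i, _ in cluster))
            results.append((location, len(voters)))
    return results


# Illustrative input only: three hypothetical detectors on one series.
example = {
    "detector_a": [48, 120],
    "detector_b": [50, 200],
    "detector_c": [52, 121],
}
consensus = aggregate_changepoints(example)
print(consensus)
```

Counting distinct detectors rather than raw candidates prevents a single algorithm that fires several times in a short window from outvoting the others. Weighted votes or score-based fusion would be equally reasonable alternatives under the same interface.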