Understanding temporal patterns in online search behavior is crucial for real-time marketing and trend forecasting. Google Trends offers a rich proxy for public interest, yet the high dimensionality and noise of its time series make effective clustering difficult. This study evaluates three unsupervised clustering approaches on 20 Google Trends keywords representing major consumer categories: Symbolic Aggregate approXimation (SAX), enhanced SAX (eSAX), and Topological Data Analysis (TDA). Our results show that SAX and eSAX produce fast, interpretable clusters for stable time series but struggle with volatile and complex ones, often yielding ambiguous ``catch-all'' clusters. TDA, by contrast, captures global structural features through persistent homology and produces more balanced and meaningful groupings. We conclude with practical guidance on using symbolic and topological methods in consumer analytics and suggest that hybrid approaches combining both perspectives show strong potential for future applications.
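To make the symbolic side of the comparison concrete, the following is a minimal sketch of the standard SAX transform: z-normalize a series, reduce it with Piecewise Aggregate Approximation (PAA), then map each segment mean to a letter using equiprobable breakpoints of the standard normal distribution. It assumes a single Google Trends series supplied as a list of floats. The segment count and alphabet size are illustrative choices, not the parameters used in this study, and the eSAX extension is not shown.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series, eps=1e-8):
    """Z-normalize a series; a (near-)constant series maps to all zeros."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma < eps:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation with fractional weighting,
    so the series length need not be divisible by `segments`."""
    if segments < 1:
        raise ValueError("segments must be >= 1")
    n = len(series)
    out = []
    for i in range(segments):
        start = i * n / segments
        end = (i + 1) * n / segments
        total = 0.0
        j = int(start)
        while j < end:
            # Portion of point j's unit cell [j, j+1) inside [start, end)
            overlap = min(j + 1, end) - max(j, start)
            total += series[j] * overlap
            j += 1
        out.append(total / (end - start))
    return out


def breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into `alphabet_size` equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, segments, alphabet_size=4):
    """Return the SAX word ('a', 'b', ...) for one time series."""
    cuts = breakpoints(alphabet_size)
    letters = []
    for value in paa(znorm(series), segments):
        # Number of breakpoints <= value selects the symbol index
        letters.append(chr(ord("a") + bisect_right(cuts, value)))
    return "".join(letters)
```

For example, `sax(weekly_interest, segments=8, alphabet_size=4)`, where `weekly_interest` is a hypothetical list of weekly search-interest values, returns an 8-letter word. Keywords can then be grouped by comparing their words; because each word depends only on a series' normalized shape, a volatile series may collapse onto the same coarse symbols as unrelated ones, which is one plausible route to the catch-all clusters described above.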