Temporal range filtering is a critical operation in large-scale search systems, particularly for location-based services that must filter businesses by operating hours. Traditional approaches suffer either from poor query performance (scope filtering) or from index size explosion (minute-level indexing). We present Timehash, a novel hierarchical time indexing algorithm that reduces index size by over 99% compared to minute-level indexing while maintaining 100% precision. Timehash employs a flexible multi-resolution strategy with customizable hierarchical levels. Through empirical analysis of distributions drawn from 12.6 million business records in a production location search service, we demonstrate a data-driven methodology for selecting optimal hierarchies tailored to specific data distributions. We evaluate Timehash on up to 12.6 million synthetic POIs generated from these production distributions. Experimental results show that a five-level hierarchy reduces index terms to 5.6 per document (a 99.1% reduction versus minute-level indexing) with zero false positives and zero false negatives. Scalability benchmarks confirm a constant per-document cost from 100K to 12.6M POIs, and the approach supports complex scenarios such as break times and irregular schedules. Our approach generalizes to other temporal filtering problems in search systems, e-commerce, and reservation platforms.
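The following minimal Python sketch contrasts minute-level and hierarchical time indexing. It illustrates the general multi-resolution idea under stated assumptions and is not the paper's Timehash algorithm. Several details are hypothetical choices made for illustration: the block sizes in `LEVELS` (240, 60, 15, 5, 1 minutes), the term format `L<size>:<block>`, and the helper names `index_terms`, `query_terms`, and `is_open`. Opening intervals are also assumed to be half-open minute ranges within a single day.

At index time, each interval is greedily covered by aligned blocks. At query time, minute `m` emits the one block containing it at each level. A document therefore matches exactly when `m` falls inside one of its intervals.

```python
from typing import List, Sequence

# Hypothetical hierarchy (not from the paper): block sizes in minutes, coarsest
# first. Each size divides the previous one and 1440, so all blocks align with midnight.
LEVELS: Sequence[int] = (240, 60, 15, 5, 1)


def index_terms(start: int, end: int, levels: Sequence[int] = LEVELS) -> List[str]:
    """Cover the half-open interval [start, end) of minutes-since-midnight with
    aligned blocks, greedily taking the largest block that starts at the current
    position and fits. Returns one index term per block."""
    assert 0 <= start <= end <= 1440, "assumes a same-day interval (no wrap past midnight)"
    terms: List[str] = []
    t = start
    while t < end:
        for size in levels:  # coarsest level first
            if t % size == 0 and t + size <= end:
                terms.append(f"L{size}:{t // size}")
                t += size
                break
    return terms


def query_terms(minute: int, levels: Sequence[int] = LEVELS) -> List[str]:
    """One term per level: the aligned block that contains the query minute."""
    return [f"L{size}:{minute // size}" for size in levels]


def is_open(doc_terms: List[str], minute: int) -> bool:
    """A document matches if any of its block terms contains the query minute."""
    return not set(doc_terms).isdisjoint(query_terms(minute))


if __name__ == "__main__":
    # Open 09:00-21:30, i.e. minutes [540, 1290).
    terms = index_terms(540, 1290)
    print(len(terms), "hierarchical terms vs", 1290 - 540, "minute-level terms")  # 8 vs 750
    # A break time is two intervals: 11:00-14:00 and 17:00-22:00.
    split = index_terms(660, 840) + index_terms(1020, 1320)
    print(is_open(split, 900))  # False: 15:00 falls in the break
```

Because the finest assumed level is one minute, matching in this sketch is exact by construction. A coarser hierarchy would produce fewer terms but could lose exactness for intervals that do not align with its blocks. The paper's data-driven selection of levels from observed opening-hour distributions is not modeled here.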