Personal information retrieval fails when systems ignore how human memory works. Existing platforms force keyword searches across isolated silos, whereas humans naturally recall through episodic cues: when, where, and in what context they encountered the information. This dissertation presents the Unified Personal Index (UPI), a memory-aligned architecture that bridges this gap. The Indaleko prototype demonstrates the UPI's feasibility on a dataset of 31 million files spanning 160 TB across eight storage platforms. By integrating temporal, spatial, and activity metadata into a unified graph database, Indaleko enables natural language queries such as "photos near the conference venue last spring" that existing systems cannot process. The implementation achieves sub-second query responses through memory anchor indexing, eliminates cross-platform search fragmentation, and maintains perfect precision for well-specified memory patterns. Evaluation against commercial systems (Google Drive, OneDrive, Dropbox, Windows Search) reveals that all of them fail on memory-based queries, returning overwhelming result sets without contextual filtering. In contrast, Indaleko successfully processes multi-dimensional queries that combine time, location, and activity patterns. The extensible architecture supports rapid integration of new data sources (10 minutes to 10 hours per provider) while preserving privacy through UUID-based semantic decoupling. Together, the prototype and its evaluation show that the UPI's architectural synthesis bridges cognitive theory and distributed systems design. This work shifts personal information retrieval from keyword matching to memory-aligned finding, providing immediate benefits for existing data while establishing foundations for future context-aware systems.