Lifelogs are descriptions of experiences that a person had during their life. Lifelogs are created by fusing data from a multitude of digital services, such as online photos, maps, shopping and content streaming services. Question answering over lifelogs can offer personal assistants a critical resource when they try to provide advice in context. However, obtaining answers to questions over lifelogs is beyond the current state of the art in question answering for a variety of reasons, the most pronounced of which is that lifelogs combine free text with some degree of structure, such as temporal and geographical information. We create and publicly release TimelineQA, a benchmark for accelerating progress on querying lifelogs. TimelineQA generates lifelogs of imaginary people. The episodes in a lifelog range from major life events, such as high school graduation, to those that occur on a daily basis, such as going for a run. We describe a set of experiments on TimelineQA with several state-of-the-art QA models. Our experiments reveal that for atomic queries, an extractive QA system significantly outperforms a state-of-the-art retrieval-augmented QA system. For multi-hop queries involving aggregates, we show that the best result is obtained with a state-of-the-art table QA technique, assuming the ground truth set of episodes for deriving the answer is available.