Production machine learning involves continuous training: hosting multiple model versions over time, often with many versions running at once. When model performance falls short of expectations, machine learning engineers (MLEs) debug by exploring and analyzing many prior versions of code and training data to identify root causes and mitigate problems. Traditional debugging and logging tools often fall short in this experimental, multi-version setting. FlorDB introduces Multiversion Hindsight Logging, which lets engineers use the most recent version's logging statements to query past versions, even when those older versions logged different data. Log statement propagation injects logging statements consistently into past code versions, regardless of how the codebase has changed. Once log statements have been propagated across code versions, the remaining challenge in Multiversion Hindsight Logging is to replay the new log statements efficiently from the checkpoints of previous runs. Finally, MLEs need a coherent user experience for debugging across all versions of code and data. To this end, FlorDB presents a unified relational model for handling historical queries efficiently, offering a comprehensive view of the log history that simplifies the exploration of past code iterations. A performance evaluation on diverse benchmarks confirms FlorDB's scalability and its ability to deliver real-time query responses, using query-based filtering and checkpoint-based parallelism for efficient replay.
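The replay idea can be illustrated with a minimal sketch. It is a toy under stated assumptions, not FlorDB's API or implementation: `train_epoch`, `original_run`, `replay_epoch`, and `hindsight_query` are hypothetical names, the "model" is a single float, and a thread pool stands in for checkpoint-based parallelism. The original run saves periodic checkpoints but never logs the value of interest. A later query asks for a new log value at a few epochs only, so replay restores the nearest earlier checkpoint for each requested epoch and re-executes just the epochs in between.

```python
from concurrent.futures import ThreadPoolExecutor


def train_epoch(weight, epoch):
    """Toy deterministic training step (hypothetical stand-in for real training)."""
    return weight + 1.0 / (epoch + 1)


def original_run(num_epochs, ckpt_every):
    """Past run: trains and checkpoints periodically, but logs nothing else."""
    weight, checkpoints = 0.0, {}
    for epoch in range(num_epochs):
        if epoch % ckpt_every == 0:
            checkpoints[epoch] = weight  # state at the START of this epoch
        weight = train_epoch(weight, epoch)
    return checkpoints


def replay_epoch(checkpoints, target_epoch, log_fn):
    """Restore the nearest checkpoint at or before target_epoch, then re-run up to it."""
    start = max(e for e in checkpoints if e <= target_epoch)
    weight = checkpoints[start]
    for epoch in range(start, target_epoch + 1):
        weight = train_epoch(weight, epoch)
    return target_epoch, log_fn(weight)  # evaluate the NEW log statement


def hindsight_query(checkpoints, wanted_epochs, log_fn):
    """Query-based filtering: replay only the requested epochs, each independently."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(
            lambda e: replay_epoch(checkpoints, e, log_fn), sorted(wanted_epochs)
        )
        return dict(results)


def weight_squared(weight):
    """The log statement added in hindsight (hypothetical)."""
    return weight * weight


checkpoints = original_run(num_epochs=10, ckpt_every=4)
rows = hindsight_query(checkpoints, {2, 9}, weight_squared)
```

Because each requested epoch restarts from its own checkpoint, the replays do not depend on one another, which is what makes them parallelizable in this sketch. Epochs the query does not ask for are never re-executed.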