Production machine learning (ML) involves hosting multiple versions of models over time, often with many model versions running at once. When a model does not perform as expected, ML engineers (MLEs) debug by exploring and analyzing many prior versions of code and training data to identify root causes and mitigate the problem. Traditional debugging and logging tools are poorly suited to this experimental, multiversion setting, so new approaches to logging and log data management are needed. FlorDB introduces Multiversion Hindsight Logging, which lets engineers use the most recent version's logging statements to explore past versions, even when those older versions logged different data. Log statement propagation injects these logging statements consistently into past code versions, regardless of how the codebase has changed. Once log statements have been propagated across code versions, the remaining challenge in Multiversion Hindsight Logging is to replay the new log statements efficiently from the checkpoints of previous runs. Finally, MLEs need a coherent user experience for debugging across all versions of code and data. To this end, FlorDB presents a unified relational model that handles historical queries efficiently and offers a comprehensive view of the log history, simplifying the exploration of past code iterations. In sum, FlorDB provides a robust tool tailored to the needs of MLEs, significantly improving their ability to navigate complex, multiversion ML experimentation.
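As a rough illustration of what a unified relational view of multiversion logs can look like, the following sketch stores every logged value as one row keyed by code version, run, and epoch. It then pivots those rows into one wide row per run and epoch. The schema, the column names (`version`, `run_id`, `epoch`, `name`, `value`), and the version labels are hypothetical and are not FlorDB's actual schema or API. The rows marked as hindsight-logged stand in for values that replay from checkpoints would produce. The example uses only SQLite from the Python standard library.

```python
import sqlite3

# Hypothetical long-format log table: one row per logged value.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE logs (
        version TEXT,    -- code version (e.g., a commit) that produced the run
        run_id  TEXT,    -- identifier of a training run
        epoch   INTEGER, -- training epoch at which the value was logged
        name    TEXT,    -- name of the logged variable
        value   REAL
    )"""
)

rows = [
    # Original run of version v1 logged only 'loss'.
    ("v1", "run1", 0, "loss", 0.90),
    ("v1", "run1", 1, "loss", 0.70),
    # Version v2 logged both 'loss' and 'acc'.
    ("v2", "run2", 0, "loss", 0.85),
    ("v2", "run2", 0, "acc", 0.60),
    ("v2", "run2", 1, "loss", 0.65),
    ("v2", "run2", 1, "acc", 0.72),
    # Illustrative hindsight-logged rows: 'acc' for v1, standing in for values
    # obtained by propagating v2's logging statement into v1 and replaying v1.
    ("v1", "run1", 0, "acc", 0.55),
    ("v1", "run1", 1, "acc", 0.68),
]
conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?, ?)", rows)

# Pivot the long table into one wide row per (version, run, epoch), so values
# logged by different code versions line up in the same columns.
# CASE without ELSE yields NULL, and MAX ignores NULLs.
query = """
SELECT version, run_id, epoch,
       MAX(CASE WHEN name = 'loss' THEN value END) AS loss,
       MAX(CASE WHEN name = 'acc'  THEN value END) AS acc
FROM logs
GROUP BY version, run_id, epoch
ORDER BY version, epoch
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Keeping the underlying table in a long (name/value) shape means new logging statements add rows rather than columns. The schema stays stable as code versions diverge, and the pivot recovers a per-run view on demand.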