Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle

In this paper we present techniques to incrementally harvest and query arbitrary metadata from machine learning pipelines, without disrupting agile practices. We center our approach on the developer-favored technique for generating metadata -- log statements -- leveraging the fact that logging creates context. We show how hindsight logging allows such statements to be added and executed post-hoc, without requiring developer foresight. Relational views of incomplete metadata can be queried to dynamically materialize new metadata in bulk and on demand across multiple versions of workflows. This is done in a "metadata later" style, off the critical path of agile development. We realize these ideas in a system called FlorDB and demonstrate how the data context framework covers both ad-hoc metadata and the special cases handled today by bespoke feature stores and model repositories. Through a usage scenario -- including both ML and human feedback -- we illustrate how the component techniques come together to resolve classic software engineering trade-offs between agility and discipline.
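As a rough illustration of the idea that log statements can back a relational view of incomplete metadata, the following is a minimal sketch in plain Python. It is not FlorDB's API: the `log` and `relational_view` helpers, the version labels, and the logged values are all hypothetical, chosen only to show how rows logged by different workflow versions pivot into one table whose empty cells mark metadata that was never logged.

```python
from collections import defaultdict

# Hypothetical in-memory log store (an assumption for illustration,
# not FlorDB's storage format): rows of (version, epoch, name, value).
LOG = []


def log(version, epoch, name, value):
    """Record one log statement as a long-format row and pass the value through."""
    LOG.append((version, epoch, name, value))
    return value


def relational_view(records, columns):
    """Pivot long-format log rows into one wide row per (version, epoch).

    Names that were never logged for a given row appear as None.
    """
    rows = defaultdict(dict)
    for version, epoch, name, value in records:
        rows[(version, epoch)][name] = value
    return [
        {"version": v, "epoch": e, **{c: vals.get(c) for c in columns}}
        for (v, e), vals in sorted(rows.items())
    ]


# Illustrative values only: version "v1" logged just the loss, while
# version "v2" later added a log statement for accuracy.
log("v1", 0, "loss", 0.9)
log("v1", 1, "loss", 0.7)
log("v2", 0, "loss", 0.8)
log("v2", 0, "acc", 0.6)

view = relational_view(LOG, ["loss", "acc"])

# Rows whose "acc" cell is empty: the older version never logged it.
missing = [(r["version"], r["epoch"]) for r in view if r["acc"] is None]
```

In this sketch, `missing` lists the cells that the older version left empty. These are the kind of gaps the abstract describes filling after the fact, by adding a log statement post-hoc and materializing its values on demand; the replay mechanism itself is not modeled here.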

