Data-driven algorithm design is a promising, learning-based approach to beyond-worst-case analysis of algorithms with tunable parameters. An important open problem is the design of computationally efficient data-driven algorithms for combinatorial algorithm families with multiple parameters. When the problem instance is fixed and the parameters vary, the "dual" loss function typically has a piecewise-decomposable structure, i.e., it is well-behaved except at certain sharp transition boundaries. In this work we initiate the study of techniques for developing efficient empirical risk minimization (ERM) learning algorithms for data-driven algorithm design by enumerating the pieces of the sum of the dual loss functions over a collection of problem instances. The running time of our approach scales with the actual number of pieces that appear, as opposed to worst-case upper bounds on the number of pieces. Our approach involves two novel ingredients: an output-sensitive algorithm for enumerating polytopes induced by a set of hyperplanes using tools from computational geometry, and an execution graph that compactly represents all the states the algorithm could attain for all possible parameter values. We illustrate our techniques by giving algorithms for pricing problems, linkage-based clustering, and dynamic-programming-based sequence alignment.
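To make the idea of ERM by piece enumeration concrete, here is a minimal one-parameter sketch in Python. It is not the multi-parameter algorithm of this work: it assumes a toy posted-price setting in which each problem instance is a single buyer with a nonnegative valuation, the tunable parameter is the price, and the dual utility of an instance is the price if the buyer's valuation is at least the price and zero otherwise. The function names are hypothetical. The sum of these dual functions is linear and increasing between consecutive valuations, so the best price on each piece is the valuation at its right end, and it suffices to evaluate one candidate per piece.

```python
from typing import List, Tuple


def sum_dual_revenue(price: float, valuations: List[float]) -> float:
    """Sum of the dual utilities at `price`: a buyer pays `price` iff valuation >= price."""
    return price * sum(1 for v in valuations if v >= price)


def erm_price_by_piece_enumeration(valuations: List[float]) -> Tuple[float, float]:
    """Return (best_price, best_revenue) by checking one candidate per piece.

    Assumes nonnegative valuations. The breakpoints of the summed dual
    function are the distinct valuations; on each piece the function is
    linear and increasing in the price, so its maximum is attained at the
    piece's right endpoint, which is itself a valuation.
    """
    best_price, best_revenue = 0.0, 0.0
    for candidate in sorted(set(valuations)):
        revenue = sum_dual_revenue(candidate, valuations)
        if revenue > best_revenue:
            best_price, best_revenue = candidate, revenue
    return best_price, best_revenue


# Toy sample of four instances (assumed data, for illustration only).
sample = [1.0, 2.0, 2.0, 5.0]
best_price, best_revenue = erm_price_by_piece_enumeration(sample)
print(best_price, best_revenue)
```

The loop runs once per distinct valuation, so the work in this sketch tracks the number of pieces that actually appear in the sample rather than a worst-case bound. With several parameters the pieces become regions bounded by hyperplanes, which is where the output-sensitive enumeration described above is needed.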