Dynamic Batching of Online Arrivals to Leverage Economies of Scale

Many settings, such as medical testing of patients in hospitals or matching riders to drivers on ride-hailing platforms, require handling arrivals over time. In such applications, it is often beneficial to group the arriving orders, samples, or requests into batches and process each batch together rather than handling arrivals individually. However, waiting too long to form larger batches incurs a waiting cost for earlier arrivals. On the other hand, processing arrivals too soon raises processing costs by forgoing the economies of scale that larger batches offer. Moreover, the timing of the next arrival is often unknown, so fixed batch sizes or fixed wait times tend to be suboptimal. In this work, we consider the problem of finding the optimal batching schedule to minimize the average wait time plus the average processing cost under both offline and online settings. In the offline problem, in which all arrival times are known a priori, we show that the optimal batching schedule can be found in polynomial time by reducing the problem to a shortest path problem on a weighted acyclic graph. For the online problem with unknown arrival times, we develop online algorithms that are provably competitive for a broad range of processing-cost functions. We also provide a lower bound on the competitive ratio that no online algorithm can beat. Finally, we run extensive numerical experiments on simulated and real data to demonstrate the effectiveness of our proposed algorithms against the optimal offline benchmark.
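
The offline reduction to a shortest path problem can be illustrated with a minimal sketch. The cost model here is an assumption made for illustration, not a statement of the paper's exact formulation: each batch is processed at the time of its last arrival, the waiting cost of a batch is the sum of its members' waits, and `processing_cost(k)` is a hypothetical user-supplied cost for a batch of size k. Node i stands for "the first i arrivals have been processed", and an edge from i to j means that arrivals i+1 through j form one batch. The function name `optimal_offline_batches` is likewise hypothetical. Minimizing the total cost is equivalent to minimizing the average, since the number of arrivals is fixed.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple


def optimal_offline_batches(
    arrivals: Sequence[float],
    processing_cost: Callable[[int], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total cost, batches) under the assumed cost model.

    Each batch is a half-open index range (i, j) into the sorted arrivals.
    """
    times = sorted(arrivals)
    n = len(times)
    # prefix[k] holds the sum of the first k arrival times.
    prefix = [0.0, *accumulate(times)]

    def edge_weight(i: int, j: int) -> float:
        # Assumption: the batch times[i:j] is processed at its last arrival,
        # so each member k waits times[j - 1] - times[k].
        size = j - i
        waiting = size * times[j - 1] - (prefix[j] - prefix[i])
        return waiting + processing_cost(size)

    # Shortest path from node 0 to node n. Every edge goes from a smaller
    # index to a larger one, so visiting nodes in index order is a valid
    # topological order and a single pass suffices.
    best = [0.0] + [float("inf")] * n
    parent = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + edge_weight(i, j)
            if candidate < best[j]:
                best[j], parent[j] = candidate, i

    # Walk the parent pointers back from node n to recover the batches.
    batches: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        batches.append((parent[j], j))
        j = parent[j]
    batches.reverse()
    return best[n], batches
```

For example, `optimal_offline_batches([0.0, 1.0, 10.0], lambda k: 4.0)` uses a flat cost of 4 per batch. Under the assumed model it groups the first two arrivals and processes the third alone, for a total cost of 9.0. The double loop examines every edge once, so the sketch runs in O(n^2) time for n arrivals.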