Data acquisition is a major bottleneck for learning from real-time data streams: analysts must decide on the fly which labels to purchase while respecting a rolling budget. However, existing online active learning methods rarely unify pricing, information gain, and rolling budget constraints under concept drift. We introduce QueryMarket, a market-inspired framework that decides whether to query each incoming data point based on its estimated utility to the model and its price. Within this framework, we propose OVBAL (online variance-based active learning), which integrates data pricing with information-driven selection: it estimates each sample's marginal utility via a D-optimality criterion with exponential forgetting and executes cost-aware purchases under rolling budget constraints. OVBAL yields a simple, fully online decision rule that adapts to nonstationary streams and heterogeneous label costs. Experiments on synthetic data and a real-world solar power generation forecasting task show that OVBAL is particularly effective under seller-centric pricing and, in the real-world task, achieves a more favorable long-run error-cost trade-off under both pricing schemes considered.
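The kind of decision rule summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a linear model whose information matrix `A` is decayed by a forgetting factor `lam` at every time step, and it scores each candidate `x` by the D-optimality gain `log det(A + x x^T) - log det(A)`, which by the matrix determinant lemma equals `log(1 + x^T A^{-1} x)`. A label is bought only if this gain is at least `tau` times its price and the purchase fits within a rolling budget `budget` over the last `window` steps. The class name `OVBALSketch`, its parameters, and their default values are hypothetical.

```python
from collections import deque

import numpy as np


class OVBALSketch:
    """Hypothetical sketch of a cost-aware, D-optimality-based purchase rule.

    Assumptions (not taken from the paper): linear features, forgetting
    factor `lam`, ridge prior `delta * I`, utility-per-price threshold `tau`,
    and a rolling budget `budget` over the last `window` time steps.
    """

    def __init__(self, dim, lam=0.99, delta=1.0, tau=0.05, budget=10.0, window=50):
        self.A = delta * np.eye(dim)  # forgetting-weighted information matrix
        self.lam = lam
        self.tau = tau
        self.budget = budget
        # Spending of the previous window-1 steps; with the current step
        # this covers exactly `window` steps of the rolling budget.
        self.spend = deque(maxlen=window - 1)

    def marginal_utility(self, x):
        # D-optimality gain: log det(A + x x^T) - log det(A) = log(1 + x^T A^{-1} x)
        return float(np.log1p(x @ np.linalg.solve(self.A, x)))

    def step(self, x, price):
        """Decide whether to buy the label of x at `price`; returns True if bought."""
        x = np.asarray(x, dtype=float)
        self.A *= self.lam  # exponential forgetting: down-weight old information
        gain = self.marginal_utility(x)
        remaining = self.budget - sum(self.spend)
        buy = price <= remaining and gain >= self.tau * price
        self.spend.append(price if buy else 0.0)
        if buy:
            self.A += np.outer(x, x)  # a purchased label adds x x^T to A
        return buy


if __name__ == "__main__":
    # Toy stream with random features and heterogeneous (random) label prices.
    rng = np.random.default_rng(0)
    agent = OVBALSketch(dim=3)
    bought = [agent.step(rng.normal(size=3), price=rng.uniform(0.1, 1.0)) for _ in range(200)]
    print(f"purchased {sum(bought)} of {len(bought)} labels")
```

Applying forgetting at every step, rather than only after purchases, lets the gain of new points grow again when no labels have been bought for a while, which is one plausible way to react to drift; the paper's exact update and thresholding may differ.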