We introduce and analyse active learning markets as a way to purchase labels in situations where analysts aim to acquire additional data to improve model fitting or to better train models for predictive analytics applications. This contrasts with the many existing proposals for purchasing features and examples. By formalising market clearing as an optimisation problem, we integrate budget constraints and improvement thresholds into the label acquisition process. We focus on a single-buyer, multiple-seller setup and propose two active learning strategies (variance-based and query-by-committee-based), each paired with a distinct pricing mechanism. They are compared against baselines including random sampling and a greedy knapsack heuristic. The proposed strategies are validated on real-world datasets from two critical application domains: real estate pricing and energy forecasting. The results demonstrate the robustness of our approach, which consistently achieves better performance with fewer acquired labels than conventional methods. Our proposal offers an easy-to-implement, practical solution for optimising data acquisition in resource-constrained environments.
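To make the budget-constrained baseline concrete, the following is a minimal sketch of a greedy knapsack heuristic for label purchasing. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `greedy_label_purchase`, the use of a per-label informativeness score (for example, predictive variance), and the ranking by score-to-price ratio are all hypothetical choices made here.

```python
from typing import List, Sequence, Tuple


def greedy_label_purchase(
    scores: Sequence[float],
    prices: Sequence[float],
    budget: float,
) -> Tuple[List[int], float]:
    """Greedy knapsack heuristic (illustrative, not the paper's method).

    scores: assumed informativeness of each candidate label
            (e.g. predictive variance); higher is better.
    prices: asking price of each label; must be strictly positive.
    budget: total amount the buyer is willing to spend.

    Returns the indices of purchased labels and the total spent.
    """
    if len(scores) != len(prices):
        raise ValueError("scores and prices must have the same length")
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")

    # Rank candidates by informativeness per unit of price, best first.
    order = sorted(
        range(len(scores)),
        key=lambda i: scores[i] / prices[i],
        reverse=True,
    )

    chosen: List[int] = []
    spent = 0.0
    for i in order:
        # Buy a label only if it still fits within the remaining budget.
        if spent + prices[i] <= budget:
            chosen.append(i)
            spent += prices[i]
    return chosen, spent


if __name__ == "__main__":
    # Hypothetical scores and prices, for illustration only.
    picked, cost = greedy_label_purchase(
        scores=[0.9, 0.5, 0.4, 0.1],
        prices=[3.0, 1.0, 2.0, 1.0],
        budget=4.0,
    )
    print(picked, cost)
```

Ranking by score per unit price is the standard greedy relaxation of the 0/1 knapsack problem; it is fast and simple but not guaranteed to be optimal, which is one reason it serves as a baseline rather than a clearing mechanism.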