We introduce and analyse active learning markets as a way to purchase labels in situations where analysts aim to acquire additional data to improve model fitting or to better train models for predictive analytics applications. This contrasts with the many existing proposals for purchasing features and examples. Through an original formalisation of market clearing as an optimisation problem, we integrate budget constraints and improvement thresholds into the label acquisition process. We focus on a single-buyer, multiple-seller setup and propose two active learning strategies (variance-based and query-by-committee-based), paired with distinct pricing mechanisms. They are compared with benchmark baselines, including random sampling and a greedy knapsack heuristic. The proposed strategies are validated on real-world datasets from two critical application domains: real estate pricing and energy forecasting. Results demonstrate the robustness of our approach, which consistently achieves superior performance while acquiring fewer labels than conventional methods. Our proposal provides a practical, easy-to-implement solution for optimising data acquisition in resource-constrained environments.
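To give a concrete feel for variance-based label purchasing under a budget, a minimal sketch follows. It is not the authors' optimisation-based market clearing: it swaps in a simple greedy loop and assumes a ridge-regularised linear buyer model, for which the predictive variance x^T (X^T X + ridge*I)^{-1} x depends only on features, so candidates can be ranked before their labels are bought. Each pool row is assumed to be one seller's label offer at a fixed asking price, and the "improvement threshold" is approximated by a minimum predictive variance. The function name `variance_based_purchase` and all of its parameters are hypothetical.

```python
import numpy as np


def variance_based_purchase(X_train, X_pool, prices, budget, threshold, ridge=1e-3):
    """Hypothetical greedy sketch of variance-based label buying.

    Assumptions (not from the paper): linear ridge model, one label per pool
    row at a fixed asking price, and a variance floor standing in for the
    improvement threshold.
    """
    X = np.asarray(X_train, dtype=float)
    pool = np.asarray(X_pool, dtype=float)
    prices = np.asarray(prices, dtype=float)
    d = X.shape[1]
    available = np.ones(len(pool), dtype=bool)
    bought, spent = [], 0.0

    while True:
        # Posterior precision of the ridge model given the current training set.
        A_inv = np.linalg.inv(X.T @ X + ridge * np.eye(d))
        # Predictive variance (up to the noise scale) x_i^T A^{-1} x_i for each pool point.
        var = np.einsum("ij,jk,ik->i", pool, A_inv, pool)
        # Only unpurchased offers that the remaining budget can still pay for.
        affordable = available & (prices <= budget - spent)
        if not affordable.any():
            break
        scores = np.where(affordable, var, -np.inf)
        i = int(np.argmax(scores))
        # Improvement threshold (assumed form): stop when the best gain is too small.
        if var[i] < threshold:
            break
        bought.append(i)
        spent += prices[i]
        available[i] = False
        # Adding the features is enough to update the variance; the label is
        # only needed later, when the model is actually refitted.
        X = np.vstack([X, pool[i]])

    return bought, spent
```

With `X_train = [[1, 0], [1, 0], [1, 0], [0, 1]]`, `X_pool = [[1, 0], [0, 1], [0, 2]]`, `prices = [1, 1, 5]`, `budget = 3` and `threshold = 0.1`, the sketch first buys the under-represented `[0, 1]` point, then `[1, 0]`, and never considers `[0, 2]` because its price exceeds the budget. This mirrors, in a simplified way, why a greedy knapsack-style rule can leave informative but expensive labels unpurchased.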