How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets

We introduce and analyse active learning markets as a way to purchase labels in situations where analysts aim to acquire additional data to improve model fitting, or to better train models for predictive analytics applications. This contrasts with the many existing proposals for purchasing features and examples. Through an original formalisation of market clearing as an optimisation problem, we integrate budget constraints and improvement thresholds into the label acquisition process. We focus on a single-buyer, multiple-seller setup and propose two active learning strategies (variance-based and query-by-committee-based), paired with distinct pricing mechanisms. These are compared against benchmark baselines, including random sampling and a greedy knapsack heuristic. The proposed strategies are validated on real-world datasets from two critical application domains: real estate pricing and energy forecasting. Results demonstrate the robustness of our approach, which consistently achieves superior performance while acquiring fewer labels than conventional methods. Our proposal offers an easy-to-implement, practical solution for optimising data acquisition in resource-constrained environments.
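To make the idea of a variance-based purchasing rule with a budget and an improvement threshold more concrete, here is a minimal sketch. It is not the authors' market-clearing formulation. It assumes a linear regression model, that each seller offers the label of a single feature vector at a fixed posted price, and that the predictive variance of the fitted mean serves as the measure of expected improvement. The function `variance_based_purchase` and its parameters (`min_gain`, `noise_var`, `ridge`) are hypothetical names chosen for illustration.

```python
import numpy as np


def variance_based_purchase(X_train, X_offers, prices, budget, min_gain,
                            noise_var=1.0, ridge=1e-6):
    """Hypothetical greedy, variance-based label purchasing under a budget.

    Assumptions (illustrative only): a linear model, one offered point per
    seller (row i of X_offers at price prices[i]), and predictive variance
    noise_var * x^T (X^T X)^{-1} x used as the improvement proxy.
    Returns the indices of purchased offers and the total amount spent.
    """
    d = X_train.shape[1]
    # Information matrix X^T X, lightly regularised so it is invertible.
    A = X_train.T @ X_train + ridge * np.eye(d)
    remaining = budget
    bought = []
    available = set(range(len(X_offers)))
    while available:
        A_inv = np.linalg.inv(A)
        # Predictive variance of the fitted mean at every offer still on the table.
        var = {i: noise_var * (X_offers[i] @ A_inv @ X_offers[i]) for i in available}
        best = max(var, key=var.get)
        if var[best] < min_gain:
            break  # no remaining offer meets the improvement threshold
        available.remove(best)
        if prices[best] > remaining:
            continue  # unaffordable; move on to the next most informative offer
        remaining -= prices[best]
        bought.append(best)
        # Buying the label of x adds the rank-one term x x^T to X^T X.
        A = A + np.outer(X_offers[best], X_offers[best])
    return bought, budget - remaining


# Usage: the second feature is poorly covered by the training data,
# so offers along that direction have the highest predictive variance.
X_train = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]])
X_offers = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
bought, spent = variance_based_purchase(X_train, X_offers, [1, 1, 5],
                                        budget=3, min_gain=0.1)
print(bought, spent)  # → [1, 0] 2
```

The greedy loop is used here only for readability. Because the paper formalises clearing as an optimisation problem, an exact solver could replace it, and a query-by-committee rule would swap the variance for disagreement among an ensemble of models.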
