Decentralized data markets can provide more equitable forms of data acquisition for machine learning. However, practical marketplaces require efficient techniques for seller selection. We propose and benchmark federated data measurements that allow a data buyer to find sellers with relevant and diverse datasets. These diversity and relevance measures enable a buyer to make relative comparisons between sellers without requiring intermediate brokers or training task-dependent models.
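To make the idea of broker-free seller comparison concrete, the following is a minimal sketch of one way such measurements could be computed. It is an illustrative assumption, not necessarily the measures proposed in this work: the buyer shares only its data mean and top principal directions, each seller returns the variance of its own data along those directions, and the buyer turns the two variance profiles into scores in [0, 1]. The function names (`buyer_components`, `seller_variances`, `relevance_and_diversity`), the variance-ratio scoring, and the synthetic data are all hypothetical.

```python
import numpy as np

def buyer_components(X_buyer, k):
    """Buyer side: mean, top-k principal directions, and their variances."""
    mu = X_buyer.mean(axis=0)
    Xc = X_buyer - mu
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    buyer_var = (s[:k] ** 2) / (len(X_buyer) - 1)
    return mu, Vt[:k], buyer_var

def seller_variances(X_seller, mu, components):
    """Seller side: second moment of the seller's data along each buyer
    direction, measured around the buyer's mean. Only k numbers are shared."""
    Z = (X_seller - mu) @ components.T
    return (Z ** 2).mean(axis=0)

def relevance_and_diversity(buyer_var, seller_var, eps=1e-12):
    """Assumed scoring rule: geometric means of per-direction variance
    overlap (relevance) and variance mismatch (diversity)."""
    lo = np.minimum(buyer_var, seller_var)
    hi = np.maximum(np.maximum(buyer_var, seller_var), eps)
    k = len(buyer_var)
    relevance = float(np.prod(lo / hi) ** (1.0 / k))
    diversity = float(np.prod(np.abs(buyer_var - seller_var) / hi) ** (1.0 / k))
    return relevance, diversity

# Synthetic demo: seller A resembles the buyer, seller B does not.
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.25])
X_buyer = rng.normal(size=(2000, 5)) * scales
X_seller_a = rng.normal(size=(2000, 5)) * scales   # same distribution as buyer
X_seller_b = rng.normal(size=(2000, 5)) * 0.1      # much narrower distribution

mu, comps, buyer_var = buyer_components(X_buyer, k=3)
rel_a, div_a = relevance_and_diversity(buyer_var, seller_variances(X_seller_a, mu, comps))
rel_b, div_b = relevance_and_diversity(buyer_var, seller_variances(X_seller_b, mu, comps))
```

In this sketch no raw data leaves either party and no task-specific model is trained; the buyer ranks sellers using only the returned variance profiles. Other choices of shared statistics or scoring rules would fit the same pattern.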