Evaluating the quality of data can be as important as collecting a large volume of it when the goal is to train accurate artificial intelligence models. In fact, the ability to evaluate data can lead to a larger dataset that is better suited to a particular problem, because automatically obtained data of dubious quality can be filtered out. In this paper we present RLBoost, an algorithm that uses deep reinforcement learning strategies to evaluate a particular dataset and obtain a model capable of estimating the quality of any new data, in order to improve the final predictive quality of a supervised learning model. This solution has the advantage of being agnostic to the supervised model used and, through multi-attention strategies, takes into account the data in its context and not only individually. The results show that this model obtains better and more stable results than other state-of-the-art algorithms such as LOO, DataShapley, or DVRL.
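To make the idea of data valuation concrete, the following is a minimal sketch of leave-one-out (LOO) valuation, the simplest of the baselines named above. It is not RLBoost and not the paper's experimental setup: the 1-D nearest-centroid classifier, the helper names (`nearest_centroid_accuracy`, `loo_values`), and the toy data are all assumptions chosen for illustration. Each training point is valued by how much validation accuracy drops when it is removed, so a point with a negative value is a candidate for filtering.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, int]  # (feature, label); hypothetical toy representation


def nearest_centroid_accuracy(train: List[Point], val: List[Point]) -> float:
    """Fit a 1-D nearest-centroid classifier on `train`; return accuracy on `val`."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    if not centroids or not val:
        return 0.0
    correct = 0
    for x, y in val:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += int(predicted == y)
    return correct / len(val)


def loo_values(
    train: List[Point],
    val: List[Point],
    score: Callable[[List[Point], List[Point]], float] = nearest_centroid_accuracy,
) -> List[float]:
    """LOO value of point i = score(all data) - score(all data except point i)."""
    base = score(train, val)
    return [base - score(train[:i] + train[i + 1:], val) for i in range(len(train))]


# Toy data (assumed): the last training point is deliberately mislabeled.
train = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1), (10.0, 0)]
val = [(0.5, 0), (1.5, 0), (6.0, 1), (8.5, 1), (9.5, 1)]

values = loo_values(train, val)

# Keep only points whose removal does not improve validation accuracy.
filtered = [p for p, v in zip(train, values) if v >= 0]

print("LOO values:", values)
print("accuracy before filtering:", nearest_centroid_accuracy(train, val))
print("accuracy after filtering:", nearest_centroid_accuracy(filtered, val))
```

In this toy case only the mislabeled point receives a negative value. LOO scores each point in isolation and needs one retraining per point, which is the kind of limitation that context-aware approaches such as the one described in the abstract aim to address.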