We present experiences and lessons learned from increasing the data readiness of heterogeneous data for artificial intelligence projects using visual analysis methods. Increasing the data readiness level involves understanding both the data and the context in which it is used, challenges to which visual analysis is well suited. For this purpose, we contribute a mapping between data readiness aspects and visual analysis techniques suitable for different data types. We use this mapping to increase data readiness levels in use cases involving time-varying data, including numerical, categorical, and text data. In addition to the mapping, we extend the data readiness concept to better account for aspects of the task and solution and to explicitly address distribution shifts during data collection. We report on our experiences with the presented visual analysis techniques to help future artificial intelligence projects raise their data readiness level.