In this paper, we present the data preparation activities that we performed for the Digital Experience Platform (DXP) project, commissioned and supervised by Doxee S.p.A. DXP manages the billing data of the users of different companies operating in various sectors (electricity and gas, telephony, pay TV, etc.). This data has to be processed to provide services to the users (e.g., interactive billing), but mainly to provide analytics to the companies (e.g., churn prediction or user segmentation). We focus on the design of the data preparation pipeline and describe the challenges that we had to overcome to make the billing data ready for analysis. We illustrate the lessons learned by highlighting the key points that could be transferred to similar projects. Moreover, we report some interesting results and considerations derived from a preliminary analysis of the prepared data, and we point out possible future directions for the ongoing project, ranging from big data integration to privacy-preserving temporal record linkage.