This study enriches and analyzes data from the Informatics Europe Higher Education (IEHE) data portal to identify trends in female participation in Informatics across Europe. The research examines the proportion of female students, first-year enrollments, and degrees awarded to women in the field. Low female participation in Informatics has long been recognized as a persistent challenge and remains an active area of scholarly inquiry. Existing literature further indicates that socio-economic factors can influence female participation in ways that are hard to predict, which complicates efforts to close the gender gap. The analysis focuses on participation data from research universities at the Bachelor's, Master's, and PhD levels, and looks for potential correlations between female participation and geographical or economic zones. The dataset was first enriched with additional information, such as each country's GDP and relevant geographical data, drawn from various online repositories. The data was then cleaned to ensure consistency, incomplete time series were discarded, and a final set of complete time series was selected for further analysis. Using the added attributes, we assigned countries to clusters by Economic Zone, Geographical Area, and GDP quartile, and compared their temporal trends both within and between clusters. We analyze the results for each classification and draw conclusions based on the available data.
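As a minimal illustration of the cleaning and clustering steps described above, the following Python sketch drops incomplete time series, assigns countries to GDP quartiles, and averages the female share per cluster and year. It is not the study's actual pipeline: the function names, the dictionary-based data layout, and the choice of the "inclusive" quantile method are assumptions made for the example, and any country codes or values passed to it are placeholders rather than IEHE data.

```python
from statistics import quantiles


def complete_series(series, years):
    """Keep only countries that have a value for every year in `years`.

    `series` maps country -> {year: female share}; a missing key or a
    None value marks a gap (an assumed layout, not the IEHE format).
    """
    return {
        country: values
        for country, values in series.items()
        if all(values.get(year) is not None for year in years)
    }


def gdp_quartiles(gdp):
    """Map each country to a GDP quartile label, 1 (lowest) to 4 (highest)."""
    # The three cut points split the observed GDP values into four groups.
    q1, q2, q3 = quantiles(gdp.values(), n=4, method="inclusive")
    return {
        country: 1 + (value > q1) + (value > q2) + (value > q3)
        for country, value in gdp.items()
    }


def cluster_means(series, clusters, years):
    """Average the female share per cluster and year."""
    members = {}
    for country, values in series.items():
        members.setdefault(clusters[country], []).append(values)
    return {
        cluster: {
            year: sum(v[year] for v in group) / len(group) for year in years
        }
        for cluster, group in members.items()
    }
```

The same `cluster_means` helper can be reused for the Economic Zone and Geographical Area groupings by passing a different country-to-cluster mapping, which keeps the within-cluster and between-cluster comparisons on a common footing.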