Collecting survey responses is central to research on public opinion. However, survey data collection can be arduous, time-consuming, and expensive, with no guarantee of an adequate response rate. In this paper, we propose a novel approach that predicts survey responses by analyzing quotations with machine learning. We focus on favorability towards the United States, a topic of interest to many organizations and governments. We use a large corpus of quotations from individuals of different nationalities and time periods to extract their level of favorability, and we combine natural language processing techniques with machine learning algorithms to build a predictive model of survey responses. We investigate two scenarios: first, when no surveys have been conducted in a country, and second, when surveys have been conducted in a country but only in some years. Our experimental results show that the proposed approach predicts survey responses with high accuracy. We also provide an in-depth analysis of the features that contributed most to the model's performance. This study could benefit survey research in data science by substantially reducing the cost and time required to conduct surveys while still providing accurate predictions of public opinion.