Collecting survey responses is a crucial part of research aimed at understanding public opinion. However, survey data collection can be arduous, time-consuming, and expensive, with no guarantee of an adequate response rate. In this paper, we propose a novel approach that uses machine learning to predict survey responses from quotations. Our investigation focuses on favorability towards the United States, a topic of interest to many organizations and governments. We leverage a large corpus of quotations from individuals of different nationalities, spanning different time periods, to estimate the speakers' level of favorability. We combine natural language processing techniques with machine learning algorithms to build a predictive model of survey responses. We investigate two scenarios: first, when no surveys have been conducted in a country; and second, when surveys have been conducted in some years but do not cover all years. Our experimental results show that the proposed approach predicts survey responses with high accuracy. We also provide a detailed analysis of the features that contribute most to the model's performance. By substantially reducing the cost and time required to conduct surveys while still providing accurate predictions of public opinion, this approach could have a significant impact on survey research in the field of data science.