Visual Question Answering (VQA) is an emerging area of interest for researchers and a recent problem in natural language processing and image prediction. In this task, an algorithm must answer questions about given images. As of the writing of this survey, 25 recent studies were analyzed, along with 6 datasets, each listed with its download link. This work investigates these studies and provides a deeper analysis and comparison among them, covering their results, the state of the art, common errors, and possible points of improvement for future researchers.