Tag Prediction of Competitive Programming Problems using Deep Learning Techniques

In the past decade, the amount of research being done in the fields of machine learning and deep learning, predominantly in the area of natural language processing (NLP), has risen dramatically. A well-liked method for developing programming abilities like logic building and problem solving is competitive programming. It can be tough for novices and even veteran programmers to traverse the wide collection of questions due to the massive number of accessible questions and the variety of themes, levels of difficulty, and questions offered. In order to help programmers find questions that are appropriate for their knowledge and interests, there is a need for an automated method. This can be done using automated tagging of the questions using Text Classification. Text classification is one of the important tasks widely researched in the field of Natural Language Processing. In this paper, we present a way to use text classification techniques to determine the domain of a competitive programming problem. A variety of models, including are implemented LSTM, GRU, and MLP. The dataset has been scraped from Codeforces, a major competitive programming website. A total of 2400 problems were scraped and preprocessed, which we used as a dataset for our training and testing of models. The maximum accuracy reached using our model is 78.0% by MLP(Multi Layer Perceptron).

翻译：过去十年间，机器学习和深度学习领域的研究大幅增长，尤其是在自然语言处理方面。竞赛编程作为一种广受欢迎的培养逻辑构建与问题解决等编程技能的方法，其可获取的问题数量庞大、主题、难度及题型多样，使得新手乃至经验丰富的程序员都难以在浩瀚题集中精准定位目标。为帮助程序员找到符合其知识水平和兴趣的问题，亟需一种自动化方法。这可通过基于文本分类的自动标注技术实现。文本分类是自然语言处理领域广泛研究的重要任务之一。本文提出利用文本分类技术判定竞赛编程问题所属领域的方法。我们实现了LSTM、GRU和MLP等多种模型。数据集从主流竞赛编程网站Codeforces爬取，共收集并预处理了2400道题目，用于模型训练与测试。其中，MLP（多层感知机）模型取得了最高78.0%的准确率。

相关内容

Automator

关注 5

Automator是苹果公司为他们的Mac OS X系统开发的一款软件。 只要通过点击拖拽鼠标等操作就可以将一系列动作组合成一个工作流，从而帮助你自动的（可重复的）完成一些复杂的工作。Automator还能横跨很多不同种类的程序，包括：查找器、Safari网络浏览器、iCal、地址簿或者其他的一些程序。它还能和一些第三方的程序一起工作，如微软的Office、Adobe公司的Photoshop或者Pixelmator等。

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日