Evaluating AI Vocational Skills Through Professional Testing

This study uses a novel professional certification survey to assess the vocational skills of two highly cited AI models, GPT-3 and Turbo-GPT3.5. The approach emphasizes practical readiness over academic performance by evaluating the models on a benchmark dataset of 1149 professional certifications. It also compares the models with human test scores, giving perspective on whether AI models can match or even surpass human performance on professional certifications. Without any fine-tuning or exam preparation, GPT-3 achieved a passing score (over 70% correct) on 39% of the professional certifications. It showed proficiency in computer-related fields, including cloud and virtualization, business analytics, cybersecurity, network setup and repair, and data analytics. Turbo-GPT3.5 scored a perfect 100% on the highly regarded Offensive Security Certified Professional (OSCP) exam. It also demonstrated competency in diverse professional fields such as nursing, licensed counseling, pharmacy, and aviation, and it performed strongly on customer service tasks, which suggests potential use in enhancing chatbots for call centers and routine advice services. Both models also scored well on sensory and experience-based tests outside a machine's traditional roles, including wine sommelier, beer tasting, emotional quotient, and body language reading. The study found that OpenAI's model improvements from Babbage to Turbo yielded 60% better performance on the grading scale within a few years. This progress suggests that addressing the current model's limitations could yield an AI capable of passing even the most rigorous professional certifications.
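The pass-rate figure above (a passing score of over 70% correct, reached on 39% of certifications) is simple arithmetic over per-exam results. The following is a minimal sketch of that calculation, not the study's actual grading code: the `ExamResult` type, the `pass_rate` helper, and the exam names and scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExamResult:
    """One certification attempt (hypothetical structure, not from the study)."""
    name: str      # illustrative exam label
    correct: int   # number of questions answered correctly
    total: int     # number of questions on the exam

    def passed(self, threshold_percent: int = 70) -> bool:
        # "Over 70% correct" is read here as strictly greater than 70%.
        # Integer arithmetic avoids floating-point edge cases at the boundary.
        return self.correct * 100 > self.total * threshold_percent


def pass_rate(results: List[ExamResult]) -> float:
    """Fraction of exams on which the model reached a passing score."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.passed()) / len(results)


# Made-up scores for illustration only; these are not data from the study.
results = [
    ExamResult("exam_a", correct=8, total=10),  # 80%: pass
    ExamResult("exam_b", correct=7, total=10),  # exactly 70%: not "over 70%"
    ExamResult("exam_c", correct=3, total=10),  # 30%: fail
    ExamResult("exam_d", correct=9, total=10),  # 90%: pass
]

print(f"Pass rate: {pass_rate(results):.0%}")
```

Under this reading of the threshold, an exam scored at exactly 70% does not count as passed; if the benchmark treats 70% as passing, the comparison would use `>=` instead.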
