ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)

We introduce a novel writing method called Probing Chain of Thought (ProCoT), which prevents students from cheating with a Large Language Model (LLM), such as ChatGPT, while enhancing their active learning through such models. LLMs have disrupted education and many other fields. For fear of students cheating, many educators have resorted to banning their use, as their outputs can be human-like and, in some cases, hard to detect. These LLMs are also known for hallucinations (i.e., fake facts). We conduct studies with ProCoT in two different courses with a combined total of about 66 students. The students in each course were asked to prompt an LLM of their choice with one question from a set of four, and were then required to affirm or refute statements in the LLM output using peer-reviewed references. The results show two things: (1) ProCoT stimulates students' creative/critical thinking and writing through engagement with LLMs, as seen when we compare the LLM-only output to the ProCoT output, and (2) ProCoT can prevent cheating because of clear limitations in existing LLMs, as seen when we compare the students' ProCoT output to the LLM's ProCoT output. We also find that most students prefer to give answers in fewer words than LLMs, which are typically verbose. The average word counts for students, ChatGPT (v3.5), and Phind (v8) are 208, 391, and 383, respectively.

