Recent advances in deep learning have enabled the emergence of many computational systems capable of performing intelligent tasks that, until recently, were restricted to the human intellect. In the particular case of human languages, these advances led to applications such as ChatGPT, which can generate coherent text without being explicitly programmed to do so. Instead, these models learn meaningful representations of human languages from large volumes of textual data. Alongside these advances, concerns have emerged about copyright and data privacy infringements caused by these applications. Despite these concerns, new natural language processing (NLP) applications have continued to be developed at a pace that far outstrips the introduction of new regulations. Today, communication barriers between legal experts and computer scientists contribute to many unintentional legal infringements during the development of such applications. In this paper, a multidisciplinary team aims to bridge this communication gap and promote more legally compliant Portuguese NLP research by presenting a series of everyday NLP use cases and highlighting the Portuguese legislation that may apply during their development.