Interacting with the legal system and the government requires assembling and analyzing information that is often spread across different (paper) documents, such as forms, certificates, and contracts (e.g., leases). This information is needed to understand one's legal rights, as well as to fill out forms to file claims in court or obtain government benefits. However, finding the right information, locating the correct forms, and filling them out can be challenging for laypeople. Large language models (LLMs) have emerged as a powerful technology with the potential to address this gap, but they still rely on the user to provide the correct information, which can be challenging and error-prone when that information is only available in complex paper documents. We present an investigation into using multi-modal LLMs to analyze images of handwritten paper forms and automatically extract the relevant information in a structured format. Our initial results are promising, but they reveal some limitations (e.g., when image quality is low). Our work demonstrates the potential of integrating multi-modal LLMs to support laypeople and self-represented litigants in finding and assembling relevant information.
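The extraction pipeline described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the form schema, field names, prompt, and the simulated model reply are all hypothetical, and a real system would send the form image to a multi-modal LLM API in place of the stubbed reply.

```python
import json
from dataclasses import dataclass

# Hypothetical target schema for a lease-related benefits form;
# the field names are illustrative, not taken from the paper.
@dataclass
class FormData:
    applicant_name: str
    monthly_rent: float
    lease_start: str  # ISO date string

# Illustrative prompt that would accompany the form image.
EXTRACTION_PROMPT = (
    "Extract the applicant's name, the monthly rent, and the lease start "
    "date from this form image. Reply with JSON only, using the keys "
    "applicant_name, monthly_rent, and lease_start."
)

def parse_model_reply(reply: str) -> FormData:
    """Parse and validate the model's JSON reply into the target schema.

    Raises ValueError when required keys are missing, which is one way to
    surface extraction failures (e.g., on low-quality images) instead of
    silently accepting incomplete output.
    """
    data = json.loads(reply)
    missing = {"applicant_name", "monthly_rent", "lease_start"} - data.keys()
    if missing:
        raise ValueError(f"model reply is missing keys: {missing}")
    return FormData(
        applicant_name=str(data["applicant_name"]),
        monthly_rent=float(data["monthly_rent"]),
        lease_start=str(data["lease_start"]),
    )

# Simulated reply standing in for a real multi-modal LLM call.
reply = (
    '{"applicant_name": "Jane Doe", "monthly_rent": "950.00", '
    '"lease_start": "2023-01-01"}'
)
record = parse_model_reply(reply)
```

Validating the reply against an explicit schema lets downstream form-filling code treat the extracted values as typed fields rather than free text.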