Large language models (LLMs) are typically evaluated on task-based benchmarks such as MMLU. Such benchmarks do not examine whether LLMs behave responsibly in specific contexts. This gap is particularly pronounced in the LGBTI+ context, where social stereotypes may result in variation in LGBTI+ terminology. Domain-specific lexicons or dictionaries can therefore serve as a representative list of words against which an LLM's behaviour is evaluated. This paper presents a methodology for evaluating LLMs using an LGBTI+ lexicon in Indian languages. The methodology consists of four steps: formulating NLP tasks relevant to the expected behaviour, creating prompts that test the LLMs, using the LLMs to obtain outputs and, finally, manually evaluating the results. Our qualitative analysis shows that the three LLMs we experiment with are unable to detect underlying hateful content. We also observe limitations in using machine translation as a means to evaluate natural language understanding in languages other than English. The methodology can be applied to LGBTI+ lexicons in other languages as well as to other domain-specific lexicons. The work opens avenues for further work on the responsible behaviour of LLMs, as demonstrated here in the context of the prevalent social perception of the LGBTI+ community.
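The four-step methodology can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the task names, prompt wording, the `Record` type, the helper functions and the stand-in `echo_llm` are all hypothetical, and the lexicon entries are placeholders rather than real terms. The manual evaluation step is represented by an empty `human_label` field that an annotator would fill in afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Step 1: NLP tasks written as prompt templates.
# Both the task names and the wording are illustrative assumptions.
TASK_TEMPLATES: Dict[str, str] = {
    "hate_detection": "Can the word '{term}' ({language}) be used in a hateful way? Explain.",
    "translation": "Translate the word '{term}' from {language} to English.",
}


@dataclass
class Record:
    term: str
    language: str
    task: str
    prompt: str
    output: str = ""       # Step 3: filled in by the model
    human_label: str = ""  # Step 4: filled in by a human annotator


def build_prompts(lexicon: List[Tuple[str, str]],
                  templates: Dict[str, str]) -> List[Record]:
    """Step 2: create one prompt per (lexicon entry, task) pair."""
    records = []
    for term, language in lexicon:
        for task, template in templates.items():
            prompt = template.format(term=term, language=language)
            records.append(Record(term, language, task, prompt))
    return records


def collect_outputs(records: List[Record],
                    llm: Callable[[str], str]) -> List[Record]:
    """Step 3: query the model and store its raw output for later review."""
    for record in records:
        record.output = llm(record.prompt)
    return records


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"[model output for: {prompt}]"


if __name__ == "__main__":
    # Placeholder lexicon entries as (term, language) pairs.
    lexicon = [("<term-1>", "<language-1>"), ("<term-2>", "<language-2>")]
    records = collect_outputs(build_prompts(lexicon, TASK_TEMPLATES), echo_llm)
    for r in records:
        # Step 4 happens offline: annotators read each output and set human_label.
        print(r.task, "|", r.prompt, "|", r.output, "|", r.human_label or "UNLABELLED")
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM API, so the same records can be collected from several models and judged side by side.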