Speakers
Description
ONLINE PRESENTATION
Taboo words present a challenge for a lexicographer to include and describe in a language resource, as they are forms of verbal violence. However, discarding offensive words from general-purpose lexicographic wordlists disregards the representation of an integral part of the mental lexicon. The present study aims at using lexicographic scenarios to jailbreak four GPT variants into the retrieval of offensive words that are frequently used yet undocumented in most lexicographic resources. While Large Language Models (LLMs) can be used to document a headword, the presence of taboo items may prevent these systems from providing an answer. Our results reveal that the type of the model and the lexicographic framing of the extraction task improved the responses of the models and increased the success rate, with the optimal configuration reaching 87.5% success rate. The AI-generated lexicon of offensive words currently contains approximately 250 headwords grouped into gender, age, religion and race categories. The words also vary in their inherently or contextually offensive types. A searchable user-friendly version is accessible through https://arabic-studies.com/Elex/index.html. The main contributions of this lexicon are detecting lexicographically undocumented offensive terms, pointing to the negative context of several headwords and discovering new senses of apparently neutral ones. In addition, LLMs provide very useful morphological, semantic and socio-cultural information in the definitions, despite the inconsistencies and some overgeneralizations in the definitions. Although corpus evidence proved the success of LLMs in detecting offensive words and senses, the automatic evaluation of AI-generated example sentences showed their limited value from a pedagogical perspective.