Description
Large language models (LLMs) have attracted much attention in lexicography since the release of ChatGPT in late 2022. Several studies have explored their use in dictionary-writing tasks (e.g., Lew, 2023); others have raised concerns about their limitations and risks (e.g., McKean and Fitzgerald, 2023); de Schryver (2023) provides a useful overview and analysis.
This paper explores how LLMs can support and enrich the Oxford English Dictionary (OED), a large historical dictionary with over half a million entries. The OED has a history of adopting pioneering technologies, such as computerization and the use of electronic text archives and corpora (Gilliver, 2016, p. 542, pp. 559–63); and in recent years it has benefited from machine-learning projects such as the semi-automated expansion of the OED’s Historical Thesaurus (McCracken, 2015). It is in this spirit that OED staff have approached possible uses of LLMs. Like other lexicographers, we have been exploring the potential of LLMs to accelerate the drafting of dictionary content, but we are particularly interested in cases where LLMs could work at scale across the whole dictionary, for example generating draft definitions for undefined derivatives, modernizing unrevised definitions, and assigning illustrative quotations to senses. We present our findings from experiments with various prompts, reporting especially on the importance of well-structured prompts and high-quality examples, and on the essential role of human editors in refining prompts and reviewing output. We also compare the outputs produced by different LLMs and parameter settings.
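To illustrate the kind of prompt structure we have in mind, the sketch below shows how a draft definition for an undefined derivative might be requested from a general-purpose chat-completion API. The model name, the few-shot examples, and the helper function are illustrative assumptions rather than the OED’s actual workflow, and any output would always be reviewed by a human editor.

    from openai import OpenAI  # assumes the openai>=1.0 Python client

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Illustrative few-shot examples: each pairs a defined base word with a
    # derivative and the style of definition we would like the model to imitate.
    FEW_SHOT = [
        {"role": "user",
         "content": "Base: mournful, adj. 'Feeling or expressing sorrow.'\nDerivative: mournfulness, n."},
        {"role": "assistant",
         "content": "The quality or state of being mournful; sorrowfulness."},
    ]

    def draft_derivative_definition(base_entry: str, derivative: str,
                                    model: str = "gpt-4o") -> str:
        """Ask the model for a draft definition of an undefined derivative.

        The result is a draft only, to be refined or rejected by an editor.
        """
        system = (
            "You are assisting lexicographers. Write one concise, dictionary-style "
            "definition for the derivative, modelled on the examples. Do not invent "
            "senses that the base entry does not support."
        )
        messages = [{"role": "system", "content": system}, *FEW_SHOT,
                    {"role": "user", "content": f"Base: {base_entry}\nDerivative: {derivative}"}]
        response = client.chat.completions.create(model=model, messages=messages, temperature=0.2)
        return response.choices[0].message.content.strip()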
We identify areas where LLMs perform relatively well (for example, some types of definitions and usage notes), areas where they currently perform less well (especially with historical data), and areas for future investigation. We also discuss other opportunities that LLMs could create for the OED. For example, we are currently in the early stages of planning a new, large corpus of historical English, and investigating some of the ways in which machine learning and LLMs could be used in tagging and annotating the data. We are also exploring the ways in which LLMs could transform the user experience of the OED: for example, a conversational interface powered by an LLM would allow users to search the OED using natural language queries, obviating the need for complex advanced searches and enabling novel possibilities for interacting with the dictionary or its data. If successful, such an approach might be extended to other tasks, such as the natural language querying of corpora.
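As a sketch of how such a conversational interface might translate a natural-language question into a structured OED search, the following assumes a chat-completion API and a purely hypothetical set of filter fields; the field names and the downstream handling are invented for illustration, not drawn from the OED’s actual search interface.

    import json
    from openai import OpenAI  # assumes the openai>=1.0 Python client

    client = OpenAI()

    # Hypothetical filter fields; a real interface would use the OED's own
    # advanced-search parameters.
    FIELDS = ["part_of_speech", "origin_language", "first_attested_before",
              "first_attested_after", "subject_label"]

    def parse_query(question: str, model: str = "gpt-4o") -> dict:
        """Turn a natural-language question into structured search filters."""
        prompt = (
            "Convert the user's question about the dictionary into a JSON object "
            f"using only these keys (omit any that do not apply): {FIELDS}. "
            "Return JSON only, with no commentary.\n\n"
            f"Question: {question}"
        )
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        try:
            return json.loads(response.choices[0].message.content)
        except json.JSONDecodeError:
            return {}  # empty filter set; the interface can ask the user to rephrase

    # e.g. parse_query("Which nouns borrowed from Japanese entered English before 1900?")
    # might yield {"part_of_speech": "noun", "origin_language": "Japanese",
    #              "first_attested_before": 1900}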
There are particular challenges for the OED in all of these propositions, not least the fact that the LLMs available at the time of writing (such as GPT-4, Claude 3, and LLaMA 3) are trained on modern texts and not well suited to the analysis of historical data. While some historical language models have been developed (e.g., MacBERTh: see Manjavacas and Fonteyn, 2021), the field of historical LLMs is at a very early stage. Another challenge is the presence of content filters imposed by third-party LLM providers, which limit the use of LLMs in handling sensitive material. Furthermore, we share the widely expressed concerns about errors and hallucinations generated by LLMs, and we are keen to find ways to mitigate these, for example through retrieval-augmented generation.
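In retrieval-augmented generation, the model is shown relevant dictionary evidence and instructed to answer only from it. The sketch below assumes a simple retriever over OED quotations and entry text; the retriever and the passage format are stand-ins for whatever indexing might ultimately be adopted.

    from openai import OpenAI  # assumes the openai>=1.0 Python client

    client = OpenAI()

    def retrieve_passages(query: str, k: int = 5) -> list[str]:
        """Stand-in for a retriever (e.g. a keyword or vector index over OED
        quotations and entry text); returns the k most relevant passages."""
        raise NotImplementedError  # illustrative only

    def grounded_answer(question: str, model: str = "gpt-4o") -> str:
        """Answer a question using only retrieved dictionary evidence."""
        passages = retrieve_passages(question)
        context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
        prompt = (
            "Answer the question using only the numbered passages below, citing "
            "passage numbers. If the passages do not contain the answer, say so "
            "rather than guessing.\n\n"
            f"{context}\n\nQuestion: {question}"
        )
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return response.choices[0].message.content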
LLMs and other AI tools also offer unique opportunities for the OED, given its size and scope. Beyond the explorations summarized above, we discuss ways that an AI tool could carry out tasks on a scale that would not be possible for a human lexicographer: for example, identifying significant antedatings, gaps, omissions, or discrepancies across the whole text, which could help prioritize entries for revision; or identifying and visualizing patterns and connections across the dictionary. We anticipate that AI will revolutionize lexicography in much the same way that corpora did in previous decades: not by replacing but by enhancing the work of human editors.
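As a simple illustration of working at this scale, the sketch below flags candidate antedatings by comparing each entry’s earliest quotation date with the earliest attestation an automated corpus search can find. Both data structures are invented for the example, and every candidate would still require editorial verification.

    from dataclasses import dataclass

    @dataclass
    class EntrySummary:
        headword: str
        first_quotation_year: int  # earliest date currently cited in the entry

    def find_candidate_antedatings(entries, corpus_first_year):
        """Flag entries whose earliest OED quotation postdates the earliest
        attestation found in an external corpus.

        `corpus_first_year` maps a headword to the earliest year at which an
        automated corpus search finds the word; both inputs are illustrative
        stand-ins, not real OED or corpus interfaces.
        """
        candidates = []
        for entry in entries:
            corpus_year = corpus_first_year.get(entry.headword)
            if corpus_year is not None and corpus_year < entry.first_quotation_year:
                candidates.append((entry.headword, entry.first_quotation_year, corpus_year))
        # Largest gaps first: these are the most promising leads for revision.
        candidates.sort(key=lambda c: c[1] - c[2], reverse=True)
        return candidates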