8–12 Oct 2024
Hotel Croatia

ChatGPT and the Dawn of the Post-Dictionary World: Evaluating ChatGPT’s Effectiveness Against Benchmarks From Dictionary User Studies

10 Oct 2024, 12:40
30m
Ragusa Hall (Hotel Croatia)

Speaker

Tomasz Michta

Description

The arrival of generative language models such as ChatGPT, developed by OpenAI, has sparked considerable interest among the general public, linguists, and lexicographers. As the roundtable discussion at eLex 2023 in Brno clearly showed, the lexicographic community is split between excitement and scepticism. Although there is growing recognition of the benefits that generative AI can bring to lexicography, the jury is still out regarding the full scope of its impact. For some, ChatGPT “does not herald ‘the end of lexicography’” (Rundell, 2023, p. 9), but for others it makes dictionaries, lexicographers and post-editing lexicographic tools redundant (de Schryver, 2023).
Initial research into the potential of ChatGPT for lexicography (Jakubíček & Rundell, 2023; McKean & Fitzgerald, 2023; Rundell, 2023) has largely relied on the authors’ own expert evaluations. Lew’s (2023) study is unique in this context as it employed a blind review process, in which human experts assessed dictionary entries taken from Collins COBUILD Advanced Online alongside entries generated by ChatGPT-3.5. The study found that the quality of definitions generated by ChatGPT was “practically indistinguishable” (Lew, 2023, p. 8) from those produced by COBUILD lexicographers. Further evidence of ChatGPT’s potential comes from dictionary user studies. The first such study, carried out by Rees & Lew (2023), used a multiple-choice reading task to compare the effectiveness of ChatGPT-generated definitions with those from the Macmillan English Dictionary (MED) in
helping users understand unknown vocabulary. The study revealed that students with access to MED definitions significantly outperformed those without any definitions, yet no significant differences were noted between students using ChatGPT-generated definitions and those with MED definitions or without any definitions.
Investigating ChatGPT’s effectiveness in acting as a lexicographer and in producing dictionary entries represents one approach to exploring its potential. Yet it is possible to imagine a post-dictionary future in which dictionaries “will at best be subsumed within, at worst gobbled up by, other digital tools” and traditional ways of presenting lexical knowledge in dictionary-like format will no longer be needed (de Schryver, 2023, p. 380). In such a future, users may no longer have to struggle with the various challenges inherent in consulting existing dictionaries, such as finding, interpreting and applying information. Although dictionaries are unlikely to disappear completely any time soon, there may come a point when they are outperformed by large language models (LLMs) in providing language assistance. Given this possibility, it is worth investigating ChatGPT’s capacity to act not as a lexicographer producing dictionary entries, but as a language support tool capable of performing various language tasks in response to user prompts.
The present study aims to contribute to the existing lexicographic literature by addressing a key question for the emerging lexicographic landscape: Can ChatGPT effectively perform language tasks that would traditionally be performed with the aid of a dictionary? A secondary aim of the study is to compare the effectiveness of three ChatGPT models: 3.5, 4 and 4o, thus providing additional insights into the usefulness of the tool. To test the effectiveness of ChatGPT vis-à-vis dictionary consultation, the present investigation draws upon 10 published user studies (see the references) involving dictionaries, where experiment participants performed specific language tasks with a dictionary’s assistance. All of the studies appeared in the International Journal of Lexicography or as monographs, with a key selection criterion being the availability of the instrument used for measuring participants’ performance. Additionally, care was taken to ensure that both productive and
receptive skills were investigated. The tasks from those studies were then submitted to ChatGPT. To ensure comparability between the performance of ChatGPT and that of dictionary users in the original studies, the prompts given to ChatGPT closely mirrored the instructions provided to the participants in the original studies. However, dictionary entries used in the original experiments were not included in the prompts, the only exception being the study by Lew (2004), as the words tested there were pseudowords. The results reported in the original studies served as benchmarks against which ChatGPT’s performance was compared. The study’s findings, to be presented for the first time at the
EURALEX congress, will demonstrate ChatGPT’s strong potential as an alternative to traditional dictionaries in a range of tasks typically associated with them. While its performance varies depending on the task, ChatGPT sometimes achieves a perfect score, outperforming traditional dictionary consultation. This suggests that the dawn of the post-dictionary world may soon be upon us.
