Description
The use of corpora is well established in lexicography, including in Estonia. However, because analysing corpus data and post-editing data automatically generated from a corpus are labour-intensive, large language models (LLMs) have attracted growing interest in the field (e.g., Evert et al. 2024; Kosem, Gantar et al. 2024; Tiberius et al. 2024). In 2024, the Institute of the Estonian Language launched a project in which we explore how LLMs can assist in compiling the components of dictionary entries (e.g., definitions, register labels, examples).
In the first year, we tested whether LLMs can help lexicographers explain word meanings in Estonian, a language that has around 1 million speakers and is underrepresented in LLMs. Lexicographers rated 85% of the meaning descriptions generated by GPT-4o (the highest-rated LLM in the study) as useful or somewhat useful for their work. While our first study focused on lexicographers’ preferences and requirements for LLM-generated definitions, the current study concentrates on users’ preferences and requirements for both LLM-generated and lexicographer-compiled definitions.
According to a survey conducted in 2023 (Langemets et al. 2024: 750–751), Sõnaveeb (Koppel et al. 2019), the language portal of the Institute of the Estonian Language, is most often consulted for information on meanings. This agrees with the results of a pan-European study (Kosem et al. 2019), according to which meanings are the most frequently looked-up type of information in dictionaries. However, both studies were carried out before LLMs came into wide use. No research on Estonian has yet investigated whether and how preferences for obtaining information about meanings have changed with the increasing use of LLMs. In the presentation, we will introduce the results of a survey carried out among the users of Sõnaveeb, in which LLM-generated definitions were presented side by side with lexicographer-compiled definitions, and users were asked to mark their preference and list the reasons for it. The evaluation was blind: users were not told which definition was written by a human. The lexicographic meaning descriptions used in the survey are definitions from the EKI Combined Dictionary (Tavast et al. 2020), which is the backbone of Sõnaveeb and offers a detailed monolingual description of meaning that defines the content of a concept as exhaustively as possible. The study included words from different parts of speech and with varying degrees of polysemy.
We tested the following LLMs: GPT-4o, o1-mini, Claude 3 Opus, Claude 3.5 Sonnet, Gemini 1.5 Pro, Gemini 2.0 and EuroLLM. Based on expert evaluations, the best-performing model was selected for the final user test. In the presentation, we will introduce the tested prompts and examine how users’ dictionary and LLM usage habits relate to their preferences. Our main questions are: how do users rate the LLM-generated definitions, and do they prefer them to the ones compiled by lexicographers? What do lexicographers still do better than LLMs, and what, intriguingly, do users believe LLMs do better than lexicographers?