Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

Accelerating the lexicographic process with automatic methods and AI

Nov 19, 2025, 12:00 PM
1h
Lobby

Speakers

Nathalie Norman, Nicolai Hartvig Sørensen, Jonas Jensen, Kirsten Appel, Sanni Nimb

Description

POSTER

Writing dictionary entries is both time-consuming and expensive because of the highly specialized knowledge and experience required of the lexicographer. To facilitate the compilation of the Danish monolingual dictionary DDO (ordnet.dk/ddo), we aim to build an automatic assistant based on applied language technology (e.g. n-gram analysis and word embeddings) and generative AI. DDO contains 105,000 lemmas and is updated with new lemmas twice a year. In this presentation, we focus on morphological and phonetic information in the dictionary, on synonyms, and finally on an experiment with automatically written definitions.

The assistant, which we have named the Article Accelerator, automatically generates XML-tagged drafts of the subsections of a complete dictionary article in DDO. When the assistant receives a new word for the dictionary as input, it automatically presents suggestions for inflection, phonetic transcription, and synonyms. We assume that most new words in our case are compound nouns. In Danish, these are usually written as a single word, so we base the suggestions on a compound splitter. If the final part of the compound is already described in the dictionary, the assistant extracts the inflectional paradigms from the relevant entry or entries, and the user (i.e. the lexicographer) then chooses the appropriate one. Likewise, the assistant extracts the phonetic transcription for all subparts of a compound word that can be found in the dictionary. Lastly, the assistant uses both word embeddings and an LLM to produce a list of synonym candidates. If a selected candidate already exists in the dictionary, the assistant can help create the necessary links and ID numbers.
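As a rough illustration of the splitting step described above, the following minimal sketch looks for the longest final part of a new word that already exists in a lexicon and reuses that entry's inflection and transcription. It is not the Article Accelerator's actual implementation: the lexicon contents, the placeholder strings, the list of linking elements, and the function names `split_compound` and `suggest` are all assumptions made for this example.

```python
# Hypothetical mini-lexicon: lemma -> stored information.
# The values are placeholders, not real DDO paradigms or transcriptions.
LEXICON = {
    "liv": {"paradigm": "PARADIGM_LIV", "phonetic": "PHON_LIV"},
    "lyd": {"paradigm": "PARADIGM_LYD", "phonetic": "PHON_LYD"},
    "bog": {"paradigm": "PARADIGM_BOG", "phonetic": "PHON_BOG"},
    "ordbog": {"paradigm": "PARADIGM_ORDBOG", "phonetic": "PHON_ORDBOG"},
}

# Assumed linking elements between compound parts ("" means none).
LINKING_ELEMENTS = ("", "s", "e")


def split_compound(word, lexicon):
    """Return (modifier, head) using the longest known final part, or None."""
    for i in range(1, len(word)):  # small i means a long candidate head
        head = word[i:]
        if head in lexicon:
            return word[:i], head
    return None


def suggest(word, lexicon):
    """Suggest a paradigm from the head and transcriptions for known parts."""
    split = split_compound(word, lexicon)
    if split is None:
        return None
    modifier, head = split
    known_parts = []
    # Strip a linking element if doing so exposes a known modifier.
    for link in LINKING_ELEMENTS:
        stem = modifier[: len(modifier) - len(link)]
        if modifier.endswith(link) and stem in lexicon:
            known_parts.append(stem)
            break
    known_parts.append(head)
    return {
        "head": head,
        "paradigm": lexicon[head]["paradigm"],
        "phonetic_parts": {p: lexicon[p]["phonetic"] for p in known_parts},
    }


if __name__ == "__main__":
    print(suggest("livsordbog", LEXICON))
```

Preferring the longest known final part means that "ordbog" is chosen over "bog" when both are in the lexicon. A production splitter would presumably need frequency information or further rules to resolve ambiguous splits, and the lexicographer still makes the final choice.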

The core of the Article Accelerator, however, is the module that generates suggestions for sense definitions based on existing definitions of semantically similar or related senses in the dictionary. These senses are found by combining compound splitting with a word embedding model. The user (i.e. the lexicographer) then selects the final list of senses, which are included in the input to a generative model.
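The embedding-based part of this retrieval can be pictured as a nearest-neighbour search over sense vectors. The sketch below ranks candidate senses by cosine similarity to the vector of the new word. The two-dimensional vectors, the sense identifiers, and the function names are invented for the example, and the abstract does not state which embedding model or similarity measure is used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_senses(query_vec, sense_vectors, k=3):
    """Return the ids of the k senses most similar to query_vec."""
    ranked = sorted(
        sense_vectors,
        key=lambda sense_id: cosine(query_vec, sense_vectors[sense_id]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy vectors; real embeddings would have hundreds of dimensions.
    senses = {
        "sense_a": [1.0, 0.1],
        "sense_b": [0.0, 1.0],
        "sense_c": [0.9, 0.5],
    }
    print(nearest_senses([1.0, 0.0], senses, k=2))
```

The ranked list is only a set of candidates. In the workflow described above, the lexicographer decides which senses are passed on to the generative model.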

The goal is for the model to produce new definitions that reflect the style of the dictionary and require only minimal post-editing by the lexicographer. To find the best combination of prompt and generative model, we run an experiment on fully edited but unpublished monosemous lemmas from DDO. We test two different prompts on three models (ChatGPT 4o, Claude 3.7 Sonnet, Llama 4 Scout) and manually compare each model's output with the definition written by a lexicographer.
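The two-prompts-by-three-models design can be laid out as a simple grid. The sketch below is one hypothetical way to organise such a run: the prompt wordings are invented for illustration (the actual prompts are not given here), and `generate` is a caller-supplied function standing in for whatever client calls each model, so no specific vendor API is assumed.

```python
from itertools import product

# Model names as listed in the abstract.
MODELS = ["ChatGPT 4o", "Claude 3.7 Sonnet", "Llama 4 Scout"]

# Hypothetical prompt templates; the wording is an assumption.
PROMPTS = {
    "prompt_a": "Write a dictionary definition of '{lemma}' in the style of:\n{examples}",
    "prompt_b": "Definitions of related senses:\n{examples}\nDefine '{lemma}' in the same style.",
}


def build_prompt(template, lemma, example_definitions):
    """Fill a template with the lemma and the selected existing definitions."""
    examples = "\n".join(f"- {d}" for d in example_definitions)
    return template.format(lemma=lemma, examples=examples)


def run_grid(items, generate):
    """Run every prompt/model pair on every (lemma, examples, gold) item."""
    rows = []
    for (prompt_id, template), model in product(PROMPTS.items(), MODELS):
        for lemma, examples, gold in items:
            prompt = build_prompt(template, lemma, examples)
            rows.append({
                "prompt": prompt_id,
                "model": model,
                "lemma": lemma,
                "generated": generate(model, prompt),
                "gold": gold,  # definition written by a lexicographer
            })
    return rows


if __name__ == "__main__":
    # Stub generator so the sketch runs without any model access.
    stub = lambda model, prompt: f"<output of {model}>"
    for row in run_grid([("lemma_x", ["definition 1"], "gold definition")], stub):
        print(row["prompt"], row["model"], row["generated"])
```

Keeping the lexicographer's definition next to each generated one in the same row makes the side-by-side manual comparison straightforward.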

The manual evaluation is carried out by two experienced lexicographers. It tells us how good the automatic definitions are and puts us in the best position to choose the prompt and model.
