Description
The Dutch Language Institute (INT) has a long tradition of compiling historical and contemporary dictionaries and other types of lexicographic databases, mainly for Dutch but also for some other languages related to Dutch. Lexicographic work at the institute is computer-supported, but it still involves a great deal of manual work. INT is therefore exploring how new technologies, including large language models (LLMs), can be used to optimise different parts of the lexicographic workflow without compromising data quality and reliability. After a brief overview of various pilot studies conducted at the institute, we will take a closer look at how to make the implementation of Hanks’ Corpus Pattern Analysis procedure, as used in the Woordcombinaties project, more intelligent. In this way, we hope ultimately to realise Patrick Hanks’ vision that “it seems likely that a large part of the work that is currently being carried out by hand will be automated in the not-too-distant future” (Hanks 2013: 247).