Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

Exploring the constructicographic potential of lexicographic data and language models: The case of the Estonian Nominal Quantifier Construction

Nov 18, 2025, 2:30 PM
30m
Zrak hall

Speakers

Heete Sahkai, Geda Paulsen, Ene Vainik, Jelena Kallas, Ahto Kiil, Katrin Tsepelina, Kertu Saul, Arvi Tavast

Description

Constructicography, the description of grammatical constructions in a lexicographic format, is an emerging field that is currently developing and automating methods for treating large numbers of (semi-)schematic constructions. This study explores how existing lexicographic data and language models can be used to facilitate the constructicographic workflow. Our results suggest that (1) collocations and semantic relations represented in a lexicographic database can be used to identify the collexemes of constructions, that is, the lexemes occurring in the open slot(s) of schematic constructions, (2) BERT-based language models can be trained to identify instances of constructions in corpora, using collocations as the starting point for creating appropriate training data, and (3) commercial large language models can be prompted to identify constructional instances from a small number of examples. Identifying the collexemes and corpus instances of a construction yields several kinds of information that can be represented in constructicon entries: the meaning, form, frequency and productivity of the construction, the frequency and association strength of individual collexemes, the CEFR level of the construction, etc.
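The abstract mentions the frequency and association strength of collexemes and the productivity of a construction without naming the measures used. As an illustration only, the sketch below assumes simple collexeme analysis scored with the log-likelihood ratio G², plus Baayen's potential productivity (hapaxes in the slot divided by slot tokens). The function names, the definition of `corpus_size`, and all lemma counts are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def g2(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood ratio (G^2) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    total = 0.0
    for obs, r, k in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        if obs > 0:  # empty cells contribute 0 (0 * log 0 is taken as 0)
            expected = rows[r] * cols[k] / n
            total += obs * math.log(obs / expected)
    return 2 * total


def collexeme_strengths(slot_lemmas, corpus_freq, corpus_size):
    """Rank the lemmas found in a construction's open slot by signed G^2.

    slot_lemmas: one lemma per corpus instance of the construction
    corpus_freq: total corpus frequency of each lemma, including construction uses
    corpus_size: total number of comparable units in the corpus (assumption: e.g. all nouns)
    Positive scores mean attraction to the slot, negative scores mean repulsion.
    """
    slot_counts = Counter(slot_lemmas)
    cx_total = sum(slot_counts.values())
    ranked = []
    for lemma, a in slot_counts.items():
        b = cx_total - a                  # other lemmas in the slot
        c = corpus_freq[lemma] - a        # this lemma outside the construction
        d = corpus_size - cx_total - c    # other lemmas outside the construction
        expected_a = cx_total * corpus_freq[lemma] / corpus_size
        score = g2(a, b, c, d)
        ranked.append((lemma, a, score if a >= expected_a else -score))
    return sorted(ranked, key=lambda item: item[2], reverse=True)


def potential_productivity(slot_lemmas):
    """Baayen's P: hapaxes in the slot divided by the number of slot tokens."""
    counts = Counter(slot_lemmas)
    hapaxes = sum(1 for f in counts.values() if f == 1)
    return hapaxes / len(slot_lemmas)


if __name__ == "__main__":
    # Hypothetical toy counts, not data from the study.
    slot = ["noun_a"] * 30 + ["noun_b"] * 5 + ["noun_c", "noun_d"]
    freqs = {"noun_a": 200, "noun_b": 5000, "noun_c": 40, "noun_d": 900}
    for lemma, freq, score in collexeme_strengths(slot, freqs, corpus_size=100_000):
        print(f"{lemma}\tfreq={freq}\tG2={score:.1f}")
    print("P =", round(potential_productivity(slot), 3))
```

The 2x2 table crosses "in the construction's slot vs. elsewhere" with "this lemma vs. other lemmas". Fisher's exact test is a common alternative to G² for scoring the same table; swapping it in would change the scores but not how the table is built.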