Speakers
Description
Constructicography, the description of grammatical constructions in a lexicographic format, is an emerging field that is still developing and automating methods for treating large numbers of (semi-)schematic constructions. This study explores how existing lexicographic data and language models can facilitate the constructicographic workflow. Our results suggest that (1) collocations and semantic relations represented in a lexicographic database can be used to identify the collexemes of constructions, that is, the lexemes occurring in the open slot(s) of schematic constructions, (2) BERT-based language models can be trained to identify instances of constructions in corpora, using collocations as the starting point for creating appropriate training data, and (3) commercial large language models can be prompted with a small number of examples to identify constructional instances. Identifying the collexemes and corpus instances of constructions provides several pieces of information that can be represented in constructicon entries: the meaning, form, frequency and productivity of constructions, the frequency and association strength of particular collexemes, the CEFR level of the construction, etc.
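
To make the last point concrete, the following is a minimal sketch of how collexeme frequency, association strength and a rough productivity indicator might be derived once the fillers of a construction's open slot have been identified. It is not the method used in the study: the abstract does not name an association measure, so pointwise mutual information (PMI) is assumed here, the type/token ratio stands in as a simple productivity proxy, and all function names and counts are hypothetical.

```python
import math
from collections import Counter


def collexeme_stats(slot_fillers, corpus_freq, corpus_size):
    """Frequency and PMI of each lexeme found in a construction's open slot.

    slot_fillers: one lexeme per identified instance of the construction
    corpus_freq:  overall corpus frequency of each lexeme (hypothetical counts)
    corpus_size:  total number of tokens in the corpus
    """
    counts = Counter(slot_fillers)
    n_instances = sum(counts.values())
    stats = {}
    for lexeme, joint in counts.items():
        # Frequency expected in the slot if lexeme and construction were independent.
        expected = n_instances * corpus_freq[lexeme] / corpus_size
        stats[lexeme] = {"freq": joint, "pmi": math.log2(joint / expected)}
    return stats


def type_token_ratio(slot_fillers):
    """Share of distinct collexemes among all instances (rough productivity proxy)."""
    return len(set(slot_fillers)) / len(slot_fillers)


# Toy data, invented for illustration only.
fillers = ["way", "way", "way", "path"]
freqs = {"way": 30, "path": 40}

stats = collexeme_stats(fillers, freqs, corpus_size=1000)
for lexeme, values in sorted(stats.items(), key=lambda kv: -kv[1]["pmi"]):
    print(lexeme, values["freq"], round(values["pmi"], 2))
print("type/token ratio:", type_token_ratio(fillers))
```

In this sketch a higher PMI indicates a lexeme that occurs in the slot more often than its overall corpus frequency would predict. PMI is known to favour rare lexemes, so a measure such as log-likelihood or the Fisher exact test could be substituted in the same loop.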