Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

Identifying the Most Representative Phraseological Units Using Language Corpora and Artificial Intelligence for Lexicography: The Case of Slovenian Comparative Phrasemes

Not scheduled
30m
Sonce hall

Speakers

Matej Meterc, Nataša Jakop

Description

In preparing phraseological units for the third edition of the Standard Slovenian Dictionary (eSSKJ), the authors aimed to identify the most relevant comparative phrasemes in the contemporary standard language using objective corpus-based criteria. A key goal was to determine how representative specific phrasemes and their variants are in actual use. Two lists of the hundred most frequent comparative phrasemes with the structure adjective + kot ‘as’ + noun (e.g., bel kot sneg ‘white as snow’) were extracted from the metaFida v1.0 and CLASSLA-web.sl 1.0 corpora. The twenty most frequent phrasemes were analyzed in greater detail. The results were compared with the Database of Comparative Phrasemes, which was compiled from older dictionaries and collections, as well as with entries in eSSKJ. Artificial intelligence was also used experimentally to identify representative comparative phrasemes, reaching up to 80% alignment with expert choices.
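The frequency-list step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the corpus is already lemmatized and tagged with UD-style part-of-speech labels (ADJ, NOUN), the function names are hypothetical, and the toy sentences and the reference list are invented for illustration only. It counts lemma sequences of the form adjective + kot + noun, takes the most frequent ones, and measures what share of a reference list (for example, expert choices) a candidate list covers.

```python
from collections import Counter

def count_comparatives(sentences):
    """Count lemma trigrams of the form ADJ + 'kot' + NOUN.

    Each sentence is a list of (lemma, pos) pairs; the tag names are an
    assumption (UD-style), not something the abstract specifies.
    """
    counts = Counter()
    for sent in sentences:
        for (l1, p1), (l2, _), (l3, p3) in zip(sent, sent[1:], sent[2:]):
            if p1 == "ADJ" and l2 == "kot" and p3 == "NOUN":
                counts[f"{l1} kot {l3}"] += 1
    return counts

def top_n(counts, n):
    """Return the n most frequent phrasemes, most frequent first."""
    return [phraseme for phraseme, _ in counts.most_common(n)]

def overlap_share(candidates, reference):
    """Share of the reference list that also appears among the candidates."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(set(candidates) & reference) / len(reference)

# Toy data, invented for illustration (not taken from metaFida or CLASSLA-web.sl).
sentences = [
    [("bel", "ADJ"), ("kot", "SCONJ"), ("sneg", "NOUN")],
    [("biti", "AUX"), ("bel", "ADJ"), ("kot", "SCONJ"), ("sneg", "NOUN")],
    [("priden", "ADJ"), ("kot", "SCONJ"), ("mravlja", "NOUN")],
    # Not matched: an adjective intervenes between 'kot' and the noun.
    [("rdeč", "ADJ"), ("kot", "SCONJ"), ("kuhan", "ADJ"), ("rak", "NOUN")],
]

counts = count_comparatives(sentences)
frequent = top_n(counts, 2)

# Hypothetical reference list standing in for expert choices.
expert_choices = ["bel kot sneg", "zdrav kot riba"]
share = overlap_share(frequent, expert_choices)
```

A strict three-token pattern like this one misses variants with intervening words, which is one reason variant handling would need separate treatment in a real extraction.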
