Euralex 2024

Name: Euralex 2024
Start: 2024-10-08T08:30:00+02:00
End: 2024-10-12T19:00:00+02:00
Location: Hotel Croatia

Creating the Dataset of Croatian Verbal Idioms: Automatic Identification in a Corpus and Lexicographic Implementation

11 Oct 2024, 14:00

30m

Šipun Hall (Hotel Croatia)

Parallel Sessions

Ivana Filipović Petrović (Croatian Academy of Sciences and Arts), Kristina Kocijan

This research advances the automatic identification and analysis of verbal idioms in Croatian. The NooJ text processing tool, combined with the MaCoCu corpus and the Online Dictionary of Croatian Idioms (ODCI), provides a framework for recognizing and categorizing these multi-word expressions (MWEs). The research comprises two parts: (a) creation of a dataset based on the ODCI, in which 898 verbal idioms were compiled and annotated with linguistic features, including structure, morphological features, and variation patterns; (b) analysis of the extracted data, which provides insights into the lexicographic and linguistic properties of the idioms, such as variability, modification, and frequency of use. The study highlights the challenges posed by idiomatic variation and the verb’s role as the most variable component of an idiom. For instance, the idiom “soliti pamet komu” (literally “to salt someone’s mind”, meaning to give unsolicited advice) is often modified for expressiveness, as in the phrase “having a big saltshaker to salt everyone’s mind.” The dataset is intended for lexicographic integration into the ODCI and supports the creation of electronic language resources. It also contributes to theoretical and cross-lingual research, with the CLARIN repository expected to enhance the reusability of the data in NLP. The study’s findings offer a deeper understanding of the dynamics of verbal idioms and of their computational processing.
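As a rough illustration of how one annotated entry in such a dataset might be represented, the sketch below uses a hypothetical record layout. The class name, field names, and the component split are assumptions made for illustration and are not the authors' annotation schema; only the idiom, its gloss, and the modified phrase come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class IdiomEntry:
    """One verbal idiom with illustrative annotation fields (hypothetical schema)."""
    citation_form: str
    gloss: str
    components: list[str]  # assumed token-level split of the citation form
    verb_index: int        # position of the verbal component in `components`
    attested_modifications: list[str] = field(default_factory=list)

    @property
    def verb(self) -> str:
        # The abstract identifies the verb as the most variable component,
        # so it is kept addressable on its own.
        return self.components[self.verb_index]

    def is_modified(self) -> bool:
        # True if at least one modified use has been recorded.
        return bool(self.attested_modifications)


# Example entry built from the idiom quoted in the abstract.
# The modification is given here as the English paraphrase from the abstract.
entry = IdiomEntry(
    citation_form="soliti pamet komu",
    gloss="to give unsolicited advice",
    components=["soliti", "pamet", "komu"],
    verb_index=0,
    attested_modifications=["having a big saltshaker to salt everyone's mind"],
)

print(entry.verb, entry.is_modified())
```

A flat record like this is only one possible design; a real resource would more likely follow an established lexical data format.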
