Description
The paper introduces a hybrid methodology for the cross-linguistic identification of phraseme constructions, developed in a pilot study on Croatian repetitive constructions. The study explores how artificial intelligence and corpus technologies can be systematically combined to uncover functionally equivalent patterns across languages. The proposed strategy rests on three interdependent layers: (1) the AI layer, which uses large language models (LLMs) to generate candidate constructions, paraphrases, and corpus query formulations; (2) the corpus layer, which provides empirical validation through frequency data, authentic usage, and syntactic patterns; and (3) the human expert layer, which supervises prompt engineering, interprets outputs, and ensures linguistic adequacy. These layers operate in an iterative workflow in which computational output and expert judgment inform each other. The methodology is exemplified through the analysis of the German construction X über X ‘X after X’, for which the Croatian equivalent X za X-om (e.g., dan za danom ‘day after day’) is identified as structurally and semantically appropriate. The study compares the outputs of two LLMs (GPT-4o and o3), revealing differences in their sensitivity to idiomatic usage. It also demonstrates how LLMs can assist in filtering corpus concordances to identify phraseologically valid examples. The study highlights both the strengths of the approach (e.g., scalability, reduced expert workload) and its limitations (e.g., LLMs’ sensitivity to prompt design and formal syntax). It concludes that this layered strategy offers a viable path toward the semi-automatic processing of additional constructions and the development of multilingual phraseological resources.
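To make the corpus-filtering step more concrete, the following is a minimal sketch of a rule-based pre-filter for candidate instances of X za X-om in concordance lines. It is not the procedure used in the study, which relies on LLMs and expert review for this step. The function names, the shared-stem heuristic, the list of instrumental endings, and the example sentences are all assumptions made up for illustration; the sentences are not corpus data. A heuristic of this kind could at most narrow down the lines passed on to an LLM or a human expert.

```python
import re

# Hypothetical heuristic (an assumption, not the study's method):
# look for "<word> za <word>" where both words share a stem and the
# second word ends in a common instrumental singular ending.
PATTERN = re.compile(r"\b(\w+)\s+za\s+(\w+)\b", re.IGNORECASE)
INSTRUMENTAL_ENDINGS = ("om", "em")  # assumed, deliberately incomplete


def shared_prefix_len(a: str, b: str) -> int:
    """Return the length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_candidates(line: str, min_stem: int = 3) -> list[str]:
    """Return substrings of `line` that look like 'X za X-om'."""
    hits = []
    for m in PATTERN.finditer(line):
        first, second = m.group(1).lower(), m.group(2).lower()
        if (second.endswith(INSTRUMENTAL_ENDINGS)
                and shared_prefix_len(first, second) >= min_stem):
            hits.append(m.group(0))
    return hits


# Invented example sentences, standing in for concordance lines.
concordances = [
    "Dan za danom prolazi bez vijesti.",
    "Išli su korak za korakom prema cilju.",
    "Kupio je poklon za prijatelja.",
]

candidates = [hit for line in concordances for hit in find_candidates(line)]
print(candidates)
```

On these three invented lines the filter keeps the two repetitive sequences and discards the ordinary prepositional phrase. It makes no judgment about phraseological validity, which in the layered strategy described above remains the task of the LLM and the human expert.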