Speaker
Description
The task of automatic detection of idiomatic expressions such as proverbs is an established problem in natural language processing. Before the advent of large language models, attempts were made to describe proverbs by modelling their syntactic structure (Rassi et al., 2014). Later, others employed contextual embeddings and neural networks to identify idioms (Škvorc et al., 2022) which is a task closely related to proverb detection.
This research effort aims to analyse the performance of the ChatGPT large language model (ChatGPT 4o) in the task of detecting proverbs and proverb-related expressions. As proverbs are often used in political discourse to underscore messages or augment arguments and points of view (G¡ndara, 2004), the research presented here will use the minutes of the Croatian parliament sessions made available by the Croatian parliamentary corpus ParlaMeter-hr (Dobranić et al., 2019) to build a list of proverbs occurring in contemporary discourse.
A list of 151 Croatian proverbs used in contemporary speech and texts was obtained from (Varga & Matovac, 2016) and other sources. Proverbs are mostly used as idiomatic expressions, with little variation. This fact was used to create a custom simple fuzzy search algorithm, which was then applied to a small section of the ParlaMeter-hr corpus to extract sentences which contain proverbs. The extracted list was further manually checked and verified. This simple search technique yielded 126 confirmed occurrences of sentences which contained proverbs.
The next step included prompting GPT-4o with a combination of prompts to determine its ability to detect proverbs, using both the chat and API interface. The prompts ranged from a very simple zero-shot to elaborate instructions with accompanying list of proverbs.
It was discovered that GPT-4o created a list of Croatian proverbs as a response to the chat based zero-prompt which contained only 12 items. Uploading the list of proverbs resulted in only 54% accuracy. API prompt returned better results, the zero-shot prompt reached 79% accuracy in under 5 minutes, while the most elaborate many-shot prompt using the curated list of proverbs reached 94% accuracy, but took over 120 minutes at an increased financial cost.