Conveners
Parallel sessions 2 (Sonce hall)
- Mojca Stritar Kučuk
- Valeria Caruso
- Philipp Stöckle
- Slobodan Beliga
- David Lindemann
- Janoš Ježovnik
- Ana Frankenberg
The present research explores the use of large language models (LLMs) in digital lexicography, specifically for translating Italian multiword expressions (MWEs) into English and French.
The study aims to assess the capability of contemporary LLMs to provide accurate and reliable English and French translation equivalents, examples and definitions for Italian MWEs, while also...
The COST Action "European Network on Lexical Innovation" (ENEOLI) conducted a comprehensive survey in October-November 2024 on the methods, practices, tools, and resources used in the study and documentation of lexical innovations, including neologisms and novel senses. The 249 respondents from 50 countries included linguists, lexicographers, terminologists, translators, software...
The purpose of the presentation is to explore the design and development of an innovative online pedagogical dictionary of Greek Sign Language, specifically tailored to the linguistic and educational needs of Deaf and Hard-of-Hearing (DHH) learners in Greece. Emphasizing accessibility and pedagogical usability, the dictionary integrates Artificial Intelligence (AI) technologies to support...
The paper introduces a hybrid methodology for cross-linguistic identification of phraseme constructions, developed within the scope of a pilot study on Croatian repetitive constructions. The study explores how artificial intelligence and corpus technologies can be systematically combined to uncover functionally equivalent patterns across languages. The proposed strategy rests on three...
The focus of this paper is on Generative Artificial Intelligence (GenAI), chatbots and some implications for lexicography and dictionary use. It has been well documented that chatbots originally tended to "hallucinate" if they did not have an answer to the prompt put to them. Much larger training databases have, however, been developed and chatbots have become more accurate. Multiple...
This study explores the use of several chatbots based on recent generative large language models for automatic term extraction (ATE) from smaller text samples. The samples were selected from three domains: board games, ice hockey, and kitesurfing; and they cover three languages: English, French, and Portuguese. We used four prompting strategies: zero shot, one shot, few shots, and few shots...
In this paper we show how the academic content and computational tools featured in Lexicom form a parallel history of the last 25 years of innovation in lexicography. Lexicom is a 5-day intensive workshop offering hands-on training in corpus-based dictionary creation, from collecting and annotating language data to publishing the final product. Since it was launched in 2001 by Sue Atkins, Adam...
This paper presents two tasks in which large language models (LLMs), Gemini-2.0-flash and GPT-4o, were used to generate distractors (i.e., incorrect options) for synonym and collocation questions in a language game. The lexical data for both tasks was sourced from the Digital Dictionary Database of Slovene (DDDS). Prompts were initially tested on a sample dataset with both models, and the...
Studies comparing dictionary entries generated with AI with those of well-established dictionaries edited by lexicographers show that LLMs tend to perform better in some tasks (e.g. writing definitions) than in others, such as word-sense disambiguation (Nichols 2023; Lew 2023; Jakubíček & Rundell 2023; Rees & Lew 2024). One of the problems resulting from the latter is that of "false...
While the move to the digital design of lexical resources has, in principle, enhanced the physical and sensory accessibility of dictionaries, a lack of adherence to accessibility standards such as WCAG 2 (Web Content Accessibility Guidelines) (Campbell et al. 2023) can introduce significant barriers (NCD 2006; Botelho 2021). These barriers often hinder access to the information and...
This paper presents a long-term, privately funded programme focused on collecting timestamped monitor corpora in a wide range of languages (currently 25). These corpora are primarily designed for researching linguistic trends (including neology) and language change over time. They are available through the Sketch Engine platform and vary significantly in size, from 3 million tokens for...
This paper investigates the potential of LLMs in supporting lexicographic work on non-standard linguistic varieties using data from the Dictionary of Bavarian Dialects in Austria (WBÖ). Based on approx. 2.4 million digitized and TEI-encoded dialect paper slips published via the Lexical Information System Austria (LIÖ), we construct a domain-specific corpus and evaluate LLMs in semantic...
Corpus-based conceptual analysis for the Humanitarian Encyclopedia (HE) grapples with vast amounts of lexical data to describe the meaning of key humanitarian notions and detect conceptual variation among actors (Odlum & Chambó, 2022). By building on Frame-based Terminology (Faber, 2015, 2022), the HE is incorporating qualitative methods necessary to subsume lexical data into manageable...
This paper explores the theory of measuring vocabulary size, including the various methods that can be used and the parameters that have to be set. We examine experiments carried out on English and Dutch. Goulden et al. (1990) claim the average native speaker knows about 17,000 English base words (non-derived words). Keuleers et al. (2015) and Brysbaert et al. (2016) claim the...
We present a collection of monolingual text corpora derived from the steno protocols of 30 parliamentary chambers across 22 EU member states, covering 20 languages. The corpora are continuously and automatically updated, enabling intralingual and cross-lingual analysis of parliamentary discussions. Each chamber's protocols are regularly downloaded, processed, and transformed into a unified...