Conveners
Parallel sessions 3 (Zrak hall)
- Miloš Jakubíček
- Margit Langemets
- Markus Kunzmann
- Kris Heylen
- Geraint Rees
- Bálint Sass
- Tomasz Michta
The paper outlines technological and methodological ways to arrange the dictionary parsing process. The Spanish Dictionary (Diccionario de la lengua Española, 23 ed.; DLE 23) website (https://dle.rae.es/) serves as a basis for the research. First of all, as the most complex multi-parameter lexicographic frameworks, explanatory dictionaries of national languages are of the most interest because...
Traditionally, historical texts' optical character recognition (OCR) has primarily been conducted using specialised software such as Transkribus, eScriptorium, Kraken, and similar tools. To achieve accurate character recognition, these systems require extensive pre-training and the creation of a refined "ground truth" dataset. The comprehensiveness of model pre-training directly correlates...
Constructicography, or the description of grammatical constructions in a lexicographic format, is an emerging field currently in the stage of developing and automating methods for treating large numbers of (semi-)schematic constructions. This study explores how existing lexicographic data and language models can be used to facilitate the constructicographic workflow. Our results suggest that...
This paper reports on recent advancements in the development of the Mangalam Dictionary of Buddhist Sanskrit, the first corpus-driven dictionary dedicated to Buddhist Sanskrit. This is a low-resource, historical, and domain-specific language variety instantiated in South Asian Buddhist literature dating from approximately the first millennium CE. The paper focusses on advances in the...
Due to the policy of Russification in the 20th century, the Ukrainian language underwent an influx of Russianisms, among other forms of interference with its structure. Today, many Ukrainians require guidance regarding non-Russified usage, and a Large Electronic Dictionary of Ukrainian (VESUM, vesum.nlp.net.ua) is designed to meet this need. With a register of over 430,000 lemmas, it is the...
The Vienna Corpus of Arabic Varieties (VICAV) is a digital research infrastructure for the documentation and analysis of the linguistic diversity of Arabic varieties. Integrating methods from language technology and the digital humanities, VICAV provides a modular, sustainable platform for the creation, management, and publication of heterogeneous language resources within a shared data...
The article describes the use of artificial intelligence in compiling English dictionary entries for a dictionary of abbreviations (Slovar krajšav), published in 2025 and financed by the Slovenian Research and Innovation Agency (ARIS). Together with the Slovenian dictionary of abbreviations (Slovenski slovar krajšav) published in 2023, this dictionary adopted a pioneering approach to...
ONLINE PRESENTATION
Technology has largely affected the way language learners seek information. Digital formats virtually superseded the paper dictionary (Ptasznik, Wolfer and Lew, 2024), online translators gained much importance (O'Neill, 2019), and web browsers became the first port of call (Kosem et al., 2019). Obviously, generative AI systems imitating human-like communication mark...
ONLINE PRESENTATION
In this presentation we describe the DICI-A (Dizionario delle collocazioni italiane per apprendenti), a new learner dictionary of Italian collocations.
The DICI-A includes ca. 11,000 collocations belonging to six syntactic relations: i. Verb + Direct object (mantenere una promessa, ‘to keep a promise’); ii. Adjective + Noun/Noun + Adjective, where the adjective is a...
ONLINE PRESENTATION
Taboo-language resources remain scarce for under-resourced languages like Afrikaans, despite their clear relevance for natural language processing (NLP) and applications in artificial intelligence (AI). Although Afrikaans has a long-standing lexicographic tradition, it still lacks an open-access reusable lexical database for taboo language. One of the most crucial...
ONLINE PRESENTATION
Taboo words present a challenge for a lexicographer to include and describe in a language resource, as they are forms of verbal violence. However, discarding offensive words from general-purpose lexicographic wordlists disregards the representation of an integral part of the mental lexicon. The present study aims at using lexicographic scenarios to jailbreak four GPT...
ONLINE PRESENTATION
This paper presents a corpus-based approach to compiling a bilingual Megrelian-English online dictionary. The Megrelian language belongs to the UNESCO Atlas of the World's Languages in Danger group of “increasingly endangered” languages, and faces a number of critical challenges, among them a lack of standardised resources, intergenerational transmission, and minimal...
This contribution focuses on the methodological aspects of the ICoMuTe project aiming to design a corpus-based multilingual terminology database for Intercultural Communication (ICC). The project seeks to explore how ICC terms relate to each other within six European languages (Dutch, English, German, French, Italian, Spanish), how these terms are connected to their scientific and cultural...