The look-up behaviour of dictionary users has an established place in lexicographic research (Bergenholtz & Johnson, 2005; Lemnitzer, 2001; Lorentzen & Theilgaard, 2012; Trap-Jensen et al., 2014). It has been used with some success to improve the quality of the interaction between the dictionary and its users, such as through discovering users’ typical search patterns, their strategies and...
Sign language lexicography, a nascent subfield, remains relatively unexplored, primarily owing to the unique attributes of sign languages (McKee & Vale, 2017). The scarcity of sign language dictionaries is attributed to linguistic, financial, and social challenges (Vacalopoulou, 2020), with limited resources available since the pioneering Dictionary of American Sign Language on Linguistic...
The COVID-19 pandemic has impacted numerous sectors at different levels and has imposed a radical change in the pace of life in societies across the globe. A partially technical vocabulary related to COVID-19 quickly became part of everyday life, introduced mainly by news and official bodies. To describe the characteristics of the terminology being disseminated in Brazil, the project Study and...
Pashto
Pashto is an Eastern Iranian language spoken in Afghanistan and Pakistan and by a large diaspora community across the globe. It is one of two official languages of Afghanistan and a regional official language in Pakistan’s Khyber Pakhtunkhwa Province. With about 15 million native speakers in Afghanistan and 30 million in Pakistan, it is the second-most spoken Iranian language after...
Portuguese is the official language of nine countries and one territory. However, given the socio-historical contexts of these countries, its functional status varies greatly. In Brazil, Portugal, and São Tomé and Príncipe (Hagemeijer, 2018), Portuguese is the mother tongue for the majority of the population. In Angola and Mozambique, it is the majority vehicular language, typically as a...
We would like to introduce the results of the ELDI project (Electronic Lexical Database of Indo-Iranian Languages, Pilot module: Persian), launched in August 2020. One of the aims of the project was to promote the use of technology in language teaching. A website and a mobile application with the Persian–Czech dictionary were developed as the main planned results of the project. A new...
The objective of this study is to investigate how learners of Italian as a second or foreign language search for new meanings in online Italian dictionaries. Using eye-tracking technology, we carried out experiments inviting users to do exercises on ‘combinations of words’ while they consulted various dictionaries, including De Mauro – Internazionale and Garzanti Linguistica. Results should...
This paper presents an innovative lexicographic approach embedded within an online resource currently under development: ALMA: Linguistic Multimedia Atlas of Bio/Cultural Food Diversity. ALMA serves a dual purpose: firstly, to showcase linguistic diversity through culinary practices, and secondly, to scrutinise food marketing strategies through the analysis of language and paralanguage on...
A discussion of technical and editorial considerations in producing a 1,800-page hardback dictionary containing 30,000 headwords from an online database of 48,000 headwords. The Concise English-Irish Dictionary (CEID), published in 2020 and the first major English-Irish dictionary published in print form since the 1950s, is a 1,800-page hardback dictionary containing 30,000 headwords and 80,000...
The recent development of the Curriculum for Teaching Greek as a Heritage Language: A Framework for Teachers underscored the need for a dictionary to serve as supplementary material during the curriculum’s implementation at Greek Community schools in the USA. This presentation aims to introduce the Greek Heritage Language Learners’ Illustrated Lexicon (Helix), an online, bilingual, illustrated...
Although the ‘Circular Economy’ has been widely discussed in the media for years, general dictionaries still do not provide the relevant definitions and/or collocations. We show by examining dictionary definitions that many salient words used in this field have undergone varying degrees of semantic broadening in the general language. Current terminological needs often dictate more precise...
The paper presents a New Serbian-Russian dictionary and the main principles of its development. We use the most recent explanatory dictionary of Serbian, published by the Serbian Academy of Sciences and Arts in 2018, as a starting point. However, we refine both the word list and the entry structure to meet the requirements of a bilingual edition. We consult text corpora of modern Serbian to...
In this article, we return to a classic lexicographical topic and address some aspects involved in the practice of defining. Digital developments have increasingly required dictionary definitions to operate independently of others if they are to be utilised in new contexts, possibly even detached from the original dictionary presentation. We examine two types of definitions where the problem...
In 2023, the Institute of the Estonian Language, in collaboration with the Center for Applied Anthropology of Estonia, conducted a user experience survey aimed at understanding the habits, needs, and attitudes of users of the language portal Sõnaveeb (‘Word Web’) and preparing for the publication of the Dictionary of Standard Estonian (DSE) in 2025. This paper addresses prescriptive and...
It may seem obvious to state that tracing the history of a language involves consulting lexicographical works of all kinds, but the truth is that specialized lexicographical compilations, i.e., those referring to the specialized languages of a particular field of knowledge, have not always been duly considered in the diachronic study of language. In this contribution, we aim to present a...
Introduction
The research approach to semantic development in first language acquisition (FLA) remains predominantly confined to traditional (usually simplified) lexicographic terms and notions. This viewpoint does not adequately capture the complexity of the lexical system and fails to present the mechanisms and processes involved in its development. Since FrameNet possesses standardized methods and...
Some (prescriptive) dictionaries do not include recently borrowed lexemes, while other (descriptive) ones treat them like older words or (‘native’) neologisms formed within the given language. The question of inclusion/exclusion is especially relevant in cases where a ‘native’ neologism in a language and a newly borrowed word are in fact (near)-synonyms; compare, for example, German downloaden –...
The paper presents a project devised by Georgian and Hungarian lexicographers which aims at improving dictionary use skills and dictionary culture in Georgia and Hungary. The project is based on previous experience, studies and findings of its authors at Ilia State University (Georgia) and Károli Gáspár University of the Reformed Church in Hungary. The feedback gathered from theoretical...
The European Network on Lexical Innovation (ENEOLI, CA22126 – www.cost.eu/actions/CA22126/, October 2023 – October 2027) is a COST Action seeking to address the lack of comprehensive, multilingual, and globally focused research on neology. As of July 2024, 252 members from 48 different countries have been participating in the Action. The main goal is to establish a network of researchers...
In this paper, we report on our development of a multi-level analysis framework that allows us to assess AI-generated lexicographic texts on both a quantitative and qualitative level and compare them with human-written texts. We approach this problem through a systematic and fine-grained evaluation, using dictionary articles created by human subjects with the help of ChatGPT as an example....
The paper outlines one of the results of a project dedicated to one of the endangered Kartvelian languages, namely Megrelian. Based on data collection and documentation through fieldwork carried out in Samegrelo (Georgia), the project aims to document the Megrelian language comprehensively and encompasses the development of an annotated corpus, a sketch grammar, and a bilingual...
About DANTE
DANTE (Database of Analysed Texts of English) was initially developed in the years 2008–2010 (Atkins, Kilgarriff & Rundell, 2010) by a lexicographic team led by Sue Atkins, Adam Kilgarriff, Valerie Grundy and Michael Rundell. It was commissioned by Foras na Gaeilge, a governmental agency promoting the use of the Irish language, for the development of the New English...
Czech Dictionary Express has been introduced as a project of a semi-automatically compiled dictionary of the Czech language. The Dictionary Express method (formerly known as rapid dictionaries) has been used for several different languages. In this paper, we analyse the automatic and manual tools used in Czech Dictionary Express and inspect the statistical and qualitative data such tools provide....
Introduction
In 1911, the Berlin missionary Karl Heinrich Julius Endemann published his dictionary of the Sotho language, Wörterbuch der Sotho Sprache. This dictionary has suffered scholarly neglect due to its rare combination of source and target languages, i.e., Sotho and German respectively, and also its missionary focus. Obsolete orthography, high user skill demands, and a lack of alignment...
The objective of this paper is to illustrate, through the examination of sample entries, the methodology employed in the creation of a prospective pilot corpus-based dictionary of Serbian as a second language, drawing on advancements applied in other similar projects for different languages (e.g., François et al., 2014; François et al., 2016; Klemen et al., 2023). While Serbian is spoken as the...
Spoken language is the prerequisite of written standard languages for living language communities. Yet written sources dominate lexicographic description of standard languages, and awareness of dictionaries that specifically source speech seems limited. In Norsk Ordbok (The Norwegian Dictionary), and in the Language Collections on which the dictionary is based, oral materials are perceived as...
The impact of artificial intelligence on language learning tools, and specifically on dictionaries, has shifted significantly with the advent of generative AI and chatbot technologies (De Schryver, 2023; Lew, 2023; Łodzikowski et al., 2024; Rees & Lew, 2024). We report on a study comparing the use of a mobile dictionary (Longman Dictionary of Contemporary English) and ChatGPT, an innovative...
The paper introduces a web portal of integrated dictionaries for Bulgarian. The mapping among the resources is lemma-based. Two dictionaries are at the centre of this integration: an inflectional dictionary of Bulgarian, since Bulgarian is a morphologically rich language, and a wordnet of Bulgarian, BTB-Wordnet, since it adds the level of lexical meanings to the dictionary-enhanced...
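Lemma-based mapping can be pictured as a join of the resources on a shared lemma key, so that any inflected form leads to the senses recorded for its lemma. The sketch below assumes two toy in-memory resources; the entries and synset identifiers are hypothetical and do not reflect the portal's actual data model or code.

```python
from collections import defaultdict

# Hypothetical toy data, not taken from the actual resources.
# Inflectional dictionary: lemma -> inflected word forms
INFLECTIONS = {
    "книга": ["книга", "книги", "книгата", "книгите"],
    "чета": ["чета", "четеш", "чете", "четем"],
}
# Wordnet-like resource: lemma -> synset identifiers (invented IDs)
WORDNET = {
    "книга": ["syn-001"],
    "чета": ["syn-101", "syn-102"],
    "море": ["syn-201"],
}


def integrate(inflections, wordnet):
    """Merge both resources on the shared lemma key."""
    merged = {}
    for lemma in sorted(set(inflections) | set(wordnet)):
        merged[lemma] = {
            "forms": inflections.get(lemma, []),
            "synsets": wordnet.get(lemma, []),
        }
    return merged


def build_form_index(merged):
    """Map each inflected form back to the lemma(s) it belongs to."""
    index = defaultdict(set)
    for lemma, entry in merged.items():
        for form in entry["forms"]:
            index[form].add(lemma)
    return index


def lookup(form, merged, index):
    """Return the synsets reachable from a word form via its lemma(s)."""
    return sorted(s for lemma in index.get(form, ()) for s in merged[lemma]["synsets"])


merged = integrate(INFLECTIONS, WORDNET)
index = build_form_index(merged)
print(lookup("книгата", merged, index))  # ['syn-001']
```

Storing a set of lemmas per form leaves room for homographic forms that belong to more than one lemma, which are frequent in morphologically rich languages.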
Introduction
Technical languages contain expressions that are not universally understood. We define non-lexical entities (NLEs) as single- or multi-word expressions not listed in domain dictionaries. These are especially difficult to differentiate from lexical entities when domain dictionaries are small or incomplete, which is often the case for low-resource languages. The medical domain...
Mastering idiomatic language in its broadest sense is necessary to achieve advanced levels in language learning. Therefore, phraseological information should be quickly and easily available to language learners. To this end, the Dutch project Woordcombinaties (Word Combinations) is developing an integrated lexicographic resource combining a collocation and idiom dictionary with a pattern...
In this paper, we present the search and visualization interface of the Croatian derivational lexicon ‒ CroDeriv. CroDeriv contains information on the derivational and morphological properties of Croatian lexemes. Each lemma in the lexicon is enriched with its word-formation analysis and morphological segmentation. The search interface enables simple and advanced queries, i.e., by lexemes, by...
Indonesian has been designated as the 10th official language of the UNESCO General Conference. Consequently, developing the language, including updating its vocabulary and managing its dictionary, has become unavoidable. So far, the Indonesian Comprehensive Dictionary (KBBI) has been open to loanwords from other languages, including foreign languages and local languages. This paper compares (i) East Asian loanwords...
The incorporation of images in dictionaries has been addressed in several papers (cf. Biesaga, 2016; Klosa, 2015). Estonian lexicography has a long tradition of including visual materials in learners’ and terminological dictionaries. However, until recently, there was no picture dictionary for learners of Estonian as an L2 that is accessible as a separate resource and simultaneously linked to...
This study discusses the possibilities of expanding the scope of the largest Estonian dictionary – the EKI Combined Dictionary – with various types of constructional information. Designing a representation of constructions essentially means building a constructicon. The study starts with a short overview of existing constructicons and the main challenges their creators have faced so far. We...
This article examines advances in phraseomatics and digital phraseography through the DiCoP project and its DiCoP-Text corpus, aimed at enriching linguistic models and machine translation. The project evaluates the frequency of use of phraseological units (PUs) and improves their translation in different contexts, drawing on recent research in phraseotranslation and natural language processing...
The application of crowdsourcing in the creation of educational resources, understood as the gathering of collective intelligence for pedagogically-oriented tasks, has garnered considerable attention in recent years. Advanced internet technologies facilitate collaborative content creation between learners and educators, potentially enhancing the learning experience. Crowdsourcing has emerged...
In order to correctly use a word in a foreign language it is not enough to know “its meaning” (i.e., its translational counterpart in the native language). It is also necessary to identify the appropriate contexts of the word’s use which often differ from those of its counterparts in other languages. Bilingual dictionaries cannot represent all the contextual properties that distinguish entry...
This software demonstration presents a data model and a first use case for the representation of text corpus data on a Wikibase instance, including morphosyntactic, semantic and philological annotations as well as links to dictionary entries. Wikibase, an extension of MediaWiki, is the software that underlies Wikidata, an exceptionally large crowdsourced queryable knowledge graph, which...
This study presents a project aiming to make thesaurus data available under an academic licence. The project is based on the printed thesaurus Den Danske Begrebsordbog (DDB) which covers approx. 80% of the Danish dictionary DDO (ordnet.dk/ddo). It presents more than 100,000 different words and expressions categorised and ordered semantically in 22 thematic chapters, and 888 named sections. The...
We present a study which was carried out with student teachers of mathematics. They were asked to create either dictionary articles or concept maps for terms from an introductory lecture in their first semester. Based on the students’ submissions, we investigate whether there is a difference in the learning outcomes between the two tasks and also whether the technical means used to solve these...
The publication of Pope Francis’s encyclical letter Laudato Sì in 2015, addressing the climate crisis, marked a major intervention by the Catholic Church in environmental debates. The letter encompassed multiple topics including climate science, consumerism, throwaway culture, poverty, and integral ecology. Given the global reach of both climate change and Catholicism, effectively...
Terminology has traditionally focused on denotative meaning, reflecting its historical commitment to establishing clear, universally accepted definitions. However, it has generally failed to acknowledge the presence of connotation within specialized discourse. Drawing from ongoing projects, such as EcoLexicon and the Humanitarian Encyclopedia, we explore the fuzzy boundaries of connotation...
We present a demo of MWE-Finder, an application that enables a user to search for (flexible) multiword expressions (MWEs) in Dutch text corpora (Odijk et al., 2024). We will show many different examples in the demo, but here we show one example.
A multiword expression (MWE) is a word combination with linguistic properties that cannot be predicted from the properties of the individual words or...
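A very simplified way to picture ‘flexible’ MWE search is to look for all lemmas of an expression within a bounded window, in any order and with other words in between. The sketch below does only that and is not MWE-Finder's own search method; it assumes pre-lemmatised input, and the Dutch idiom de pijp uitgaan (‘to die’), the window size, and the function name are illustrative assumptions.

```python
from collections import Counter


def find_flexible(mwe_lemmas, sentence_lemmas, max_span=8):
    """Return (start, end) token indices of the first window of at most
    max_span tokens containing every MWE lemma (with multiplicity) in any
    order, or None if no such window exists."""
    needed = Counter(mwe_lemmas)
    n = len(sentence_lemmas)
    for start in range(n):
        if sentence_lemmas[start] not in needed:
            continue  # only start windows on one of the MWE's lemmas
        seen = Counter()
        for end in range(start, min(start + max_span, n)):
            seen[sentence_lemmas[end]] += 1
            if all(seen[lemma] >= count for lemma, count in needed.items()):
                return (start, end)
    return None


# Hypothetical pre-lemmatised input for 'Hij ging gisteren de pijp uit.'
idiom = ["de", "pijp", "uit", "gaan"]
sentence = ["hij", "gaan", "gisteren", "de", "pijp", "uit"]
print(find_flexible(idiom, sentence))  # (1, 5)
```

Here the separable verb is split and the words appear in a different order from the canonical form, yet the expression is still found. Window co-occurrence alone can also produce false positives, which is why tools in this area generally rely on richer (e.g., syntactic) information.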
For Latvian linguists, the study of slang was not a topical matter until 1970. The literary language and dialects have always been perceived as research priorities, and for a long time the non-literary language was not considered an object of serious scientific work. There was a more or less pronounced disparagement of the non-literary language. Only a few enthusiasts showed scientific interest in...
Introduction
Dictionaries, traditionally perceived as linguistic repositories, have evolved into practical tools incorporating a diverse range of features, one notable addition being the inclusion of pictures (see Gouws et al., 2013; Liu, 2015; Biesaga, 2016; Lew et al., 2018; Dziemianko, 2022, to name just a few). This study delves into the role of pictorial illustrations in monolingual...
Understanding the semantic value of linguistic utterances is crucial for linguistics, lexicography, automatic text interpretation, and various NLP tasks. To address subtle variations within the semantic level, as is well known, machines retrieve stored data from corpora, lexicons and terminologies, and are equipped with taggers and rule-based systems. We already have tools for the development...
This paper introduces “Synonyms in Contrast”, a new online dictionary that addresses the complexities and nuances of neologistic (near-) synonyms in the German language. The emergence of new lexical items, often borrowings from English, has contributed to the proliferation of meaning equivalents. These share a large extent of contextual features, causing ambiguity and uncertainty among...
Multi-word expressions are a heterogeneous linguistic category which constitutes a significant part of everyday communication. They include linguistic constructions consisting of more than one word, such as idioms (e.g., kick the bucket), binomial expressions (e.g., bread and butter), phrasal verbs (e.g., turn on/off), fixed/conventionalized expressions (e.g., have a nice day) and...
This article presents the compilation of entries from several foreign languages, namely English, Italian, Latin and German, for the Contemporary Slovene Dictionary of Abbreviations (CSDA). The material for the compilation of CSDA has been collected over a period of twenty years, both manually from monolingual, bilingual, general and terminological dictionaries (always paired with the Slovene...
The Czecho(-)Slovak Word of the Week was a joint year-long popularization project of the Institute of the Czech National Corpus and the Ľ. Štúr Institute of Linguistics of the Slovak Academy of Sciences, which was inaugurated on the occasion of the 30th anniversary of the dissolution of Czechoslovakia (January 1, 1993). Throughout the year, each week, a new entry written in parallel in Czech...
This paper introduces the Erasmus Mundus Joint Master in Lexicography – EMJM-EMLex – especially some new developments and objectives. The EMJM-EMLex programme remains focused on lexicography, but is evolving into a multidisciplinary, digital discipline in order to adapt to societal and scientific changes. Since the programme’s foundation in 2009, experience has confirmed that the success of EMLex lies in its unique...
This paper shows the research potential of the virtual lexicographic laboratory VLL DLE 23, based on the text of the Spanish Explanatory Dictionary (DLE 23). Virtual Lexicographic Laboratories (VLL) are effective tools for dictionary-based linguistic research. The lexicographic text is considered not only as a basis for creating and updating dictionaries but also as a means of professional...
Good dictionary examples are hard to come by. Despite corpora growing larger and larger, lexicographers still have difficulties in finding good candidate sentences for exemplifying how the dictionary headwords are used in context. There are automatic methods available to address this time-consuming task. One such method is GDEX, a feature of the Sketch Engine tool (Kilgarriff et al., 2004),...
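GDEX ranks candidate sentences with configurable heuristics. The sketch below is a simplified scorer in the same spirit, not Sketch Engine's implementation: it rewards sentences of moderate length that look like whole sentences, and penalises rare words and anaphoric openings. The length range, the stand-in frequency list, the penalty weights, and the function name are all hypothetical choices for illustration.

```python
import re

# Hypothetical settings for illustration only.
IDEAL_LENGTH = (10, 25)  # preferred sentence length in tokens
COMMON_WORDS = {         # stand-in for a corpus frequency list
    "a", "and", "bank", "home", "in", "it", "keep", "many", "near",
    "of", "people", "savings", "the", "their", "to", "was",
}
ANAPHORIC_STARTS = {"it", "this", "that", "these", "those", "they"}


def gdex_like_score(sentence):
    """Score a candidate example between 0 and 1; higher is better."""
    text = sentence.strip()
    tokens = re.findall(r"[A-Za-z']+", text)
    if not tokens:
        return 0.0
    score = 1.0
    low, high = IDEAL_LENGTH
    if not low <= len(tokens) <= high:
        score -= 0.3  # too short or too long
    if not (text[0].isupper() and text[-1] in ".!?"):
        score -= 0.3  # does not look like a whole sentence
    rare = sum(1 for t in tokens if t.lower() not in COMMON_WORDS)
    score -= 0.5 * rare / len(tokens)  # penalise the share of rare words
    if tokens[0].lower() in ANAPHORIC_STARTS:
        score -= 0.2  # probably depends on the preceding context
    return max(score, 0.0)


candidates = [
    "it was the bank",
    "The bank's collateralised debt obligations were securitised via SPVs.",
    "Many people keep their savings in a bank near their home.",
]
for s in sorted(candidates, key=gdex_like_score, reverse=True):
    print(f"{gdex_like_score(s):.2f}  {s}")
```

The short, self-contained sentence built from frequent words comes out on top; in practice the lexicographer would still review the highest-ranked candidates rather than accept them blindly.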
The paper reports a pilot study on the detection of lexical semantic variation in modern Swedish. The starting point of the study is the set of meaning descriptions of around 65,000 headwords in ‘The Contemporary Dictionary of the Swedish Academy’ (SO, 2021), covering approximately 100,000 different senses. In our work, we aim to explore the potential of the latest computational methods to discover...
This presentation outlines the development process of DICIENS, a bilingual school science dictionary (English-Spanish/Spanish-English) designed for primary education students in Spain. DICIENS marks a pioneering initiative, filling a significant gap in educational resources and pedagogical lexicography. Rooted in the theoretical framework of Frame-based Terminology (Faber 2009, 2012), this...
This communication aims at discussing how syntagmatic constraints in the lexicon can be provided in lexicographic resources more effectively than has been done to date, covering a wide range of multi-word expressions: from compounds to collocations and phrasemes. Examples are taken from the ongoing implementation of a multilingual specialised resource called ALMA – Multimedia Linguistic Atlas...
Diretes is a Spanish monolingual e-dictionary based on Lexical-Semantic Relations which are formalized by Lexical Functions, a formal tool explored within the Meaning-Text Theory. This dictionary consists of a relational database which aims to reflect the cognitive links of the lexicon through a network of semantic and lexical associations. Currently it contains more than 100,000 collocations...
We present a study of Danish multiword constructions containing one or more hyphens, such as gas- og vandmester (‘gas- and water.repairman’; ‘plumber’), ilt- og brintatomer (‘oxygen- and hydrogen atoms’) and haveborde og –stole (‘garden tables and -chairs’). Although materially analogous, such constructions exhibit different semantics, falling – as we shall argue – into two distinct groups...
Terminology within the domain of environmental economics, a rapidly growing and changing sub-discipline of economics concerned with environmental issues, has been understudied in the literature on domain-specific languages. While, on the one hand, it presents the common features of specialized vocabulary, i.e., monoreferentiality, precision, economy and objectivity (Gotti, 2008; Scarpa, 2020),...
A common issue in Corpus Linguistics is assessing representativeness and balance of a corpus (McEnery & Hardie, 2011). Biber (1993, p. 244) defines representativeness as “the extent to which a sample includes the full range of variability in a population.” Assessment has been traditionally tackled quantitatively and qualitatively both in monolingual and bilingual settings (Stefanowitsch,...
Since 2019, the Institute of the Estonian Language (EKI) has been compiling the EKI Combined Dictionary (CombiDic). Our presentation concentrates on incorporating synonyms into the CombiDic using the dictionary writing system Ekilex (Tavast et al., 2018; Tavast et al., 2020), where we have two types of synonyms – full and partial. We acknowledge that full synonymy is a rare phenomenon within a...
Working with historical documents presents many challenges, not only because some sources are not well preserved, but also because grammar and spelling rules from older times were not always consistent. Still, these texts remain a rich source of information about our history, and we could greatly benefit from the information that can be extracted from them. At the same time, the lack of...
Existing research on ChatGPT in lexicography is undoubtedly valuable. However, it has tended to focus on metalexicographic concerns rather than effectiveness in resolving user queries directly. Moreover, it has mostly dealt with general-purpose English lexicography, often ignoring other languages and specific purposes. Focussing on 33 L1 Spanish users completing an introductory training course...
Medicine is one of the specialized domains that is of particular interest to different communities of speakers, most of whom cannot be considered experts or semi-experts. Their interest in the domain lies in the fact that a certain level of medical knowledge is needed in everyday life, much like a basic understanding of legal concepts. As a prominent characteristic of the domain,...
In our poster presentation, we will present the results of the experiment that tests the potential of large language models (LLMs) in semantic analysis of Estonian. We will focus on LLMs’ ability to analyse polysemy and create definitions. In 2024, the Institute of the Estonian Language started a new project in which we are exploring how LLMs, such as GPT, can help with the presentation of...
Corpus Pattern Analysis (CPA) is a technique for identifying the local semantic and syntactic information of a word and mapping it to the word’s meanings. For verbs, a pattern consists essentially of the argument structure, with each argument labelled with a semantic type. CPA is used in several dictionary projects and allows systematic corpus analysis; however, it is extremely time-consuming. In this paper, we present a...
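A verb pattern with semantically typed argument slots can be represented as a small data structure, and matching an observed clause then means checking whether each argument's type falls under the slot's type. The sketch below assumes a tiny, hypothetical type hierarchy and two hypothetical patterns for fire; it illustrates the representation only and is not the tool presented in the paper.

```python
from dataclasses import dataclass

# Hypothetical mini type hierarchy (child -> parent); real CPA ontologies are far larger.
PARENT = {
    "Human": "Animate",
    "Animate": "Entity",
    "Rifle": "Firearm",
    "Firearm": "Weapon",
    "Weapon": "Artifact",
    "Artifact": "Entity",
}


def is_a(semantic_type, target):
    """True if semantic_type equals target or inherits from it."""
    current = semantic_type
    while current is not None:
        if current == target:
            return True
        current = PARENT.get(current)
    return False


@dataclass(frozen=True)
class Pattern:
    verb: str
    subject_type: str
    object_type: str
    implicature: str  # the meaning associated with the pattern


# Hypothetical patterns for the verb 'fire'
PATTERNS = [
    Pattern("fire", "Human", "Firearm", "[[Human]] discharges [[Firearm]]"),
    Pattern("fire", "Human", "Human", "[[Human 1]] dismisses [[Human 2]] from employment"),
]


def match(verb, subject_type, object_type, patterns=PATTERNS):
    """Return the implicatures of all patterns whose slot types subsume the argument types."""
    return [
        p.implicature
        for p in patterns
        if p.verb == verb
        and is_a(subject_type, p.subject_type)
        and is_a(object_type, p.object_type)
    ]


print(match("fire", "Human", "Rifle"))  # ['[[Human]] discharges [[Firearm]]']
```

Because Rifle inherits from Firearm, the clause is mapped to the ‘discharge’ pattern even though the slot names the more general type; the time-consuming part of CPA, assigning types and patterns to corpus lines, is not modelled here.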
A medication package insert is a legal healthcare document with important information about medications. In Brazil, the National Health Surveillance Agency (ANVISA) requires two versions of the package insert: one for patients and another one for healthcare professionals. In this study, we manually evaluated the performance of an automatic frame annotator on a corpus consisting of 100...
The paper details the current state of an ongoing collaboration between Hungarian lexicographers and computational linguists. Our goal is to provide a comprehensive and consistent description of Hungarian adjectives, benefiting lexical semantics, lexicography and NLP. This thread of research focuses on identifying systematic semantic patterns of Hungarian adjectives and their typical...
This research proposes a step forward in the automatic identification and analysis of verbal idioms in Croatian. The use of the NooJ automated text processing tool, along with the MaCoCu corpus and the Online Dictionary of Croatian Idioms (ODCI), provides a robust framework for recognizing and categorizing these multi-word expressions (MWEs). The research comprises two parts: (a) creation of a...
Among historical and ancient languages, usually under-resourced due to the limited size of corpora and the scarce availability of digital lexical resources, Latin is relatively well documented, thanks to its high relevance to the history of Europe and to the study of Romance languages. As far as lexicography is concerned, several lexical resources are available in digital format, although this...
In the revision process of dictionaries, adding new headwords or new senses to already existing headwords is what typically receives the most attention. In this article, we bring into focus the intriguing dilemma of excluding headwords from the Swedish Academy Glossary (SAOL), which is still published in print versions. In the e-dictionary era, removing headwords may seem questionable; SAOL...
In the introductory part of the presentation, the authors will present the Croatian Web Dictionary – Mrežnik project (Hudeček & Mihaljević, 2020; Hudeček, Mihaljević & Jozić, 2024). Mrežnik consists of three modules – the module for adult native speakers of Croatian, the module for students and the module for non-native speakers learning Croatian. These modules have different approaches to...
While there have been a number of projects focusing on early medieval Irish lexicography (Griffith et al., 2018), few have aspired to work towards comprehensive interlinking of textual and lexical resources. This is at least in part due to the morphological complexity and variation in Early Irish (c. 600–1200 CE), compounded by the absence of an orthographic standard (Stifter, 2009). The...
CHAMUÇA (Cultural HeritAge and Multilingual Understanding through lexiCal Archives) is a pioneering initiative aimed at exploring the impact of the Portuguese language on Asian languages, rooted in the historical exchanges between Portuguese traders, colonists, and diverse Asian cultures. The impact of these interactions extends beyond historical remnants to the modern-day lexicon of Asian...
The semantics of body part nouns is particularly fascinating from the point of view of the evolution of word meanings, their metaphorical and metonymic derivation and, from a cross-linguistic standpoint, for the large amount of overlap between different languages. The status of BODY as a semantic prime, especially in the Natural Semantic Metalanguage (NSM) (cf. Wierzbicka, 2014; 2007) has never...
This article presents semantic information about contemporary standard Slovenian on the Franček educational language portal, which is aimed at primary and secondary-school students. The portal’s primary role is to enhance students’ dictionary skills as part of the national language education program and to introduce users to other linguistic resources, such as school grammars. The portal...
Historical language data can give us an insight into the conceptual and everyday world of past times. However, this insight very often relates only to a small group in society with strong political and social influence. What the linguistic and social situation looked like for the majority of the population can usually only be guessed through the interpretation of others, as only a small...
What strategies are currently being applied in electronic dictionaries and terminology databases to gender representation, with a particular focus on feminine agentives? Starting with an overview of the state of the art as to gender studies in lexicography and terminology, in this paper we reflect upon the analyzed approaches in electronic dictionaries and terminology databases, collecting...
This paper accounts for a system of semantic fields that was developed in Iceland around the turn of the century. The purpose of the system was to help describe the semantic properties of the Icelandic vocabulary and to be a practical tool in lexicographic work. The system categorizes words into semantic fields, enabling nuanced organization and practical applications in monolingual and...
This article discusses the project, Dictionary of the Dubrovnik Idiom, conducted at the Institute for the Croatian Language. The project aims to develop a born-digital diachronic dictionary of the Dubrovnik idiom, covering the period from the 16th century to the end of the 20th century. The dictionary will be based on a historical corpus compiled within the project’s scope, featuring texts from...
The paper presents a count-based semantic vector space model for Ukrainian, which has been applied to the semantic change detection task. The approach involves creating multidimensional vector representations of occurrences of a particular lexeme or a group of related lexemes, followed by visual and quantitative analysis of the obtained semantic vector space. The multidimensional space has...
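The count-based idea can be sketched as follows: each occurrence of a lexeme becomes a vector of context-word counts, the occurrence vectors from each period are averaged, and the cosine distance between the two averages serves as a change score. The window size, the centroid-and-cosine comparison, and the English-glossed toy data are assumptions for illustration; the paper's own dimensions, weighting, and analysis are not reproduced.

```python
import math
from collections import Counter


def occurrence_vectors(sentences, target, window=2):
    """One count vector per occurrence of target: words within +/- window tokens."""
    vectors = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == target:
                context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                vectors.append(Counter(context))
    return vectors


def centroid(vectors):
    """Average of the occurrence vectors (empty dict if there are none)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {w: c / len(vectors) for w, c in total.items()}


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def change_score(corpus_a, corpus_b, target, window=2):
    """1 - cosine similarity of the per-period centroids (0 = no change)."""
    a = centroid(occurrence_vectors(corpus_a, target, window))
    b = centroid(occurrence_vectors(corpus_b, target, window))
    return 1.0 - cosine(a, b)


# Hypothetical toy 'periods' (English glosses for readability)
period_a = [["the", "mouse", "ate", "cheese"], ["a", "grey", "mouse", "ran"]]
period_b = [["click", "the", "mouse", "button"], ["the", "mouse", "ate", "cheese"]]
print(round(change_score(period_a, period_b, "mouse"), 3))  # 0.423
```

Keeping the individual occurrence vectors, rather than only the averages, is what makes the visual analysis possible: they can be projected or clustered to see which uses drive the change.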
Cross-lingual embedding models act as facilitators of lexical knowledge transfer and offer many advantages, notably their applicability to low-resource and non-standard language pairs, making them a valuable tool for retrieving translation equivalents in lexicography. Despite their potential, these models have primarily been developed with a focus on Natural Language Processing (NLP), leading to...
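Retrieving translation equivalents from a shared cross-lingual space typically comes down to nearest-neighbour search. The sketch below ranks target-language words by cosine similarity to a source word; the three-dimensional English and German vectors are invented for illustration (real models use hundreds of dimensions), and the function name is hypothetical.

```python
import math

# Hypothetical vectors in a shared cross-lingual space
SOURCE = {"dog": [0.9, 0.1, 0.0], "house": [0.0, 0.2, 0.95]}  # English
TARGET = {                                                    # German
    "Hund": [0.85, 0.15, 0.05],
    "Katze": [0.6, 0.5, 0.1],
    "Haus": [0.05, 0.1, 0.9],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def translation_candidates(word, source, target, k=2):
    """Return the k target words closest to the source word by cosine similarity."""
    query = source[word]
    ranked = sorted(target, key=lambda w: cosine(query, target[w]), reverse=True)
    return ranked[:k]


print(translation_candidates("dog", SOURCE, TARGET))  # ['Hund', 'Katze']
```

Returning a short ranked list rather than a single word suits lexicographic use, where a human selects among candidates; practical systems often replace plain cosine with a hubness-reducing measure such as CSLS.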
Dictionaries have traditionally served as more than mere repositories of words; they have aimed to sketch some of the relationships between words, including semantic, collocational, or hierarchical connections. However, the physical constraints of print media often limited their scope, restricting the depiction of these relationships to cross-references, exemplifications, and, in specialized...
The LBC-Platform (https://www.lessicobeniculturali.net) is a comprehensive lexical information system that aims to integrate various types of corpora and resources: dictionaries, concordances, monolingual Language for Special Purposes (LSP) corpora in different languages and LSP parallel corpora. Designed for users interested in cultural heritage, the platform provides free access to resources...
Despite the decreasing use of regional and local varieties of the Dutch language, there is a growing public interest in dialects in the Netherlands and Flanders. Several dialect associations strive to preserve the local dialect by creating lexicons, establishing spelling conventions, writing texts in their local dialect, teaching the dialect, and sharing knowledge about their local dialect...
We present the COR.SEM lexicon, an open-source semantic lexicon for general AI purposes funded by the Danish Agency for Digitisation as part of an AI initiative embarked upon by the Danish Government in 2020. COR.SEM describes the core senses of 34,000 Danish lemmas with formal semantic information, e.g., ontological type, hypernym, semantic frame, regular polysemy pattern, and polarity value;...
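One way to picture a sense record carrying the kinds of information listed above (ontological type, hypernym, semantic frame, regular polysemy pattern, polarity) is a small typed record. The field names, the label values, and the polarity scale below are hypothetical and are not COR.SEM's actual format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sense:
    """One sense record; field names and values are hypothetical."""
    lemma: str
    sense_id: str
    ontological_type: str
    hypernym: Optional[str]
    semantic_frame: Optional[str]
    polysemy_pattern: Optional[str]
    polarity: int  # assumed scale: -1 negative, 0 neutral, 1 positive


LEXICON = [
    Sense("skole", "skole_1", "Institution", "institution", None, "institution/building", 0),
    Sense("skole", "skole_2", "Building", "bygning", None, "institution/building", 0),
    Sense("svindel", "svindel_1", "Event", "forbrydelse", "Fraud", None, -1),
]


def senses_with_pattern(lexicon, pattern):
    """All senses taking part in a given regular polysemy pattern."""
    return [s.sense_id for s in lexicon if s.polysemy_pattern == pattern]


def lemmas_with_polarity(lexicon, polarity):
    """Lemmas with at least one sense of the given polarity."""
    return sorted({s.lemma for s in lexicon if s.polarity == polarity})


print(senses_with_pattern(LEXICON, "institution/building"))  # ['skole_1', 'skole_2']
print(lemmas_with_polarity(LEXICON, -1))  # ['svindel']
```

Recording the polysemy pattern on each sense, rather than only on the lemma, lets an application retrieve all senses that alternate in the same systematic way.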
This study presents an innovative approach to crafting and enhancing Japanese lexical networks by incorporating large language models (LLMs), especially GPT-4o, utilizing data from Vocabulary Database for Reading Japanese to accommodate various proficiency levels. Through this process, we extracted a total of 137,870 synonym relations and 54,324 antonym relations, forming a network comprising...
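A lexical network of this kind can be stored as relation labels attached to unordered word pairs. The minimal sketch below assumes that relations arrive as (word, word, label) triples and flags pairs labelled both as synonyms and as antonyms, a consistency check that could be useful when relations are generated by an LLM. The words, the deliberately conflicting edge, and the check itself are illustrative assumptions rather than the authors' pipeline, and no model is called.

```python
from collections import defaultdict

# Hypothetical relations as an LLM might return them; the last edge deliberately conflicts.
EDGES = [
    ("大きい", "でかい", "syn"),
    ("大きい", "小さい", "ant"),
    ("小さい", "ちっちゃい", "syn"),
    ("でかい", "小さい", "ant"),
    ("大きい", "小さい", "syn"),
]


def build_network(edges):
    """Map each unordered word pair to the set of relation labels asserted for it."""
    relations = defaultdict(set)
    for a, b, rel in edges:
        relations[frozenset((a, b))].add(rel)
    return relations


def nodes(relations):
    """All words appearing in at least one relation."""
    return set().union(*relations)


def conflicts(relations):
    """Pairs labelled both as synonyms and as antonyms, to be reviewed manually."""
    return sorted(tuple(sorted(pair)) for pair, rels in relations.items() if {"syn", "ant"} <= rels)


def neighbours(relations, word, rel):
    """Words linked to word by the given relation label."""
    return sorted(
        other
        for pair, rels in relations.items()
        if word in pair and rel in rels
        for other in pair
        if other != word
    )


net = build_network(EDGES)
print(len(nodes(net)))  # 4
print(conflicts(net))   # [('大きい', '小さい')]
```

Keying relations by unordered pairs treats synonymy and antonymy as symmetric, which keeps duplicate edges in opposite directions from inflating the counts.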
NomVallex is a manually annotated valency lexicon of Czech nouns and adjectives that enables research into various language phenomena related to valency, including the comparison of valency properties of affirmative and negative forms of words. This paper presents new developments in the way the lexicon facilitates research into word-level negation, explaining the reasoning behind the proposed...
In this paper, we explore the possibilities and challenges of lexicographic treatment of pragmatic markers, specifically epistemic and evidential markers in Czech. Our starting point is a detailed comparison of how these expressions are treated in contemporary monolingual Czech dictionaries. Following this, we present the development of the SEEMLex lexicon of Czech epistemic and evidential...