Description
This paper introduces the Dictionary of Contemporary Serbian Language (RSSJ), an ongoing large-scale digital lexicographic project designed to serve both human users, via web and mobile applications, and machines, through APIs. Coordinated by the diaspora association “Gathered around the Language” and the Society for Language Resources and Technologies (JeRTeh), RSSJ aims to produce a dictionary of approximately 50,000 frequently used words, reflecting the vocabulary of the past fifty years across diverse functional styles. The headword list is extracted automatically from corpora (SrpKor2013, SrpKor2021), then manually curated and enriched with data from the LeXimirka database. The project automates several stages of the workflow: language models and static embeddings (Word2Vec, FastText, Dict2Vec) are used to identify synonyms, and large language models have assisted in generating draft definitions. Additional methods include the automated extraction of collocations, syntactic patterns, and usage examples selected with GDEX (Good Dictionary EXamples) algorithms, all managed within a PostgreSQL data model inspired by DMLex. A custom web interface integrates dictionary editing with corpus querying. Preliminary results show that automated drafting accelerates dictionary development to some extent, while requiring lexicographers to adopt more dynamic, data-driven workflows and to redefine traditional lexicographic practices.
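The abstract does not say how synonym candidates are ranked. A common way to query static embeddings such as Word2Vec, FastText or Dict2Vec is nearest-neighbour search by cosine similarity, so the minimal Python sketch below assumes that approach. The function names `cosine` and `synonym_candidates`, the `k` and `min_sim` defaults, and the three-dimensional toy vectors are all hypothetical, invented for illustration rather than taken from RSSJ; real models have hundreds of dimensions and would be loaded from trained files.

```python
import math

# Hypothetical sketch: ranks synonym candidates by cosine similarity between
# static word vectors. This is an assumed approach, not RSSJ's documented method.


def cosine(u, v):
    """Return the cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def synonym_candidates(headword, vectors, k=3, min_sim=0.5):
    """Return up to k (word, similarity) pairs, most similar first.

    Only words with similarity >= min_sim are kept; k and min_sim are
    illustrative defaults, not values from the project.
    """
    target = vectors[headword]
    scored = [
        (word, cosine(target, vec))
        for word, vec in vectors.items()
        if word != headword
    ]
    kept = [(word, sim) for word, sim in scored if sim >= min_sim]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Toy 3-dimensional vectors invented for this example (hypothetical data).
toy_vectors = {
    "brz": [0.9, 0.1, 0.0],    # 'fast'
    "hitar": [0.8, 0.2, 0.1],  # 'quick'
    "spor": [0.6, 0.4, 0.0],   # 'slow' (an antonym)
    "pas": [0.0, 0.1, 0.9],    # 'dog' (unrelated)
}

if __name__ == "__main__":
    for word, sim in synonym_candidates("brz", toy_vectors):
        print(f"{word}\t{sim:.3f}")
```

With these toy vectors the antonym *spor* ('slow') ranks right after *hitar* ('quick'). This mirrors a known weakness of distributional similarity: antonyms and co-hyponyms occur in similar contexts, so output of this kind can only serve as a candidate list for lexicographers to review.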