Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

The Hare and the Tortoise: Pipeline for Latvian Information and Communication Technologies Secondary Term Formation

Nov 19, 2025, 12:00 PM
1h
Lobby

Speakers

Dace Šostaka, Inguna Skadiņa

Description

POSTER

The Information and Communication Technologies (ICT) field has evolved rapidly in recent decades. To describe the new devices, activities, and concepts that appear every year, a vast number of terms are created, primarily in English, while other languages rely on secondary term formation (STF) for ICT end-users (ETSI Guide, 2022). Systematic secondary rendering and dissemination of up-to-date terminology in the target language (Chiocchetti and Ralli, 2013; Stefaniak, 2023) are crucial for language development and benefit professionals, students, and the public. We analysed the STF process in Latvian for the ICT domain during the development of the Language Technology (LT) course at the University of Latvia.

For over 30 years, the Terminology Commission of the Latvian Academy of Sciences (TCLAS, 2025) and its sub-commissions, including the Information and Communication Technologies Sub-Commission (ICTSC), have carried out term formation. ICTSC comprises ICT professionals, terminologists, and linguists. ICT students also participate in meetings to approbate terms for the first time. The commission meets twice a month during the academic year. Terms are sourced from higher education, industry, and translation agencies, including the European Commission. They are added to the biweekly agenda, discussed, and, if accepted, recorded in the open-access Academic term database, available on the web since 2005 (ATB, 2025).

For the LT course, terms were manually extracted from lecture slides. Given ICTSC's capacity to produce about 20 high-quality terms during a 2-hour meeting, terms were prioritised by their relevance to the LT course. The identified terms were reviewed, defined, and supplemented with usage examples and visuals. Possible Latvian term variants were then proposed, and ICTSC members discussed them in preliminary written exchanges. In total, 111 terms were accepted and are available in the Academic term database (ATB, 2025).

The STF process includes several challenges where AI tools could be applied. As the concept behind a term is usually expressed most precisely in its definition, the most significant challenge is providing a clear definition for terms used in several ICT subdomains. The second challenge is weighing the arguments for and against creating source-language-oriented terms, which can be easily back-translated and are recognisable, versus creating secondary terms that precisely reflect the definition but might be far from a direct translation of the original term (e.g., Bag of Words). The third challenge is the length of the term and its euphony, that is, how easily it can be pronounced. As a rule of thumb, the longer the term, the less likely it is to be used in spoken communication, and the direct calque is used instead.

The STF process was researched (Šostaka et al., 2023), and several approaches were tested to speed up the “mechanical” parts of term creation. The first approach applied an AI tool (ChatGPT 4.0) to 140 concepts and terminology units from ISO/IEC 22989:2022(en): suggestions for STF in Latvian were searched for, evaluated, and compared with the terms already approved by the Terminology Commission (Šostaka et al., 2025). Out of 140 concepts, 75 terms had an exact match, 65 had a partial match, while 5 had no match.

The second approach measured the time saved by using a tool for term extraction from online dictionaries (Šostaka et al., 2024). The tool lets users review user-specified online sources (e.g., the Merriam-Webster dictionary) related to ICT terms; it is scalable, and sources of the user's choice in other fields and languages can be added. It allowed us to save 74 minutes when searching 40 terms, as opposed to 106 minutes needed for a manual search.
