Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

Project of a Specialized Dictionary Website

Nov 19, 2025, 12:00 PM
1h
Lobby

Speakers

Mykyta Yablochkov, Alona Dorozhynska, Iryna Ostapova, Iuliia Verbynenko

Description

POSTER

The objective of the research is to develop a technology for converting the text of a specialized dictionary into a website with a fully developed user interface.

The object of the study was the “Dictionary of Ukrainian biological terminology” (7,342 entries and about 26,000 terms in Ukrainian, Russian and English), which contains definitions, information on term polysemy and synonymy, stress marks for the Slavic languages, and grammatical information.

Since the dictionary text was available in a digital publishing format (PDF), no prior digitization was required. Our approach is to transform the linear text of the dictionary into a website step by step. The basic steps are as follows:

  1. Dictionary text normalization: restoring the text line that represents each dictionary entry, marking stress, fixing font markers, correcting the inevitable publishing errors in the dictionary entry structure, etc. This was the most time-consuming step, and it required manual processing. The text was converted into .doc format and processed in the MS Word text processor. The result was text in .txt format, in which HTML tags mark the substrings that were set in bold and italic.

  2. Designing a lexicographic system model of the dictionary. This model serves as the basis for building a parsing algorithm and for designing the database schema and interface elements. The model was designed from an analysis of the markup of dictionary entries in the printed version. The methodology of lexicographic system models allows us to determine all structural elements that can be identified automatically and to establish the connections between them. Each dictionary entry is assigned one universal structure, i.e. any dictionary entry is considered a derivative of one “template” entry.

  3. Construction of an XML schema based on the conceptual lexicographic model.

  4. Automatic conversion of the dictionary text (.txt format) into an XML document, which makes explicit all the defined structural elements and the connections between them. To mark up the dictionary text with XML tags automatically, a program was developed that highlights the elements of the dictionary entry structure. We consider the XML document a stand-alone product that effectively represents lexicographic data for further use for various purposes.

  5. Lexicographic database creation. A NoSQL (document-oriented) database was chosen for this. In relational databases, data is stored as a set of multiple tables and the links between them. Working with individual tables as a single object requires a powerful software infrastructure. Moreover, the evolutionary potential of such a digital object is limited by the opacity of the database. Since dictionary entries are the basic elements of a lexicographic system and have a strictly defined structure, it is logical to represent them as classes in an object-oriented programming language, with subsequent processing, editing and storage in explicit form. The main advantage of NoSQL databases for our project is their ability to store lexicographic objects explicitly, without changing their internal structure, which gives direct access to each element of a lexicographic object and significantly simplifies editing and modifying (extending) it.

  6. Converting the XML file into the database. This was performed automatically.

  7. Designing interface schemes and creating the website (currently in progress).
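
Steps 4 to 6 above can be illustrated with a minimal Python sketch. It is not the program developed in the project. The layout of the normalized line (bold headword, italic grammar label, then the definition), the sample entry, the element names `headword`, `grammar` and `definition`, and the function names `line_to_xml` and `xml_to_document` are all assumptions made for illustration. The sketch parses one normalized line against a single "template" pattern, emits an XML element, and flattens that element into a dictionary of the kind a document-oriented database could store.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical layout of one normalized line (an assumption, not the
# dictionary's real markup): bold headword, italic grammar label, definition.
ENTRY_RE = re.compile(
    r"<b>(?P<headword>.+?)</b>\s*<i>(?P<grammar>.+?)</i>\s*(?P<definition>.+)"
)


def line_to_xml(line: str) -> ET.Element:
    """Turn one normalized text line into an <entry> element."""
    match = ENTRY_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"line does not fit the template entry: {line!r}")
    entry = ET.Element("entry")
    for name in ("headword", "grammar", "definition"):
        ET.SubElement(entry, name).text = match.group(name).strip()
    return entry


def xml_to_document(entry: ET.Element) -> dict:
    """Flatten an <entry> element into a dict for a document store."""
    return {child.tag: child.text for child in entry}


# Invented sample line, used only to exercise the sketch.
sample = "<b>cell</b> <i>n.</i> the basic structural unit of an organism"

entry = line_to_xml(sample)
xml_text = ET.tostring(entry, encoding="unicode")
document = xml_to_document(entry)

print(xml_text)
print(json.dumps(document, ensure_ascii=False))
```

Because the document keeps the same field names as the XML elements, each part of an entry stays directly addressable after loading, which is the property the database step relies on. A real entry structure with polysemy, synonyms and translations would need nested elements rather than this flat three-field layout.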
