Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

Parsing of Explanatory dictionary

Nov 18, 2025, 11:00 AM
30m
Zrak hall

Speakers

Iryna Ostapova, Yevhen Kupriianov, Mykyta Yablochkov

Description

The paper outlines technological and methodological approaches to organizing the dictionary parsing process. The Spanish Dictionary (Diccionario de la lengua Española 23 ed. – DLE 23) website (https://dle.rae.es/) serves as the basis for the research. Explanatory dictionaries of national languages are among the most complex multi-parameter lexicographic systems. They are of particular interest because they offer the most comprehensive lexicographic description of a language, are produced by top experts (linguists and IT engineers), and offer numerous opportunities to make full use of contemporary digital technologies.

Ultimately, our goal is to create a digital version of the Dictionary of Spanish that can be easily adjusted to the user's evolving demands using a built-in research toolbox. To achieve this, we started a project named the Virtual Lexicographic Laboratory of the Dictionary of Spanish (VLL DLE 23).

The first step was to build a formal model that would serve as the basis for developing the parsing algorithm, XML schema, database schema and interfaces. The formal model of DLE 23 was built by analyzing the structure of dictionary entries in both the online version and the printed edition of DLE 23.

The second step was to create a lexicographic database. Since the dictionary entries have a strictly defined structure, it makes sense to represent them as classes in an object-oriented programming language, with subsequent processing, editing and storage in explicit form. NoSQL (document-oriented) databases provide such a possibility. The LiteDB database (http://www.litedb.org/) was chosen for our project.
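The idea of representing entries as classes and storing them as documents can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the project itself uses .NET and LiteDB. The class names, field names and sample values below are hypothetical and are not taken from the actual VLL DLE 23 model.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    """One numbered meaning of a headword (hypothetical structure)."""
    number: int
    definition: str


@dataclass
class Entry:
    """A dictionary entry as a class; all field names are assumptions."""
    headword: str
    variants: List[str] = field(default_factory=list)
    homonym_index: Optional[int] = None
    senses: List[Sense] = field(default_factory=list)
    phrases: List[str] = field(default_factory=list)
    html: str = ""  # original HTML text of the entry


def to_document(entry: Entry) -> dict:
    """Convert an entry into a nested dict, the shape a document store keeps."""
    return asdict(entry)


def from_document(doc: dict) -> Entry:
    """Rebuild an Entry object from a stored document."""
    data = dict(doc)
    data["senses"] = [Sense(**s) for s in data.get("senses", [])]
    return Entry(**data)


# Placeholder content, not a real DLE 23 entry.
sample = Entry(
    headword="ejemplo",
    senses=[Sense(1, "placeholder definition 1"), Sense(2, "placeholder definition 2")],
    phrases=["placeholder phrase"],
    html="<article>placeholder</article>",
)

# Round trip through JSON, standing in for the document database.
stored = json.dumps(to_document(sample))
restored = from_document(json.loads(stored))
```

Because the object is stored whole, including its nested senses, no mapping onto relational tables is needed, which is the convenience the document-oriented approach is meant to provide.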

The final stage of the trial version was creating a web application to work with the VLL DLE database. The application was built on .NET Core 2.1. A set of HTML and CSS templates and Bootstrap JavaScript scripts was used to make interface elements convenient to use and easy to modify.

The VLL DLE 23 project is being carried out in two stages: 1) creation of a VLL pilot version to test specific technological solutions and clarify the structure of the dictionary entry; 2) development of a final application with a full-scale interface. Currently, the first stage has been completed. The pilot version offers the user more possibilities than the original online version of DLE 23. Streaming version of DLE 23 is available at https://svc2.ulif.org.ua/Dics/ResIntSpanish (a captcha is used).

To construct the pilot version of the VLL, the dictionary entries were further parameterized. A set of parameters is associated with each headword: 1) headword variations; 2) headword structure; 3) headword type; 4) homonymy; 5) number of meanings; 6) number of word combinations, and some others. Each parameter was derived from the HTML text of the dictionary entry. To create a selection, the user can specify any combination of these parameters. Entries are displayed in a form close to the original edition, and the HTML-formatted text is also displayed. Statistics are produced for every selection. Full-text search is an additional option that can be combined with parametric search; any string from the HTML text can be used as the search query.
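The combination of parametric and full-text search can be sketched as follows. This is an illustration only, not the project's implementation: the record fields, the type labels ("word", "affix"), the sample headwords and the HTML class names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class EntryParams:
    """Parameters attached to one headword (hypothetical field names)."""
    headword: str
    headword_type: str   # assumed labels such as "word" or "affix"
    is_homonym: bool
    sense_count: int
    phrase_count: int
    html: str            # HTML text of the entry, used for full-text search


def select(
    entries: Iterable[EntryParams],
    *,
    headword_type: Optional[str] = None,
    is_homonym: Optional[bool] = None,
    min_senses: Optional[int] = None,
    html_contains: Optional[str] = None,
) -> List[EntryParams]:
    """Return entries matching every filter that is set (None means 'any')."""
    result = []
    for e in entries:
        if headword_type is not None and e.headword_type != headword_type:
            continue
        if is_homonym is not None and e.is_homonym != is_homonym:
            continue
        if min_senses is not None and e.sense_count < min_senses:
            continue
        if html_contains is not None and html_contains not in e.html:
            continue
        result.append(e)
    return result


def statistics(selection: List[EntryParams]) -> dict:
    """Simple counts for a selection."""
    return {
        "entries": len(selection),
        "senses": sum(e.sense_count for e in selection),
        "by_type": dict(Counter(e.headword_type for e in selection)),
    }


# Placeholder records, not real DLE 23 data.
entries = [
    EntryParams("uno", "word", False, 3, 2, '<p class="m">a</p><p class="k">b</p>'),
    EntryParams("dos", "word", True, 1, 0, '<p class="m">c</p>'),
    EntryParams("tres-", "affix", False, 2, 0, '<p class="m">d</p>'),
]

# Parametric filter combined with a full-text condition on the HTML.
words_with_k = select(entries, headword_type="word", html_contains='class="k"')
word_stats = statistics(select(entries, headword_type="word"))
```

Treating the full-text condition as one more filter keeps it composable with the parametric ones, so statistics can be computed over any resulting selection in the same way.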
