POSTER
The digitization of resources of all types has become an increasingly discussed topic in recent years. Within linguistics, lexicography is among the fields most affected by this process: digital dictionaries play an essential role both in online consultation by specialists and in the automatic development of resources for natural language processing and its downstream applications.
The first dictionary automatically digitized by the “Iorgu Iordan - Alexandru Rosetti” Institute of Linguistics and made available to the public is the Etymological Dictionary of Romanian (https://delr.lingv.ro). It was parsed only shallowly, enough to enable searches by the headword of a lexical entry, its variants, and words from the same lexical family. It was developed mainly as a proof of concept for automatically parsing the entries of dictionaries that were compiled traditionally and originally intended only for print.
The third edition of the Orthographic, Orthoepic and Morphological Dictionary of the Romanian Language (DOOM3) was also produced by the Institute, initially in printed form. Shortly after the print edition was launched, the idea of making the dictionary accessible online to the general public, in a format that meets users' current needs (i.e., quick access on mobile devices), led to its publication on the Internet (https://doom.lingv.ro). The site supports regular searches (by headword) as well as advanced ones (for example, by combining the various types of linguistic information represented in the dictionary: parts of speech, grammatical categories, language of origin, register, variants, etc.). The advanced searches were made possible by a deeper parsing of the entries. The entire theoretical apparatus that precedes the dictionary proper in the printed version, i.e. the Introductory Study, is also accessible online, and its content can be searched automatically for occurrences of a word, which makes it easier to work with.
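As a rough illustration of what combining criteria in an advanced search could look like, the following is a minimal Python sketch. It is not the implementation behind doom.lingv.ro: the `Entry` record, its field names, the `advanced_search` helper and the sample entries are all assumptions made for the example, and the sample values are illustrative rather than copied from DOOM3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one deeply parsed dictionary entry."""
    headword: str
    pos: str                        # part of speech
    origin: Optional[str] = None    # language of origin, if marked
    register: Optional[str] = None  # register/usage label, if marked


# Illustrative sample data only; not taken from the dictionary itself.
ENTRIES = [
    Entry("abandon", pos="noun", origin="French"),
    Entry("weekend", pos="noun", origin="English"),
    Entry("abate", pos="verb"),
]


def advanced_search(entries, **criteria):
    """Return the entries whose fields equal every given criterion."""
    return [
        entry
        for entry in entries
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


nouns_from_english = advanced_search(ENTRIES, pos="noun", origin="English")
print([entry.headword for entry in nouns_from_english])
```

Each keyword argument narrows the result set, so adding a criterion corresponds to adding one more filter in the search form.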
The online version is a more complex tool than the printed dictionary, because it includes a mechanism that suggests the correct forms when the user types into the search bar a misspelled word or a form that is no longer recommended or accepted by the norm.
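One way such a suggestion mechanism might work is sketched below in Python. This is an assumption about a possible design, not a description of the site's actual code: the `suggest` function, the `DEPRECATED` mapping and the tiny `HEADWORDS` list are hypothetical, and fuzzy matching with the standard library's `difflib` stands in for whatever matching the real system uses.

```python
from difflib import get_close_matches

# Hypothetical data: forms no longer accepted by the norm, mapped to the
# recommended form, plus a tiny list of valid headwords.
DEPRECATED = {"sînt": "sunt"}
HEADWORDS = ["abandon", "abate", "autor", "sunt"]


def suggest(query, headwords=HEADWORDS, deprecated=DEPRECATED, limit=3):
    """Return suggested correct forms for a query that has no exact match."""
    normalized = query.strip().lower()
    if normalized in headwords:
        return []  # exact hit: nothing to suggest
    if normalized in deprecated:
        return [deprecated[normalized]]  # outdated form: point to the norm
    # Otherwise treat the query as a likely typo and look for near matches.
    return get_close_matches(normalized, headwords, n=limit, cutoff=0.6)


print(suggest("sînt"))
print(suggest("abandn"))
```

Checking the list of outdated forms before falling back to fuzzy matching keeps normative corrections deterministic, while typos are handled by string similarity.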
Following the success of the digital edition of the Orthographic, Orthoepic and Morphological Dictionary among students, specialists, teachers and the general public, the Institute invested effort in digitizing the new edition of the Romanian Language Dictionary (DLR). A new graphical interface has recently been created. For the moment, searches can only be made by headword and are of several types: exact search, search with or without diacritics, and search by prefix or suffix using the special characters * and ? (for example ab* or *tor, for prefixes and suffixes, respectively). The dictionary article contains several dynamic elements, especially for quotations, which are displayed compactly. On request, the user can see all the quotations for a meaning or hide them completely to get a synthetic view of the semantic tree (see https://dlr-test.lingv.ro/cautare/abandon). It is also possible to browse the list of all words, or to download the list of words returned by a prefix or suffix search.
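The headword search types described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DLR site's code: the function names and the sample headword list are hypothetical, `*` is assumed to match any run of characters and `?` exactly one character, and diacritics are removed through Unicode decomposition.

```python
import re
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, e.g. 'țară' becomes 'tara'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def wildcard_to_regex(pattern):
    """Translate a pattern with * and ? into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # assumed: any run of characters
        elif ch == "?":
            parts.append(".")    # assumed: exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def search(headwords, query, ignore_diacritics=False):
    """Return the headwords that match the query in full."""
    if ignore_diacritics:
        key = strip_diacritics
    else:
        key = lambda s: unicodedata.normalize("NFC", s)
    regex = wildcard_to_regex(key(query.lower()))
    return [word for word in headwords if regex.fullmatch(key(word.lower()))]


words = ["abandon", "abate", "autor", "tată", "țară"]
print(search(words, "ab*"))                           # prefix search
print(search(words, "*tor"))                          # suffix search
print(search(words, "tata", ignore_diacritics=True))  # diacritic-insensitive
```

A query without special characters reduces to an exact search, so the three search types share a single code path.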
In the future, we would like to add an advanced search based on criteria such as part of speech and register/usage, and to consider making other lexicographic resources available online.
The method used to convert the print format into the online version is the same for all three dictionaries, although their structures differ.