Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

lexicographR: R infrastructure to develop and deploy digital dictionaries from scratch

Nov 19, 2025, 12:00 PM
1h
Lobby


Speaker

Ligeia Lugli

Description

DEMO

This demo introduces lexicographR (citation withheld for anonymization), a prototype computer application aimed at facilitating the creation of digital dictionaries for scholars working in low-tech environments, where access to programming skills is severely hindered by a lack of funding, institutional support and technical training. According to recent user surveys (Lugli 2024b), these scholars are typically domain experts or language teachers without formal training in lexicography who work on specialized dictionaries pertaining to their area of expertise. As such, they are often unaware of best practices and current methods in lexicography. Few use corpora, and many have been writing their dictionaries in Word or Excel files, which makes it harder for them to automatically integrate new lexical data from corpora into their existing work. They typically struggle to deploy their lexicographic output as interactive online resources, and perceive existing free-of-charge digital dictionary development solutions, such as Lexonomy and Living Dictionaries (Daigneault and Anderson 2023; Měchura 2017), as insufficiently customizable for their highly specialized dictionaries and the specific needs of their target audiences (Lugli 2024).

The demo will first discuss the results of our user surveys and user-need identification process. It will then briefly discuss our development philosophy, which, given the ephemeral nature of interfaces and web technologies, prioritizes lowering the cost and technical barriers to the creation of machine-readable and reusable dictionary data over the development of digital interfaces. Still, to foster the dissemination of dictionary data among parts of the population who are less used to interacting with data directly, we have also provided a simple way to build flexible and lightweight interfaces that deploy dictionary data online as interactive digital dictionaries.

The core of the demo will consist of a demonstration of lexicographR's main functionalities, each of which is designed to assist with a specific lexicographic task:
1. converting pre-existing dictionary data from Word, Excel, csv/tsv, FLEx, CoNLL-U and vrt/vert files into JSON

2. processing corpus data from CoNLL-U, vrt/vert, csv/tsv, FLEx and plain text, and extracting corpus frequencies and distribution information for each dictionary headword

3. extracting collocations from the corpus for each dictionary headword

4. extracting from the corpus for each dictionary headword

5. creating data visualizations for the information extracted from the corpus as well as for pre-existing dictionary data

6. designing a dictionary interface and generating the files necessary to publish the pre-existing dictionary data (potentially augmented with information extracted from the corpus and data visualizations) as either a Shiny app or a Quarto book

7. converting the dictionary data published in the digital dictionary to JSON-LD for release in online data repositories, such as Zenodo or figshare
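
To make the corpus-frequency task more concrete, the following is a minimal sketch of how lemma frequencies for a list of headwords could be read from CoNLL-U data and serialized as JSON. It is written in Python for illustration only and does not reproduce lexicographR's own R code; the function names (lemma_counts, headword_frequencies), the JSON field names and the sample sentences are all hypothetical.

```python
import json
from collections import Counter

# Tiny invented CoNLL-U sample (10 tab-separated columns: ID FORM LEMMA UPOS ...).
CONLLU = (
    "# sent_id = 1\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tmonks\tmonk\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tmeditate\tmeditate\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = 2\n"
    "1\tA\ta\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tmonk\tmonk\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tmeditates\tmeditate\tVERB\t_\t_\t0\troot\t_\t_\n"
)


def lemma_counts(conllu_text):
    """Count lemma occurrences in CoNLL-U text (hypothetical helper)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed rows, multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        counts[cols[2].lower()] += 1  # column 3 holds the lemma
    return counts


def headword_frequencies(headwords, conllu_text):
    """Return one JSON-ready record per headword (field names are assumptions)."""
    counts = lemma_counts(conllu_text)
    return [{"headword": hw, "frequency": counts[hw.lower()]} for hw in headwords]


entries = headword_frequencies(["monk", "meditate", "temple"], CONLLU)
print(json.dumps(entries, indent=2, ensure_ascii=False))
```

Headwords that never occur in the corpus are kept with a frequency of 0 rather than dropped, so that every dictionary entry receives a record.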

The demo will conclude with an overview of some of the dictionaries that have been created using the lexicographR app.
