Description
It has been almost half a century since we started “doing” lexicography on computers. Let’s stop for a minute now and take a critical look at the data models we have been using to represent the structure of dictionaries in dictionary writing systems and other software.
In this talk, I will trace the history of lexicographic data modelling from its beginnings as text markup for retro-digitised dictionaries, to the present day when most dictionaries are born-digital. I will show that, regardless of which notation we use (XML, JSON or other), the underlying design pattern is almost always a tree structure in which the various content items (headwords, senses, definitions…) are arranged in a parent-child hierarchy.
I will argue that the tree-structured pattern is not expressive enough to handle some phenomena that occur in dictionaries, such as entry-to-entry cross-references, the placement of multiword subentries, and complex hierarchies of subsenses. These things would be easier to manage in a graph-based data model, of the kind that can be implemented in a relational database or a Semantic Web-style knowledge graph.
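To make the contrast concrete, here is a minimal sketch of graph-style storage for one of the phenomena above, the placement of multiword subentries. It is an illustration only: the entry IDs, the relation names (has_subentry, see_also) and the helper functions are all invented for this example and do not come from any particular dictionary writing system. In a tree, "black hole" would have to sit under exactly one parent entry; here it is a record of its own, linked from both "black" and "hole".

```python
# Graph-style storage: entries are flat records keyed by ID.
# All IDs and relation names below are hypothetical.
entries = {
    "black": {"headword": "black"},
    "hole": {"headword": "hole"},
    "black-hole": {"headword": "black hole"},
}

# Edges live outside the entries, so one entry can be linked from many places.
# Each edge is a (source ID, relation type, target ID) triple.
edges = [
    ("black", "has_subentry", "black-hole"),
    ("hole", "has_subentry", "black-hole"),
    ("black-hole", "see_also", "hole"),
]


def neighbours(entry_id, relation):
    """Return the IDs reachable from entry_id via edges of the given type."""
    return [dst for src, rel, dst in edges if src == entry_id and rel == relation]


def parents_of(entry_id):
    """Return every entry that lists entry_id as a subentry, sorted by ID."""
    return sorted(
        src for src, rel, dst in edges
        if rel == "has_subentry" and dst == entry_id
    )


if __name__ == "__main__":
    print(parents_of("black-hole"))
    print(neighbours("black-hole", "see_also"))
```

The same triples could equally be rows in a relational table or statements in a knowledge graph; the point of the sketch is only that the subentry is stored once and referenced twice, which a strict parent-child hierarchy cannot express without duplication.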
Dictionary projects which insist on a purely tree-structured data model are failing to make full use of the digital medium. But upgrading to a graph-based data model is difficult because tree-structured thinking is entrenched in the minds of lexicographers and dictionary users alike. This talk will conclude with an introduction to DMLex, a recently standardised “Data Model for Lexicography” which aims to ease this transition by being a hybrid model, combining tree structures where possible with graph structures where necessary.
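As a rough illustration of what "tree structures where possible, graph structures where necessary" could look like in practice, the sketch below nests senses inside their entries (the tree part) and keeps relations in a separate list whose members point at entry or sense IDs (the graph part). This is an assumption-laden toy, not the DMLex specification: the field names, IDs and helper functions are hypothetical and chosen only for readability.

```python
# Tree part: each sense is nested inside exactly one entry.
# All IDs and field names are invented for this sketch.
entries = [
    {"id": "walk", "headword": "walk",
     "senses": [
         {"id": "walk-1", "definition": "to move on foot"},
         {"id": "walk-2", "definition": "a journey on foot"},
     ]},
    {"id": "stroll", "headword": "stroll",
     "senses": [
         {"id": "stroll-1", "definition": "to walk in a leisurely way"},
     ]},
]

# Graph part: relations sit outside the entries and refer to items by ID.
relations = [
    {"type": "synonym", "members": ["walk-1", "stroll-1"]},
    {"type": "see_also", "members": ["walk", "amble"]},  # "amble" has no entry yet
]


def known_ids(entry_list):
    """Collect the IDs of all entries and of all senses nested in them."""
    ids = set()
    for entry in entry_list:
        ids.add(entry["id"])
        ids.update(sense["id"] for sense in entry["senses"])
    return ids


def related(item_id, relation_type):
    """Return the other members of every relation of this type containing item_id."""
    found = []
    for relation in relations:
        if relation["type"] == relation_type and item_id in relation["members"]:
            found.extend(m for m in relation["members"] if m != item_id)
    return found


def dangling_members():
    """Return relation members that point at no existing entry or sense."""
    ids = known_ids(entries)
    return [m for relation in relations for m in relation["members"] if m not in ids]


if __name__ == "__main__":
    print(related("walk-1", "synonym"))
    print(dangling_members())
```

Keeping senses nested preserves the familiar entry-shaped view for lexicographers, while the separate relation list allows links that cross entry boundaries and makes unresolved references easy to detect.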