Description
Among historical and ancient languages, which are usually under-resourced because of the limited size of their corpora and the scarcity of digital lexical resources, Latin is relatively well documented, thanks to its importance for the history of Europe and for the study of the Romance languages. As far as lexicography is concerned, several lexical resources are available in digital format, although not for all periods and registers. Indeed, Latin is attested across an extremely long time span, from the 3rd century BC to the present day, since it is still used as an official language of the Vatican.
Lexical resources for Latin include bilingual dictionaries (Glare, 2012; Lewis & Short, 1879), thesauri (the Thesaurus linguae Latinae, 1900–) and glossaries (Cange et al., 1883–1887), covering different eras: Early Latin (3rd to 2nd century BC), Classical Latin (1st century BC to 2nd century AD), Late Latin (3rd to 6th century AD) and Medieval Latin. The situation is quite different, however, for the variety attested at later stages, usually referred to as Neo-Latin: much less research has been conducted on it, and no reference dictionary is devoted to this period. The Neulateinische Wortliste (NLW; cf. Ramminger, 2016), a work-in-progress lexicon compiled by Johann Ramminger, collects lexical entries for more than 21,000 words attested from the 14th to the 18th century AD in texts written by humanists in a style closely resembling Classical Latin.
This resource was available only through a web interface (http://nlw.renaessancestudier.org/neulateinische_wortliste.htm), without access to the source data, which the author kindly provided to us for the purposes of this work.
Another issue is the lack of interoperability between the NLW and other related resources. This limitation prevents users, for instance, from retrieving the textual occurrences of the lexical items listed in the NLW from the many Latin corpora available. Making linguistic resources interoperable, by contrast, means enabling their (meta)data to interact on the Web through shared communication protocols, data categories and ontologies, in line with the so-called FAIR principles of data management (Wilkinson et al., 2016). To meet this need, a current approach to interlinking linguistic resources adopts the Linked Data principles, so that “it is possible to follow links between existing resources to find other, related data and exploit network effects” (Chiarcos et al., 2013, p. viii).
According to the Linked Data paradigm, data in the Semantic Web (Berners-Lee, Hendler & Lassila, 2001) are interlinked through connections that can be queried semantically, so that the structure of web data better serves the needs of users. Data in the Semantic Web are represented according to the Resource Description Framework (RDF) data model (Lassila et al., 1998), in which information is structured as “triples” that connect a “subject” to an “object” through a predicate (“property”), and items are organised by assigning them to “classes” and “subclasses”. Over the last decade, the research community working on Linguistic Linked Open Data (LLOD) has developed several de facto standard ontologies to represent the linguistic, and especially lexical, information stored in resources.
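To make the triple model concrete, here is a toy Python sketch of a graph as a set of (subject, predicate, object) triples, with a wildcard pattern match and a naive traversal of `rdfs:subClassOf`. It is not an RDF library and does no real IRI handling; the `ex:` names and the helpers `match` and `classes_of` are hypothetical, introduced only for illustration. Actual LLOD work would rely on a triple store queried with SPARQL.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Toy graph; prefixed names such as "ex:amor" stand in for full IRIs.
# All "ex:" identifiers are hypothetical and chosen only for illustration.
GRAPH = {
    Triple("ex:amor", "rdf:type", "ex:Noun"),
    Triple("ex:Noun", "rdfs:subClassOf", "ex:Word"),
    Triple("ex:amor", "ex:writtenForm", '"amor"@la'),
}


def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


def classes_of(graph, node: str) -> set:
    """Classes of `node`, including superclasses reached via rdfs:subClassOf."""
    found = set()
    frontier = [t.obj for t in match(graph, s=node, p="rdf:type")]
    while frontier:
        cls = frontier.pop()
        if cls not in found:
            found.add(cls)
            frontier.extend(t.obj for t in match(graph, s=cls, p="rdfs:subClassOf"))
    return found


print(sorted(classes_of(GRAPH, "ex:amor")))  # ['ex:Noun', 'ex:Word']
```

The wildcard pattern in `match` is, in much simplified form, the idea behind a SPARQL basic graph pattern, and the class traversal mimics the kind of inference that subclass relations allow.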
In this work, we describe the steps undertaken to publish the NLW as LLOD and to connect it to the LiLa (Linking Latin) Knowledge Base (KB) of interoperable linguistic resources for Latin, which was recently built following the principles of the Linked Data paradigm (https://lila-erc.eu). Linking the NLW to the LiLa KB makes the (meta)data provided by the NLW accessible and retrievable through federated queries across interoperable resources, including resources for languages other than Latin, thus easing cross-lingual investigations. To model the information provided by the NLW at different levels (morphological information, senses accompanied by a German translation, and attestations), we rely on OntoLex, a widely used vocabulary that has become a de facto standard for releasing lexical resources as LLOD (https://www.w3.org/2016/05/ontolex/).
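As a rough illustration of what OntoLex modelling plus a link to LiLa make possible, the Python sketch below represents one word-list entry as triples and then joins it with corpus tokens through the shared lemma IRI. Everything here is an assumption for illustration: the IRIs, the entry `typographus` and its gloss are invented (not taken from the NLW), `ex:hasLemma` is a hypothetical token-to-lemma property, and storing the German translation as a `skos:definition` is a simplification that may not match the actual NLW modelling. The OntoLex terms used (`ontolex:LexicalEntry`, `ontolex:canonicalForm`, `ontolex:sense`, `ontolex:LexicalSense`) do exist in the vocabulary.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def word_list_entry(entry: str, lemma: str, german_gloss: str) -> List[Triple]:
    """Model one word-list entry with OntoLex terms (simplified sketch).

    The entry is an ontolex:LexicalEntry whose canonical form is a LiLa-style
    lemma IRI; its single sense carries the German gloss, tagged @de.
    """
    sense = f"{entry}_sense1"
    return [
        Triple(entry, "rdf:type", "ontolex:LexicalEntry"),
        Triple(entry, "ontolex:canonicalForm", lemma),
        Triple(entry, "ontolex:sense", sense),
        Triple(sense, "rdf:type", "ontolex:LexicalSense"),
        Triple(sense, "skos:definition", f'"{german_gloss}"@de'),
    ]


def tokens_for_entry(lexicon, corpus, entry: str) -> List[str]:
    """Corpus tokens whose lemma is the canonical form of `entry`.

    The join on the shared lemma IRI is what linking through a KB enables.
    "ex:hasLemma" is a hypothetical token-to-lemma property.
    """
    lemmas = {t.obj for t in lexicon
              if t.subject == entry and t.predicate == "ontolex:canonicalForm"}
    return sorted(t.subject for t in corpus
                  if t.predicate == "ex:hasLemma" and t.obj in lemmas)


# Illustrative data only: not taken from the NLW, LiLa or any real corpus.
lexicon = word_list_entry("nlw:typographus", "lila:lemma_typographus", "Buchdrucker")
corpus = [
    Triple("corpusA:tok_17", "ex:hasLemma", "lila:lemma_typographus"),
    Triple("corpusB:tok_4", "ex:hasLemma", "lila:lemma_typographus"),
    Triple("corpusA:tok_18", "ex:hasLemma", "lila:lemma_liber"),
]
print(tokens_for_entry(lexicon, corpus, "nlw:typographus"))
# ['corpusA:tok_17', 'corpusB:tok_4']
```

In a real setting this join would be a SPARQL query, possibly federated across endpoints, rather than a Python comprehension; the sketch only shows that a shared lemma IRI is enough to connect a lexical entry with attestations in different corpora.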
After describing the process of linking the NLW to LiLa, we show the advantages of publishing this resource as LLOD, both for scholars of Neo-Latin and for researchers in lexicography. In particular, we present a few federated queries run across the NLW and other lexical and textual resources currently interlinked in the LiLa KB. Finally, we compare how lexical entries are described in different kinds of lexical resources, such as a word list and a full-fledged dictionary. With the help of a few case studies, we show that the same modelling strategy can account for different lexicographic choices regarding the hierarchical structure of entries and subentries. Substantivized adjectives, for example, are treated differently across resources: either as entries in their own right or as subentries of a superordinate lexical entry. Thanks to the OntoLex module for Lexicography (lexicog:LexicographicComponent and lexicog:describes; see https://www.w3.org/2019/09/lexicog/), we can compare how different resources account for lexical items of this kind.
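The sketch below, again in Python with hypothetical data, shows how the lexicog module can encode the two treatments of a substantivized adjective: a word list that gives it its own `lexicog:Entry`, and a dictionary that nests it as a `lexicog:LexicographicComponent` under the adjective's entry. In both cases a component `lexicog:describes` the same lexical item, so the two structures become directly comparable. The IRIs (`wl:`, `dict:`, `lex:`), the pair bonus/bonum used as a placeholder, and the helper `component_depth` are invented, with no claim about how any real resource treats these words; we also assume that parent and child components are linked with `rdfs:member`, which is how we understand the module to express nesting, so check the specification before relying on it.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def component_depth(resource: List[Triple], described: str) -> Optional[int]:
    """Nesting depth of the component that lexicog:describes `described`.

    0 means a top-level entry; each rdfs:member hop up to a parent component
    adds one level. Returns None if no component describes the item.
    Assumes the component tree is acyclic.
    """
    comps = [t.subject for t in resource
             if t.predicate == "lexicog:describes" and t.obj == described]
    if not comps:
        return None
    comp, depth = comps[0], 0
    while True:
        parents = [t.subject for t in resource
                   if t.predicate == "rdfs:member" and t.obj == comp]
        if not parents:
            return depth
        comp, depth = parents[0], depth + 1


# Hypothetical resources: a word list with the substantivized adjective as
# its own entry, and a dictionary nesting it under the adjective's entry.
word_list = [
    Triple("wl:bonum", "rdf:type", "lexicog:Entry"),
    Triple("wl:bonum", "lexicog:describes", "lex:bonum_noun"),
]
dictionary = [
    Triple("dict:bonus", "rdf:type", "lexicog:Entry"),
    Triple("dict:bonus", "lexicog:describes", "lex:bonus_adj"),
    Triple("dict:bonus", "rdfs:member", "dict:bonus_c1"),
    Triple("dict:bonus_c1", "rdf:type", "lexicog:LexicographicComponent"),
    Triple("dict:bonus_c1", "lexicog:describes", "lex:bonum_noun"),
]

for name, res in [("word list", word_list), ("dictionary", dictionary)]:
    print(name, component_depth(res, "lex:bonum_noun"))
# word list 0
# dictionary 1
```

Because both resources point at the same described item, a query over the two graphs can report where each one places it in its entry hierarchy without forcing either into the other's structure.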