Description
About DANTE
DANTE (Database of Analysed Texts of English) was initially developed in 2008–2010 (Atkins, Kilgarriff & Rundell, 2010) by a lexicographic team led by Sue Atkins, Adam Kilgarriff, Valerie Grundy and Michael Rundell. It was commissioned by Foras na Gaeilge, a governmental agency promoting the use of the Irish language, for the development of the New English-Irish Dictionary (Mianáin & Convery, 2014). It was produced from an English corpus of about 1.7 billion words using the Sketch Engine toolchain (Kilgarriff et al., 2014). DANTE provides a very detailed lexicographic analysis of about 50,000 single-word English entries and 45,000 compounds, with each lexical unit described along the following dimensions:
- wordclass
- secondary grammar (inherent properties of headword)
- informal definitions
- syntactic constructions and arguments of the headword
- lexical collocates based on corpus frequencies
- support verb constructions
- support prepositions
- domain/subject field
- regional variety
- speaker/writer attitude
- time
- register
- style
- full example sentences (from corpus)
- variant forms
- derived forms
- cross-reference.
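To make the record structure listed above more concrete, the following is a minimal Python sketch of how one lexical unit could be held in memory. It is not DANTE's actual schema or export format: the class name `LexicalUnit`, all field names, the helper `label_summary` and the sample entry are assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LexicalUnit:
    """One sense-level record. Field names are hypothetical, not DANTE's schema."""
    headword: str
    wordclass: str
    secondary_grammar: List[str] = field(default_factory=list)
    definition: Optional[str] = None  # informal definition
    constructions: List[str] = field(default_factory=list)
    collocates: List[str] = field(default_factory=list)
    support_verbs: List[str] = field(default_factory=list)
    support_prepositions: List[str] = field(default_factory=list)
    # Usage labels: domain, region, attitude, time, register, style.
    labels: Dict[str, str] = field(default_factory=dict)
    examples: List[str] = field(default_factory=list)
    variant_forms: List[str] = field(default_factory=list)
    derived_forms: List[str] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)


def label_summary(unit: LexicalUnit) -> str:
    """Render the usage labels as a stable, sorted 'key=value' string."""
    return ", ".join(f"{key}={value}" for key, value in sorted(unit.labels.items()))


# Invented sample entry for illustration only; not taken from DANTE data.
example = LexicalUnit(
    headword="decision",
    wordclass="noun",
    definition="a choice made after thinking about something",
    support_verbs=["make", "take"],
    labels={"register": "neutral", "domain": "general"},
    examples=["She made a decision."],
)
```

Keeping the six label types in a single dictionary rather than six separate fields is only one possible design choice; it keeps the sketch short, at the cost of weaker type checking.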
DANTE Resurrected
DANTE was originally released on the IDM DPS platform, and until 2023 it was a closed proprietary product of Foras na Gaeilge. In this paper we present new developments following the decision by Foras na Gaeilge to release the content of DANTE under the terms of the CC-BY 4.0 open source license. We show a new dedicated web interface for DANTE based on the Lexonomy dictionary platform (Měchura et al., 2017; Rambousek et al., 2021) hosted at anonymized, which interlinks DANTE with additional corpus-based resources, and we discuss future uses of DANTE for research in lexicography. We focus in particular on using DANTE to evaluate automatic dictionary drafting, including drafting with large language models, and the full paper will provide an experimental evaluation of these methods based on DANTE data.
Conclusions
The 2010 paper on DANTE ends with the following description: “DANTE is a lexicographic project where the end-product is not a dictionary but an in-depth analysis to be used for creating one or more dictionaries. The users of DANTE are not the dictionary-using public but the lexicographic teams who will take this on to dictionary status.” We believe that by open-sourcing DANTE, Foras na Gaeilge has taken an important step towards the goals initially envisaged, and that DANTE is an important and welcome contribution to the international lexicographic community.