Speaker
Description
POSTER
Eric Raymond’s influential essay (Raymond 1999) on community-based software development as practiced in the Open Source movement, as opposed to the previously dominant closed, top-down approach preferred mostly in the commercial realm, has also proved instructive for the Wikiverse. Its flagship project Wikipedia, which takes a comparable approach to knowledge production and dissemination, disrupted the market for encyclopedic offerings to the extent that it became the primary source of information in that context, driving previous commercial market leaders out of business. While Wiktionary, the lexicographic equivalent of Wikipedia, did not have the same effect on its established competitors, it has drawn considerable academic interest as a lexical resource, ranging from favorable comparisons with controlled or closed-source resources (Meyer and Gurevych 2010; 2012) through integrations with such resources (McCrae, Montiel-Ponsoda, and Cimiano 2012) to its conversion and augmentation as a comprehensive, multilingual Linked Open Data resource in its own right (Sérasset 2015). The Wikiverse picked up this research-driven development of structured, machine-readable lexical datasets by incorporating lexicographic information into Wikidata (Lindemann 2025), basing the data model in turn on OntoLex-Lemon, the lexicon model for ontologies, which itself originated in a research collaboration.
The Digitales Wörterbuch der deutschen Sprache (DWDS) wanted to explore further this relationship between the academic realm on the one hand, with its lexicographic projects more akin to Raymond’s cathedrals, and the bazaar-like, dynamic and community-driven approach on the other, which informs the construction of Wikidata’s knowledge graph. In January 2023 the DWDS donated about 185,000 German lexemes to Wikidata. In line with previous studies (Kosem et al. 2021), the facts donated to Wikidata comprised the lexical information most likely to be liberally licensed by projects like the DWDS (lexical category, written representations, grammatical features), while other, copyrighted information (sense glosses, etymology, etc.) was deliberately excluded. The poster presents the challenges of this data donation, for example mismatches in mapping between the different data models, organizing support in the community, and overcoming technical obstacles. It also reports on the first results: since the initial data import two years ago, the German lexeme inventory of Wikidata has grown to over 200,000 entries. It now registers over 550,000 links from those entries to external lexical resources besides the DWDS and, last but not least, over 11,000 community-contributed links to concepts at the sense level, which in turn link to about 175,000 lexemes in other languages.
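To illustrate the kind of data-model mapping mentioned above, the following is a minimal sketch, not the DWDS import pipeline. It converts a simplified dictionary entry into a JSON structure shaped like a Wikidata lexeme, keeping only lexical category, written representations, and grammatical features. The input field names, the `to_lexeme` helper, and the example entry are hypothetical; the item IDs (Q188, Q1084, and so on) are assumed from memory and should be checked against Wikidata before any use.

```python
import json

# Hypothetical, simplified source entry; the field names are illustrative only.
entry = {
    "lemma": "Haus",
    "pos": "noun",
    "forms": [
        {"text": "Haus", "features": ["nominative", "singular"]},
        {"text": "Häuser", "features": ["nominative", "plural"]},
    ],
    "gloss": "a building",  # sense-level information, deliberately not exported
}

# Assumed Wikidata item IDs; verify each one before relying on it.
GERMAN = "Q188"
CATEGORY = {"noun": "Q1084"}
FEATURE = {"nominative": "Q131105", "singular": "Q110786", "plural": "Q146786"}


def to_lexeme(entry, lang_code="de"):
    """Map a source entry to a lexeme-shaped dict without sense information."""
    def rep(text):
        # A written representation keyed by language code.
        return {lang_code: {"language": lang_code, "value": text}}

    return {
        "type": "lexeme",
        "language": GERMAN,
        "lexicalCategory": CATEGORY[entry["pos"]],
        "lemmas": rep(entry["lemma"]),
        "forms": [
            {
                "representations": rep(form["text"]),
                "grammaticalFeatures": [FEATURE[f] for f in form["features"]],
            }
            for form in entry["forms"]
        ],
    }


lexeme = to_lexeme(entry)
print(json.dumps(lexeme, ensure_ascii=False, indent=2))
```

The sketch omits glosses and etymology entirely, mirroring the exclusion of copyrighted information described above; a real import would additionally need to reconcile entries with lexemes that already exist in Wikidata.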