8–12 Oct 2024
Hotel Croatia
Europe/Warsaw timezone

Linking Historical Corpus Data and Annotations Using Wikibase

9 Oct 2024, 17:00
1h 30m
Bobara Hall (Hotel Croatia)

Speakers

David Lindemann, Mikel Alonso

Description

This software demonstration presents a data model and a first use case for representing text corpus data on a Wikibase instance, including morphosyntactic, semantic and philological annotations as well as links to dictionary entries. Wikibase, an extension of MediaWiki, is the software that underlies Wikidata, a very large, crowdsourced, queryable knowledge graph. Wikidata includes nodes for ontological concepts on the one hand, and for lexemes, lexeme senses and lexeme forms on the other, together with annotations to and relations between them. We argue that the proposed model and the chosen software solutions for representing corpus and dictionary data, all free and open source, meet the requirements of provenance transparency, open access and re-use, and support collaborative work on the data. We also present our own scripts, wrapped in a web application, that shortcut several workflow steps. The first use case is a 1737 Basque manuscript, transcribed on Wikisource and represented as an annotated dataset on our Wikibase instance.
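As a rough illustration of the kind of linking described above, the following Python sketch models corpus tokens that carry annotations and point to lexeme forms. It is not the authors' data model: the class `Token`, its field names, the helper functions and all IDs and example tokens are hypothetical. Only the ID pattern (lexeme `L<n>`, form `L<n>-F<n>`, sense `L<n>-S<n>`) follows the convention used for Wikibase lexemes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Token:
    """One corpus token, modelled as a Wikibase-style item (hypothetical layout)."""
    item_id: str                         # placeholder item ID, e.g. "Q1001"
    surface: str                         # transcribed word form
    position: int                        # order of the token in the text
    lexeme_form: Optional[str] = None    # form ID, pattern "L<n>-F<n>"
    lexeme_sense: Optional[str] = None   # sense ID, pattern "L<n>-S<n>"
    annotations: Dict[str, str] = field(default_factory=dict)


def lexeme_of(form_id: str) -> str:
    """Return the lexeme ID that a form ID such as "L12-F2" belongs to."""
    return form_id.split("-", 1)[0]


def tokens_for_lexeme(tokens: List[Token], lexeme_id: str) -> List[Token]:
    """Collect all tokens linked to any form of the given lexeme, in text order."""
    linked = [
        t for t in tokens
        if t.lexeme_form is not None and lexeme_of(t.lexeme_form) == lexeme_id
    ]
    return sorted(linked, key=lambda t: t.position)


# Made-up example data; these tokens and IDs are not taken from the manuscript.
tokens = [
    Token("Q1001", "etxea", 1, lexeme_form="L12-F2", annotations={"pos": "NOUN"}),
    Token("Q1002", "da", 2, lexeme_form="L7-F1", annotations={"pos": "AUX"}),
    Token("Q1003", "etxe", 3, lexeme_form="L12-F1", annotations={"pos": "NOUN"}),
    Token("Q1004", "xyz", 4),  # not yet linked to any dictionary entry
]

for t in tokens_for_lexeme(tokens, "L12"):
    print(t.item_id, t.surface, t.lexeme_form)
```

On a real Wikibase instance the same question (which tokens attest a given lexeme) would typically be asked through the SPARQL query service rather than in memory; the sketch only shows the shape of the links.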
