Euralex 2024

Name: Euralex 2024
Start: 2024-10-08T08:30:00+02:00
End: 2024-10-12T19:00:00+02:00
Location: Hotel Croatia

Europe/Warsaw timezone

Philological and Technical Challenges in the Creation of a Pashto-English Dictionary

8 Oct 2024, 17:00

30m

Bobara Hall (Hotel Croatia)

Parallel Sessions

Veronika Milanova, Jeremy Bradley, Julian Kreidl

Pashto. Pashto is an Eastern Iranian language spoken in Afghanistan, Pakistan and by a large diaspora community across the globe. It is one of two official languages of Afghanistan and a regional official language in Pakistan’s Khyber Pakhtunkhwa Province. With about 15 million native speakers in Afghanistan and 30 million in Pakistan, it is the second-most spoken Iranian language after Persian. Its accessibility to scholarly and lay communities is, however, not commensurate with its linguistic and geopolitical importance. The resources that exist were generally created for anachronistic purposes, do not satisfy the needs of modern user bases, and are insufficiently accessible.

Open Pashto-English Dictionary (OPED). Our dictionary project aims to tackle these challenges head-on by creating a sustainable, adaptable, openly accessible, open-source, TEI Lex-0-conformant online dictionary, designed with the wishes and requirements of its many user groups in mind. This two-and-a-half-year project is funded by the Austrian Academy of Sciences’ go!digital 3.0 programme and is being carried out at the Academy’s Institute for Iranian Studies in cooperation with the University of Vienna’s Department of Finno-Ugrian Studies. As of March 2024, a working demo version of our dictionary can be found at oped.univie.ac.at.

Practical Challenges. Pashto is by no means an undescribed language: ample lexicographic resources exist, but the high-quality ones (a) have not been adequately digitized, if at all, (b) are out of date, (c) have substantial gaps in the information they contain about, e.g., the inflection of nouns and verbs, or (d) use Russian, rather than English, as a metalanguage. Given our limited personnel and time resources, the efficient approach to raising the accessibility of the Pashto lexicon is to process what already exists: digitize, translate, and update existing resources and publish the result in an open and expandable format.

Technological Challenges. Pashto uses a variation of the Arabic alphabet with additional graphemes not found in the Arabic or Persian alphabets (e.g., <ټ> for a voiceless retroflex plosive /ʈ/). Consequently, all tools we use must not only handle right-to-left text and the correct joining of characters required by the Arabic script, but also support fonts that adequately realize all relevant symbols. This has been surprisingly challenging: even infrastructures designed with “smaller” languages in mind turned out not to support the adequate display of Pashto orthography or the correct alphabetical ordering. This, coupled with the desired flexibility in display and search functionalities (due to the diverse user base we have in mind), has at times forced us to rely on “garage solutions” that might be seen as reinventing the wheel, but are in fact necessary to create the user interface we promised to deliver.

Philological Challenges. Even the second edition of by far the most extensive and highest-quality lexicographical resource on Pashto, Aslanov (1985), has become quite outdated, not only due to the passage of time since its publication but also due to the political upheaval Pakistan and especially Afghanistan have gone through in recent decades. Pashto-speaking areas on both sides of the border have been subject to profound changes in administrative structures, which have in turn changed administrative terminology. Furthermore, the forced displacement of Pashto speakers into diaspora and refugee situations has profoundly altered the usage domains of the language. We are addressing lexical gaps and semantic changes within the lexicon by cooperating with native speakers, especially those active in the domain of translation. Through the creation of a morphological model, we also hope to make a corpus-based expansion of our dictionary possible in future.

Our User Base. A matter deserving special attention when creating lexical materials for an under-resourced language is that one must expect a heterogeneous user group: when few accessible lexical resources exist for a language, prospective users of a dictionary will include native speakers, foreign scholars from different theoretical backgrounds, computational linguists, etc. For structurally strong languages such as English, separate resources are developed for each of these groups. The user interface we have created for our dictionary aims to be as customizable as possible in order to meet the needs not just of a small, idealized pool of users, but of a diverse user base with radically different desires and requirements.
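The alphabetical-ordering problem raised under Technological Challenges can be illustrated with a minimal Python sketch. It is not the project's implementation: the names `PASHTO_ORDER_EXCERPT` and `pashto_sort_key` are hypothetical, and the ordering table covers only the first six letters of the Pashto alphabet as an illustrative excerpt. It shows that sorting by Unicode code point misplaces the Pashto-specific letter <ټ> (U+067C), which belongs between <ت> and <ث>, and that an explicit rank table is one way to restore the expected order.

```python
# Illustrative excerpt only (assumption): the first six letters of the
# Pashto alphabet in dictionary order, written as code-point escapes.
PASHTO_ORDER_EXCERPT = (
    "\u0627"  # alef
    "\u0628"  # be
    "\u067E"  # pe
    "\u062A"  # te
    "\u067C"  # tte (retroflex, Pashto-specific)
    "\u062B"  # se
)

# Map each letter to its position in the alphabet.
RANK = {ch: i for i, ch in enumerate(PASHTO_ORDER_EXCERPT)}


def pashto_sort_key(word):
    """Return a sort key based on alphabet position, not code point.

    Characters missing from the excerpt sort after all known letters,
    ordered among themselves by code point.
    """
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]


letters = ["\u062B", "\u067C", "\u062A", "\u067E", "\u0628", "\u0627"]

# Default sorting compares code points, so U+067C and U+067E land at the end.
by_code_point = sorted(letters)

# Sorting with the rank table restores the alphabetical order.
by_alphabet = sorted(letters, key=pashto_sort_key)
```

A production system would more likely rely on a full collation table (for example, a tailoring of the Unicode Collation Algorithm) than on a hand-written rank map; the sketch only makes the mismatch between code-point order and alphabetical order concrete.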
