Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

LLM-Assisted Dialect Lexicography: Challenges and Opportunities in Processing Historical Bavarian Dialects

Nov 20, 2025, 10:05 AM
25m
Sonce hall


Speakers

Philipp Stöckle, Daniel Elsner, Wolfgang Koppensteiner, Katharina Korecky-Kröll

Description

This paper investigates the potential of LLMs to support lexicographic work on non-standard linguistic varieties, using data from the Dictionary of Bavarian Dialects in Austria (WBÖ). Based on approximately 2.4 million digitized, TEI-encoded dialect paper slips published via the Lexical Information System Austria (LIÖ), we construct a domain-specific corpus and evaluate LLMs on two tasks: semantic classification and dictionary entry generation. Key preparatory steps include metadata enrichment, glossary and ontology development, and prompt engineering combined with Retrieval-Augmented Generation (RAG) techniques. Preliminary results suggest that LLMs can assist in organizing dialectal material into coherent semantic groupings. However, challenges persist regarding data preprocessing, conformity to the dictionary's entry structure, and the selection of representative examples. We discuss methodological implications and outline future directions, including the integration of agent-based systems and fine-tuning approaches tailored to dialect resources. This study contributes to the broader discourse on AI-assisted lexicography, highlighting both the potential and the limitations of current LLM technologies in handling underrepresented language varieties.
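The retrieval step behind RAG-assisted semantic classification can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The glossary entries, the function names, the bag-of-words cosine retrieval (standing in for whatever embedding-based search a real system would use), and the prompt wording are all hypothetical. No LLM is called here; the code only assembles the prompt that would be sent to one.

```python
import math
import re
from collections import Counter

# Hypothetical toy glossary: semantic group label -> short description.
# These entries are illustrative only and are NOT taken from the WBÖ or LIÖ.
GLOSSARY = {
    "agriculture": "field plough harrow harvest grain sowing",
    "household": "kitchen pot stove bread cooking table",
    "weather": "rain snow wind storm frost cloud",
}


def tokenize(text: str) -> Counter:
    """Lower-cased word counts; \\w is Unicode-aware, so umlauts survive."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k glossary groups most similar to the query text."""
    q = tokenize(query)
    scored = [(label, cosine(q, tokenize(desc))) for label, desc in GLOSSARY.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(slip_text: str, k: int = 2) -> str:
    """Assemble a classification prompt with retrieved glossary context."""
    context = "\n".join(
        f"- {label}: {GLOSSARY[label]}" for label, _ in retrieve(slip_text, k)
    )
    return (
        "Assign the dialect slip below to one semantic group.\n"
        f"Candidate groups:\n{context}\n"
        f"Slip: {slip_text}\n"
        "Answer with the group label only."
    )


if __name__ == "__main__":
    print(build_prompt("Pflug; used to plough the field before sowing"))
```

Restricting the prompt to the top-k retrieved groups keeps it short and anchors the model to a controlled label set, which is one plausible way to address the structural-conformity issue the abstract mentions; a real system would likely retrieve from a much larger ontology.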
