Description
This paper investigates the potential of large language models (LLMs) to support lexicographic work on non-standard linguistic varieties, using data from the Dictionary of Bavarian Dialects in Austria (WBÖ). Based on approximately 2.4 million digitized, TEI-encoded dialect paper slips published via the Lexical Information System Austria (LIÖ), we construct a domain-specific corpus and evaluate LLMs on two tasks: semantic classification and dictionary entry generation. Key preparatory steps include metadata enrichment, glossary and ontology development, and prompt engineering combined with Retrieval-Augmented Generation (RAG) techniques. Preliminary results suggest that LLMs can assist in organizing dialectal material into coherent semantic groupings. However, challenges persist in data preprocessing, in the structural conformity of generated entries, and in the selection of representative examples. We discuss methodological implications and outline future directions, including the integration of agent-based systems and fine-tuning approaches tailored to dialect resources. This study contributes to the broader discourse on AI-assisted lexicography, highlighting both the potential and the limitations of current LLM technologies in handling underrepresented language varieties.