Speaker
Marko Robnik-Šikonja
Description
Large language models (LLMs) are currently redefining methodological approaches in many scientific areas, including linguistics and lexicography. LLMs are pretrained on huge text corpora by predicting the next token and then adapted for human interaction using instruction-following datasets. This adaptation does not make them immune to hallucinations and biases, which calls for a human-in-the-loop approach. In the context of lexicography, LLMs can be used to support several tasks. We will present how the information contained in language databases can be used to improve LLMs on lexicographic tasks. Our current methodology is based on knowledge graph extraction, continued pretraining of LLMs, prompt engineering, and semi-automatic evaluation.