Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

Automatic Detection of Word Sense Shift from Corpus Data

Nov 18, 2025, 12:30 PM
30m
Arnold hall

Speaker

Ondřej Herman

Description

Language evolves continuously, rendering static dictionaries quickly outdated. While previous research has addressed the automatic detection of new words, identifying subtler semantic changes in existing words remains a challenge. In this work, we propose a robust, language-independent methodology for the automatic detection of word sense shifts using diachronic corpus data. Our approach builds on the Adaptive Skip-Gram algorithm for word sense induction, enabling us to model polysemy directly from raw text without reliance on external sense inventories.

We calculate the temporal distribution of the induced senses and apply trend estimation techniques, specifically linear regression and the Theil–Sen estimator, to detect statistically significant shifts. This two-stage architecture decouples sense induction from trend analysis, which increases overall robustness and interpretability. Unlike traditional methods in lexical semantic change detection, which often target dramatic historical shifts, our method is designed to detect emerging or evolving senses over shorter timescales using large web corpora.
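The trend-analysis stage can be illustrated with a minimal sketch. It assumes the first stage (sense induction, e.g. with Adaptive Skip-Gram) has already assigned each corpus occurrence of a word to an induced sense and a time period. The helper names (`sense_series`, `ols_slope_t`, `theil_sen_slope`, `detect_shifts`), the fixed t-statistic cutoff and the minimum-slope threshold are hypothetical choices for illustration, not the parameters used in this work. A proper significance test would compare the t-statistic against the t distribution with n − 2 degrees of freedom; SciPy's `scipy.stats.linregress` reports such a p-value, and `scipy.stats.theilslopes` returns a confidence interval for the Theil–Sen slope.

```python
from collections import Counter
from itertools import combinations
from statistics import median
import math


def sense_series(assignments):
    """Turn (period, sense_id) pairs for one word into per-sense time series.

    Returns {sense_id: (periods, proportions)}, where each proportion is the
    share of the word's occurrences in that period assigned to the sense.
    """
    totals = Counter(period for period, _ in assignments)
    joint = Counter(assignments)
    periods = sorted(totals)  # only periods in which the word occurs at all
    senses = sorted({sense for _, sense in assignments})
    return {
        s: (periods, [joint[(p, s)] / totals[p] for p in periods])
        for s in senses
    }


def ols_slope_t(xs, ys):
    """Ordinary least-squares slope and its t-statistic (needs >= 3 points)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ssr / (n - 2) / sxx)  # standard error of the slope
    if se == 0.0:
        # Perfect fit: infinite t for a nonzero slope, zero for a flat series.
        t = math.copysign(math.inf, slope) if slope else 0.0
    else:
        t = slope / se
    return slope, t


def theil_sen_slope(xs, ys):
    """Theil-Sen estimator: median of slopes over all pairs of points."""
    return median(
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    )


def detect_shifts(assignments, t_threshold=3.0, min_slope=0.01):
    """Flag senses whose share changes consistently over time.

    A sense is flagged when |OLS t| >= t_threshold and the Theil-Sen slope
    agrees in sign with the OLS slope and has magnitude >= min_slope.
    Both thresholds are illustrative placeholders (hypothetical).
    """
    flagged = {}
    for sense, (xs, ys) in sense_series(assignments).items():
        if len(xs) < 3:
            continue  # too few time slices for a slope and its standard error
        slope, t = ols_slope_t(xs, ys)
        ts = theil_sen_slope(xs, ys)
        if abs(t) >= t_threshold and abs(ts) >= min_slope and slope * ts > 0:
            flagged[sense] = {"ols_slope": slope, "t": t, "theil_sen_slope": ts}
    return flagged


if __name__ == "__main__":
    # Toy data (not from any corpus): 100 occurrences per year of one word,
    # with induced sense 1 growing from 12 % to 72 % of its uses.
    counts = {2014: 12, 2015: 22, 2016: 41, 2017: 53, 2018: 72}
    assignments = []
    for year, n1 in counts.items():
        assignments += [(year, 1)] * n1 + [(year, 0)] * (100 - n1)
    for sense, stats in detect_shifts(assignments).items():
        print(sense, stats)
```

On this toy input both senses are flagged: sense 1 rises with an OLS slope of about 0.151 per year (t ≈ 17.7) and a Theil–Sen slope of about 0.1525, while sense 0 mirrors it downwards. Requiring the two estimators to agree is one way to combine the outlier resistance of Theil–Sen, which is insensitive to a few anomalous time slices, with the OLS standard error as a rough significance filter.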

We evaluate our method on Timestamped corpora in English and Czech and present several examples of detected sense shifts. The results demonstrate the feasibility of scalable, automatic sense shift detection and its potential applications in lexicography and linguistic research.