Nov 17 – 20, 2025
Bled, Slovenia
Europe/Ljubljana timezone

Woordpeiler: A New Tool for Visualizing and Analyzing Lexical Trends in Contemporary Dutch

Nov 20, 2025, 12:00 PM
30m
Arnold hall

Speakers

Kris Heylen, Vincent Prins, Katrien Depuydt, Jesse de Does, Laura van Eerten, Thomas Haga

Description

Representative monitor corpora with detailed metadata offer a solid empirical basis for documenting lexical innovation and change (Kosem et al. 2021). However, continuously updated time-stamped textual data presents challenges for data management, lexicographic analysis, and visualization. Building on its existing corpus infrastructure, the Dutch Language Institute (INT) has developed Woordpeiler (“Word Pollster”, https://woordpeiler.ivdnt.org/), an online application to (a) visualize and analyze word frequencies over time and (b) support the analysis of neologisms and lexical trends in Dutch since 2000.

As part of its mission to maintain a sustainable Dutch language infrastructure, INT developed the Corpus Hedendaags Nederlands (CHN), currently (September 2025) containing 4.3 billion tokens across 10.6 million documents. The corpus supports INT’s lexicographic workflow and is available through CLARIN. Daily and yearly data from major Dutch-language newspaper publishers (in the Netherlands, Belgium, Suriname, and the Dutch Caribbean) is processed via an automated workflow. All data is converted into a unified TEI format, enriched with metadata (e.g. language variety) and linguistic annotation. Using INT’s BlackLab system (de Does et al. 2017), the data is indexed and published as weekly (internal) or monthly (external) CHN updates.

While CHN users could already obtain word frequencies through BlackLab’s query interface, Woordpeiler adds visualization and trend analysis tools. Frequency data for POS-tagged word forms, lemmas, and bigrams are exported to a PostgreSQL database optimized with TimescaleDB. Through Woordpeiler’s interface (Fig. 1), users can generate interactive graphs for words and bigrams to visualize and compare changes in absolute and relative frequencies across customizable time intervals (day, week, month, year). Searches support wildcards, and graphs can be filtered or split by language variety (Belgium, Netherlands, Suriname, Caribbean), with tooltips providing statistics and links to the underlying corpus data. In advanced search, users can refine searches by lemma, part of speech, and newspaper (the latter only internally). Graphs can be downloaded as PNGs or shared through unique URLs.
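As an illustration of how absolute and relative frequencies per interval can be derived from dated counts, here is a minimal Python sketch. It is not Woordpeiler's implementation: the function names `bucket` and `relative_frequencies`, the input shapes, and the choice of "per million tokens" as the relative unit are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def bucket(d: date, interval: str) -> str:
    """Map a date to a label for the chosen interval (hypothetical helper)."""
    if interval == "day":
        return d.isoformat()
    if interval == "week":
        iso = d.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if interval == "month":
        return f"{d.year}-{d.month:02d}"
    if interval == "year":
        return str(d.year)
    raise ValueError(f"unknown interval: {interval}")


def relative_frequencies(word_counts, corpus_sizes, interval):
    """Aggregate dated counts into intervals.

    word_counts:  iterable of (date, occurrences of the word on that date)
    corpus_sizes: iterable of (date, total tokens in the corpus on that date)
    Returns {interval label: (absolute count, count per million tokens)}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for d, count in word_counts:
        hits[bucket(d, interval)] += count
    for d, tokens in corpus_sizes:
        totals[bucket(d, interval)] += tokens
    return {
        label: (hits.get(label, 0), 1_000_000 * hits.get(label, 0) / totals[label])
        for label in sorted(totals)
        if totals[label] > 0  # skip empty intervals to avoid dividing by zero
    }
```

Summing counts and corpus sizes separately before dividing means that longer intervals are weighted by the amount of text they contain, rather than averaging the daily ratios.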

A separate pane (Fig. 2) offers additional trend analyses (currently only available internally). One function detects “trending” words or bigrams in a given interval using the “simple maths” keyness score (Kilgarriff 2009), computed relative to the preceding period. Users can adjust the smoothing parameter and can also detect disappearing words via inverse keyness. A second function identifies words or bigrams that are new in a selected interval, optionally allowing a limited number of earlier nonce occurrences. Results appear as sortable, POS-filterable lists with accompanying frequency graphs.
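The two trend functions can be sketched in a few lines of Python. The keyness formula follows Kilgarriff's simple maths method, (focus frequency per million + n) / (reference frequency per million + n), where n is the smoothing parameter. Everything else is an assumption made for illustration and not a description of Woordpeiler's code: the function names, the dictionary inputs, treating inverse keyness as the reciprocal score, and the `max_earlier` threshold for nonce occurrences.

```python
def per_million(count: int, total: int) -> float:
    """Frequency per million tokens."""
    return 1_000_000 * count / total


def simple_maths_keyness(focus_count, focus_total, ref_count, ref_total, smoothing=1.0):
    """Simple maths keyness: smoothed ratio of per-million frequencies."""
    focus = per_million(focus_count, focus_total)
    reference = per_million(ref_count, ref_total)
    return (focus + smoothing) / (reference + smoothing)


def trending(focus, reference, focus_total, ref_total,
             smoothing=1.0, inverse=False, top=10):
    """Rank words by keyness of the focus period against the reference period.

    focus, reference: {word: count} for the selected and the preceding period.
    inverse=True ranks by the reciprocal score (assumed here to model
    "disappearing" words, i.e. swapping the two periods).
    """
    scored = []
    for word in set(focus) | set(reference):
        score = simple_maths_keyness(
            focus.get(word, 0), focus_total,
            reference.get(word, 0), ref_total,
            smoothing,
        )
        scored.append((word, 1 / score if inverse else score))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


def new_words(focus, earlier, max_earlier=0):
    """Words seen in the focus period with at most max_earlier prior occurrences."""
    return sorted(w for w, c in focus.items() if c > 0 and earlier.get(w, 0) <= max_earlier)
```

A larger smoothing value pushes the ranking toward higher-frequency words, while a small value favors rare words whose frequency changed sharply.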

Woordpeiler and its database are fully integrated into INT’s corpus-processing workflow, minimizing publication lags and ensuring quality control. The tool will support corpus-lexicographic work by adding validated frequency information to the central lexicon GiGaNT and improving workflows for identifying neologisms and out-of-dictionary words. Additionally, Woordpeiler serves science communication and outreach goals: it underpins a monthly and annual Woordpeiling (“Word Poll”) shared via INT’s website and social media, and it is used in educational materials about language variation and change for secondary school students.
