Description
The software demonstration presents a new lexicographic resource for Lithuanian, the Lexical Database of Lithuanian Language Usage (hereafter, the database), focusing in particular on the Collocate Search and illustrating its functionality with several examples.
The Lexical Database of Lithuanian Language Usage is the first corpus-driven lexical database for Lithuanian (Kovalevskaitė et al., 2021). The material for the database was collected from the written part (620,000 words) of the morphologically annotated and CEFR-graded Lithuanian Pedagogic Corpus. The corpus was developed for learning purposes and consists of texts collected from Lithuanian language coursebooks and a variety of authentic learner-relevant Lithuanian data (Kovalevskaitė et al., 2020, p. 246).
For the description of word usage, we adopted the inductive procedure of Corpus Pattern Analysis (Hanks, 2013), which was partly automated using the Lithuanian Sketch Grammar in Sketch Engine (see Kovalevskaitė et al., 2020). Because the corpus we used is rather small, usage information on lexical and grammatical patterns was collected only for frequent words (frequency of 100 and above) from the core vocabulary, i.e., words that appeared in all CEFR levels (from A1 to B2) or in at least three levels (Kovalevskaitė et al., 2020, p. 247). The final headword list of 3,700 items includes approx. 700 high-frequency words (nouns, verbs, adjectives, adverbs), together with word formations and multi-word expressions from the core vocabulary that are related to these frequent words.
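The selection criteria described above can be sketched as a simple filter. This is only an illustrative sketch, not the procedure actually used to build the database: the `LemmaStats` structure, the function names, and all frequency figures in the sample data are hypothetical, and only the two thresholds (frequency of 100 and above, presence in at least three CEFR levels) come from the description.

```python
from dataclasses import dataclass

# Thresholds taken from the description; everything else is assumed.
MIN_FREQUENCY = 100     # corpus frequency of 100 and above
MIN_CEFR_LEVELS = 3     # present in at least three of A1, A2, B1, B2


@dataclass(frozen=True)
class LemmaStats:
    """Hypothetical per-lemma summary extracted from the corpus."""
    lemma: str
    frequency: int          # total occurrences in the written subcorpus
    cefr_levels: frozenset  # CEFR levels in which the lemma occurs


def is_core_frequent(stats: LemmaStats) -> bool:
    """Return True if the lemma is both frequent and core vocabulary."""
    return (stats.frequency >= MIN_FREQUENCY
            and len(stats.cefr_levels) >= MIN_CEFR_LEVELS)


def select_headword_candidates(lemmas):
    """Keep the lemmas that satisfy both criteria, preserving input order."""
    return [s.lemma for s in lemmas if is_core_frequent(s)]


# Invented sample data, for illustration only.
sample = [
    LemmaStats("laikas", 412, frozenset({"A1", "A2", "B1", "B2"})),
    LemmaStats("proga", 130, frozenset({"A2", "B1", "B2"})),
    LemmaStats("retas", 57, frozenset({"A2", "B1", "B2"})),   # too infrequent
    LemmaStats("sveikinti", 150, frozenset({"A1", "A2"})),    # too few levels
]
candidates = select_headword_candidates(sample)
```

With the invented figures above, only laikas and proga pass both filters. Word formations and multi-word expressions related to the selected words would have to be added in a separate step that is not modelled here.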
In the database, usage patterns are associated with specific meanings of the headword (e.g., the 3rd meaning of the headword BĖGTI (‘to run’) is represented by one usage pattern, 3.1; see Figure 1). After selecting a specific pattern, the user sees a three-colour table in which the individual columns represent the grammatical (marked in blue), semantic (pink), and lexical (purple) levels of the pattern. For the 3rd meaning of the verb bėgti, the information at the grammatical level shows that this verb in the present tense form (marked as BĖGTI_prs) is typically used with a subject (denoting an agent) and an adverbial of manner (marked as Adv). At the semantic and lexical levels, we additionally learn that the agent in this pattern is abstract (usually expressed by the collocate laikas ‘time’) and that the adverbial of manner (‘būdas’) is expressed by the collocate greitai (‘quickly’). The multilevel representation of a pattern contains a lot of usage data; however, from the point of view of user-friendliness, this representation still needs better solutions.
Information at the lexical level of usage patterns provides collocates, which are defined as words commonly used with the headword. Due to the small corpus size, we do not evaluate collocates by statistical significance: a word is considered to be a collocate of the headword if the two co-occur 3 or more times in the subcorpus. The database offers a special Collocate Search function, with which the user can find headwords for which the search word is a common collocate. For example, the search results for the verb sveikinti (‘to congratulate’) will display the noun proga (‘occasion’). The full record of the noun proga (‘occasion’) contains one meaning with 8 patterns (accessible via Headword Search); the Collocate Search, however, returns pattern 1.1, in which the searched collocate sveikinti (‘to congratulate’) is used (Figure 2).
The expanded information for pattern 1.1 shows (see Figure 3) that when the noun proga (‘occasion’) in the singular instrumental case is used with an attribute that does not agree with the noun (gimtadienio proga, ‘on one’s birthday’), the phrase refers to a [reason] (‘priežastis’) at the semantic level. Thus, the complete pattern is sveikinti gimtadienio proga (‘to congratulate someone on their birthday’).
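The frequency-based collocate criterion and the reverse lookup behind a collocate search can be sketched as follows. This is a minimal sketch under stated assumptions, not the database's actual implementation: the observation format `(headword, pattern_id, word)`, the function names, and all counts in the sample data are hypothetical; only the threshold of 3 co-occurrences and the example words come from the description.

```python
from collections import Counter, defaultdict

# Threshold from the description: 3 or more co-occurrences in the subcorpus.
MIN_COOCCURRENCES = 3


def build_collocate_index(observations):
    """Map each collocate to the (headword, pattern_id) pairs it occurs in.

    `observations` is a list of hypothetical (headword, pattern_id, word)
    tuples, one per co-occurrence found in the corpus. A word counts as a
    collocate of a headword only if the pair is observed at least
    MIN_COOCCURRENCES times; no statistical association measure is used.
    """
    pair_counts = Counter((head, word) for head, _, word in observations)
    index = defaultdict(set)
    for head, pattern_id, word in observations:
        if pair_counts[(head, word)] >= MIN_COOCCURRENCES:
            index[word].add((head, pattern_id))
    return {word: sorted(hits) for word, hits in index.items()}


def collocate_search(index, word):
    """Return the headword patterns in which `word` is a collocate."""
    return index.get(word, [])


# Invented observations, for illustration only.
observations = (
    [("proga", "1.1", "sveikinti")] * 3   # meets the threshold
    + [("nešti", "2.1", "laimė")] * 4     # meets the threshold
    + [("proga", "1.2", "retas")] * 2     # below the threshold
)
index = build_collocate_index(observations)
hits = collocate_search(index, "sveikinti")
```

In this sketch, searching for sveikinti yields the pair ("proga", "1.1"), mirroring the example above, while a word observed only twice with a headword is not indexed at all. Counting per headword and word pair, rather than per pattern, is one possible reading of the criterion.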
If a headword is polysemous, the user can see this from the particular usage pattern: e.g., the search noun laimė (‘happiness’) is a collocate in pattern 2.1 of the headword NEŠTI (‘to carry’), where this verb is used in its 2nd meaning, e.g., neša laimę (‘brings happiness’). The grammatical level indicates that this verb is used with an object in the accusative.
At present, the presentation of collocations in the database is useful mainly for the decoding function. However, to enable foreign learners to use semantically transparent collocations productively, it is also important to present them as units (Siepmann, 2008). In improving the resources for learning Lithuanian collocations, some other features may be worth considering, e.g., the possibility of exploring connectivity between collocates at various levels of collocation networks (e.g., Brezina et al., 2015) and links to language proficiency levels.