Description
This poster presents an electronic contextual repository of dictionary agnonyms (DA): lexical innovations as well as unique, extremely rare, previously unrecorded words (cf. Bartwicka et al., 2007; Fedorushkov, 2009). The DA were extracted from a corpus of Russian regional press texts from the turn of the millennium (1996–2006), with a special focus on neonymic lexis (neologisms and occasionalisms). The repository contains no dialectological data or metadata, but each word is annotated with the place where it appeared in usage. The poster describes the data corpus, the research and technical methods, and the structure of the repository. The 10,000 DA were retrieved from a press corpus (Fedorushkov, 2023) comprising the archives of more than two hundred regional Russian newspapers from that period. Words were retrieved both manually and automatically. All words in the press corpus (about 250 million) were checked against a set of previously selected dictionaries of Russian (e.g., Ozhegov & Shvedova, 1992, as well as Ushakov, MAS, BAS and others). Words absent from the Zaliznyak (1987) dictionary were identified with a tagger and a parser based on AOT technology (a morphological analyzer with a database of words and generated word forms derived from that dictionary). Excerption filters for DA were written in regular-expression syntax, together with codes for grammeme clusters that make it possible to select individual parts of speech. The resulting list was then verified, in part manually, against word lists from other dictionaries of Russian, including dictionaries of neologisms (e.g., by Kotelova, Milekovskaya, Soloviev, Butseva, Levashev and others). In total, about forty dictionaries were used for verification. The poster also describes the algorithms and analytical methods used to extract, process and organize the DA.
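The dictionary check and the regular-expression filters described above can be pictured with a minimal Python sketch. It is an illustration only, not the authors' pipeline: the tag labels (`NOUN`, `NUM`), the tiny `KNOWN_LEMMAS` set and the function name `candidate_agnonyms` are all assumptions made for the example, and the real AOT grammeme codes are not reproduced here.

```python
import re

# Hypothetical stand-in for the merged headword lists of the verification dictionaries.
KNOWN_LEMMAS = {"газета", "станция", "неделя", "голосование"}

# Hypothetical tagger output: (lemma, grammeme string). The tag layout is an assumption.
TAGGED = [
    ("газета", "NOUN fem,sg,nom"),
    ("путиномания", "NOUN fem,sg,nom"),
    ("фэшн-неделя", "NOUN fem,sg,nom"),
    ("брендмейкер", "NOUN masc,sg,nom"),
    ("SMS-ругательство", "NOUN neut,sg,nom"),
    ("неделя", "NOUN fem,sg,gen"),
    ("2006", "NUM"),
]

# Accept Cyrillic or Latin letter runs, optionally joined by hyphens (composites).
WORD_RE = re.compile(r"^[a-zа-яё]+(?:-[a-zа-яё]+)*$")
# Grammeme filter that keeps only nouns; other parts of speech would use other patterns.
NOUN_RE = re.compile(r"^NOUN\b")


def candidate_agnonyms(tagged, known, pos_filter):
    """Return sorted lemmas that are well-formed, pass the POS filter and are not in `known`."""
    seen = set()
    for lemma, grammemes in tagged:
        key = lemma.lower()
        if key in known:
            continue  # already registered in a verification dictionary
        if not WORD_RE.match(key):
            continue  # digits, punctuation and other non-word tokens
        if not pos_filter.search(grammemes):
            continue  # wrong part of speech for this filter
        seen.add(key)
    return sorted(seen)


candidates = candidate_agnonyms(TAGGED, KNOWN_LEMMAS, NOUN_RE)
print(candidates)
```

In this toy run the dictionary words and the numeral are discarded, and the four unregistered composites remain as candidates for manual verification.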
The selected DA are unique words not registered in dictionaries before 2006, the year in which verification was carried out, so they include potential neologisms. Excerption also ended in that period (cf. Fedorushkov, 2008). Because of the sheer volume of technical and substantive work, later lexicographic sources were not taken into account. Selecting contexts for the DA from the corpus took nearly 16 years and followed the principles of linguo-chronologization described in the monograph by Wierzchoń (2008). One of these principles is to select the earliest context in which a DA occurs. Contexts were selected with the dtSearch indexer, which allows the user to define the alphabet used to segment words in electronic texts. The vocabulary in the Repository is arranged in two cross-referenced lists linked by a network of HTML hyperlinks. Both indexes are alphabetical: the first is ordered “from the beginning of the word” (a fronte), while the second is in inverted order, alphabetically “from the end of the word” (a tergo). Each DA can therefore be viewed in both indexes. Additional information concerns the use of the Repository, in particular the context in which each DA occurs, accompanied by a map of the region and city, the date of registration and the name of the regional newspaper in which the word was recorded. Each DA is thus marked in its context of use and supplied with geochronological information. The DA come from practically the entire territory of the Russian Federation before 2007; the locations are presented in the poster as an infographic. The poster also shows infographics on the growth of agnonymic lexis by region and year.
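Two of the steps above, choosing the earliest context for each DA and building the a fronte and a tergo indexes, can be sketched in a few lines of Python. This is a hedged illustration rather than the Repository's actual code: the attestation records, newspaper and city names are invented placeholders, the function names are hypothetical, and plain code-point ordering is used instead of a full Russian collation (so, for instance, ё would not sort next to е).

```python
from datetime import date

# Invented placeholder records: (word, date of registration, newspaper, city).
ATTESTATIONS = [
    ("профайлер", date(2004, 3, 12), "Newspaper A", "City X"),
    ("профайлер", date(2001, 7, 5), "Newspaper B", "City Y"),
    ("путиномика", date(2003, 1, 20), "Newspaper C", "City Z"),
    ("брендмейкер", date(2005, 9, 1), "Newspaper A", "City X"),
]


def earliest_attestations(records):
    """Keep, for every word, only the record with the earliest date."""
    earliest = {}
    for word, when, paper, city in records:
        if word not in earliest or when < earliest[word][0]:
            earliest[word] = (when, paper, city)
    return earliest


def a_fronte(words):
    """Alphabetical index read from the beginning of the word."""
    return sorted(words)


def a_tergo(words):
    """Reverse index: alphabetical order of the words read from their end."""
    return sorted(words, key=lambda w: w[::-1])


first_seen = earliest_attestations(ATTESTATIONS)
front_index = a_fronte(first_seen)
back_index = a_tergo(first_seen)

for word in front_index:
    when, paper, city = first_seen[word]
    print(word, when.isoformat(), paper, city)
print(back_index)
```

The a tergo ordering groups words that share an ending, which is why a reverse index is convenient for spotting recurring final components and suffixes.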
The DA obtained for the selected period are largely composites (Misturska-Bojanowska, 2013) of the type SMS-ругательство, смс, онлайн-болтовня, фэшн-неделя, FM-станция, аудиовидеосинхронизация, профайлер, путиномания, путиномика, брендмейкер, порноюмор, IT-технология, sms-голосование, лонг-дринк, интернет-переписка, смарт-фон, смарттелефон, with a tendency toward affixation, i.e., the presence of a formant such as an affix, affixoid, semi-affix, radixoid or radix (cf. the classifications in Bartkov & Minina, 2019). The poster also outlines research paths in the analysis of neonymic lexis. The Repository, provided as an integral part of the poster presentation, makes it possible to trace the growth of particular word-formation tendencies in the development of the language. The Repository is aimed at a wide range of lexicographers.