Speaker
Description
In this paper we present a newly developed formal framework, together with its practical implementation, for automatic, lexically driven analysis of Danish text tokens. The framework (called “CLINK”) employs a minimal token definition (the “morph”) and a compact lexical representation (the “CLINK template”). All morphs (i.e., text elements with an individual semantic contribution) are lexicalized using the same template: word forms, affixes, glue elements, punctuation marks, multi-word expressions, etc. The definition of “lexeme” is thus reinterpreted in functional-computational terms. The grammar rules of CLINK are purely abstract, viz. those of the Lambek calculus (categorial grammar). This paper gives an overview of the CLINK framework (motivations and application). We give references to performance metrics, which suggest that CLINK is on a par with the Danish state of the art in PoS tagging while providing a much richer annotation structure. However, we consider the formal framework itself to be the main contribution of this short paper. CLINK will be available for test runs at EURALEX.
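To make the idea of a lexicon-only grammar concrete, the following is a minimal sketch of categorial reduction over a sequence of morphs. It is not the CLINK implementation: the category names, the toy lexicon for "hunden sover." and the helper names `combine` and `derives` are all assumptions made for illustration. It also covers only the two application rules (the AB fragment), not the full Lambek calculus with hypothetical reasoning.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """Basic category such as N, NP, S."""
    name: str


@dataclass(frozen=True)
class Over:
    """result / arg: seeks `arg` immediately to the right."""
    result: "Cat"
    arg: "Cat"


@dataclass(frozen=True)
class Under:
    """arg \\ result: seeks `arg` immediately to the left."""
    arg: "Cat"
    result: "Cat"


Cat = Union[Atom, Over, Under]


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Apply forward (X/Y, Y => X) or backward (Y, Y\\X => X) application."""
    if isinstance(left, Over) and left.arg == right:
        return left.result
    if isinstance(right, Under) and right.arg == left:
        return right.result
    return None


def derives(cats: list, goal: Cat) -> bool:
    """CYK-style check: does the category sequence reduce to `goal`?"""
    n = len(cats)
    if n == 0:
        return False
    # chart[(i, j)] holds every category derivable for the span cats[i:j].
    chart = {(i, i + 1): {c} for i, c in enumerate(cats)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):
                for l_cat in chart[(i, k)]:
                    for r_cat in chart[(k, j)]:
                        result = combine(l_cat, r_cat)
                        if result is not None:
                            cell.add(result)
            chart[(i, j)] = cell
    return goal in chart[(0, n)]


# Hypothetical toy lexicon: every morph, including the suffix and the
# punctuation mark, gets a category of the same kind.
N, NP, S, TEXT = Atom("N"), Atom("NP"), Atom("S"), Atom("Text")
LEXICON = {
    "hund": N,               # noun stem
    "-en": Under(N, NP),     # definite suffix: N \ NP
    "sover": Under(NP, S),   # intransitive verb: NP \ S
    ".": Under(S, TEXT),     # full stop closes a sentence: S \ Text
}

MORPHS = ["hund", "-en", "sover", "."]
CATEGORIES = [LEXICON[m] for m in MORPHS]
```

Calling `derives(CATEGORIES, TEXT)` checks whether the four morphs reduce to a complete text unit. Under this toy lexicon the suffix and the full stop are handled by the same two rules as the word forms, which is the point of treating all morphs uniformly.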