Empirical Linguistics and Computational Language Modeling (LiMo) (Department of Computational Linguistics of Heidelberg University and Leibniz Institute for the German Language)

Data publications of the Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling”

The Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling” (LiMo) is a cooperative research project between the Leibniz Institute for the German Language (Leibniz-Institut für Deutsche Sprache, IDS) in Mannheim and the Department of Computational Linguistics at Heidelberg University (ICL). The project's general aim is to develop new methods, models, and tools for automatically compiling and analysing large German text corpora covering different domains, genres and language varieties.

The project is supported by funds from the Baden-Württemberg Ministry of Science, Research and the Arts and the Leibniz Association together with funds provided by the Leibniz Institute for the German Language and Heidelberg University.

Funding Period: 2015 – 2020

Plain Text - 31.9 KB - MD5: bf369686743f258705fd6cc675cfcaf0
Adobe PDF - 126.2 KB - MD5: 846a2849d5f0f4a119d504d79260c6fa
Oct 7, 2019
Marasović, Ana, 2019, "Multilingual Modal Sense Classification using a Convolutional Neural Network [Source Code]", heiDATA, V1
Abstract: Modal sense classification (MSC) is a special WSD task that depends on the meaning of the proposition in the modal’s scope. We explore a CNN architecture for classifying modal sense in English and German. We show that CNNs are superior to manually designed feature-based cl...
Oct 7, 2019
Marasović, Ana; Zhou, Mengfei; Frank, Anette, 2019, "The MSC Data Set", heiDATA, V1
From this page you can download resources we created for modal sense classification as reported in Zhou et al. (2015), Marasović et al. (2016) and Marasović and Frank (2015) (see "Related Publication" below): Heuristically sense-annotated training data acquired from EUROPARL and...
Oct 7, 2019 - The MSC Data Set
ZIP Archive - 6.2 MB - MD5: 98dbe1d608c24c3dfd31f166daeee77b
Oct 8, 2019
Ruppenhofer, Josef, 2019, "Affixoid Dataset (DE)", heiDATA, V1, UNF:6:+MGK9lTPTXx7Rclu1BpPnw== [fileUNF]
The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert. 1788 complex words containing one of 7 German suffixoid candidates (e.g. -hai, -go...
Oct 8, 2019 - Affixoid Dataset (DE)
Tab-Delimited - 61.6 KB - MD5: 8e2e107227a8ab7d59fb9a48dfa9f475
Oct 8, 2019 - Affixoid Dataset (DE)
Plain Text - 758 B - MD5: 017f60a9c77782cd97a45c4dd74e117c
Oct 22, 2019
Becker, Maria, 2019, "COREC – A neural multi-label COmmonsense RElation Classification system", heiDATA, V1
We examine the learnability of commonsense knowledge relations as represented in CONCEPTNET. We develop a neural open world multi-label classification system that focuses on the evaluation of classification accuracy for individual relations. Based on an in-depth study of the spec...
