Empirical Linguistics and Computational Language Modeling (LiMo)

Data publications of the Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling”

The Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling” (LiMo) is a cooperative research project between the Leibniz Institute for the German Language (Leibniz-Institut für Deutsche Sprache, IDS) in Mannheim and the Department of Computational Linguistics at Heidelberg University (ICL). The project aims to develop new methods, models, and tools for automatically compiling and analysing large German text corpora covering different domains, genres, and language varieties.

The project is supported by funds from the Baden-Württemberg Ministry of Science, Research and the Arts and the Leibniz Association together with funds provided by the Leibniz Institute for the German Language and Heidelberg University.

Funding Period: 2015 – 2020

21 to 30 of 185 Results

german-opinon-role-extractor-master.zip Sep 2, 2019 - Opinion role extractor ZIP Archive - 20.8 MB - MD5: 6704c06c5a8566eb05c3a8e0e0baebc2 Code
README Sep 2, 2019 - Opinion role extractor Plain Text - 13.0 KB - MD5: c4eb5b271a38da142c703216f9648f09 Documentation
Lexicon of Abusive Words (EN) Sep 2, 2019 Wiegand, Michael, 2019, "Lexicon of Abusive Words (EN)", https://doi.org/10.11588/data/MKPEYV, heiDATA, V1 This gold standard contains a bootstrapped lexicon of abusive words. The lexicon comprises a large set of English negative polar expressions, each annotated as either abusive or not abusive.
lexicon-of-abusive-words-master.zip Sep 2, 2019 - Lexicon of Abusive Words (EN) ZIP Archive - 738.4 KB - MD5: 46f33f5b7a9c866b1a2fb6dc956b945d
README.md Sep 2, 2019 - Lexicon of Abusive Words (EN) Markdown Text - 4.4 KB - MD5: 3cbbac5ff1534a6e9c3fcc9a1b0be976 Documentation
GermEval-2018 Corpus (DE) Sep 2, 2019 Wiegand, Michael, 2019, "GermEval-2018 Corpus (DE)", https://doi.org/10.11588/data/0B5VML, heiDATA, V1 This dataset comprises the training and test data (German tweets) from the GermEval 2018 Shared Task on Offensive Language Detection.
GermEval-2018-Data-master.zip Sep 2, 2019 - GermEval-2018 Corpus (DE) ZIP Archive - 14.8 MB - MD5: 6471a35acf802906383e6d19e5241b37 Code
README.md Sep 2, 2019 - GermEval-2018 Corpus (DE) Markdown Text - 2.1 KB - MD5: 82583130d72db06eb5fe686c1a8338ac Documentation
Sentiment View Lexicon (EN) Sep 5, 2019 Wiegand, Michael; Ruppenhofer, Josef; Schulder, Marc, 2019, "Sentiment View Lexicon (EN)", https://doi.org/10.11588/data/2JK48O, heiDATA, V1 This gold standard contains sentiment expressions (verbs, nouns and adjectives) that have been annotated according to their (prior) sentiment view. Each sentiment expression is labelled as either actor view or speaker view.
LICENSE Sep 5, 2019 - Sentiment View Lexicon (EN) Plain Text - 18.2 KB - MD5: 4a17ffc27c9f3b240fbf4fe17783c89c Documentation
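
The MD5 values listed with each file can be used to check that a download arrived intact. The following is a minimal sketch, assuming the file has already been saved locally; the function names `md5_of_file` and `matches_listed_md5` are hypothetical helpers introduced here for illustration and are not part of heiDATA or Dataverse.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large archives do not have
    to fit into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed_md5):
    """Compare a local file against an MD5 string copied from the listing.

    Hypothetical helper: surrounding whitespace and letter case in
    `listed_md5` are ignored.
    """
    return md5_of_file(path) == listed_md5.strip().lower()
```

For example, after saving lexicon-of-abusive-words-master.zip, one could call `matches_listed_md5("lexicon-of-abusive-words-master.zip", "46f33f5b7a9c866b1a2fb6dc956b945d")`; a result of `False` would suggest an incomplete or altered download. MD5 is adequate for detecting accidental corruption but should not be relied on as protection against deliberate tampering.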

