Skip to main content
Metrics
22,697 Downloads
Featured Dataverses

In order to use this feature you must have at least one published dataverse.

Publish Dataverse

Are you sure you want to publish your dataverse? Once you do so it must remain published.

Publish Dataverse

This dataverse cannot be published because the dataverse it is in has not been published.

Delete Dataverse

Are you sure you want to delete your dataverse? You cannot undelete this dataverse.

Find Advanced Search

1 to 10 of 46 Results
Oct 22, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Becker, Maria, 2019, "Genre-sensitive Neural Situation Entity classifier (DE, EN)", https://doi.org/10.11588/data/XXKWU0, heiDATA, V1
This is a Classifier for situation entity types as described in Becker et al., 2017. These clause types depend on a combination of syntactic-semantic and contextual features. We explore this task in a deeplearning framework, where tuned word representations capture lexical, synta...
Oct 22, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Becker, Maria, 2019, "COREC – A neural multi-label COmmonsense RElation Classification system", https://doi.org/10.11588/data/E5EHBV, heiDATA, V1
We examine the learnability of Commonsense knowledge relations as represented in CONCEPTNET. We develop a neural open world multi-label classification system that focuses on the evaluation of classification accuracy for individual relations. Based on an in-depth study of the spec...
Oct 21, 2019 - Statistical Natural Language Processing Group
Beilharz, Benjamin; Sun, Xin, 2019, "LibriVoxDeEn - A Corpus for German-to-English Speech Translation and Speech Recognition", https://doi.org/10.11588/data/TMEDTX, heiDATA, V1
This dataset is a corpus of sentence-aligned triples of German audio, German text, and English translation, based on German audio books. The corpus consists of over 100 hours of audio material and over 50k parallel sentences. The speech data are low in disfluencies because of the...
Oct 8, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Ruppenhofer, Josef, 2019, "Affixoid Dataset (DE)", https://doi.org/10.11588/data/QKF4LT, heiDATA, V1, UNF:6:+MGK9lTPTXx7Rclu1BpPnw== [fileUNF]
The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert. 1788 complex words containing one of 7 German suffixoid candidates (e.g. -hai, -go...
Oct 7, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Marasović, Ana; Zhou, Mengfei; Frank, Anette, 2019, "The MSC Data Set", https://doi.org/10.11588/data/JEESIQ, heiDATA, V1
From this page you can download resources we created for modal sense classification as reported in Zhou et al. (2015), Marasović et al. (2016) and Marasović and Frank (2015) (see "Related Publication" below): Heuristically sense-annotated training data acquired from EUROPARL and...
Oct 7, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Marasović, Ana, 2019, "Multilingual Modal Sense Classification using a Convolutional Neural Network [Source Code]", https://doi.org/10.11588/data/ERDJDI, heiDATA, V1
Abstract Modal sense classification (MSC) is aspecial WSD task that depends on themeaning of the proposition in the modal’s scope. We explore a CNN architecture for classifying modal sense in English and German. We show that CNNs are superior to manually designed feature-based cl...
Sep 5, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Wiegand, Michael; Bocionek, Christine; Ruppenhofer, Josef, 2019, "Sentiment Compound Data (DE)", https://doi.org/10.11588/data/LSTRK3, heiDATA, V1
This dataset contains gold standards that are required for building a classifier that automatically extracts opinion (noun) compounds.
Sep 5, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Wiegand, Michael; Ruppenhofer, Josef; Schulder, Marc, 2019, "Sentiment View Lexicon (EN)", https://doi.org/10.11588/data/2JK48O, heiDATA, V1
This gold standard contains sentiment expressions (verbs, nouns and adjectives) that have been annotated according to their (prior) sentiment view. Each sentiment expression is labelled either as actor or speaker view.
Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Wiegand, Michael, 2019, "GermEval-2018 Corpus (DE)", https://doi.org/10.11588/data/0B5VML, heiDATA, V1
This dataset comprises the training and test data (German tweets) from the GermEval 2018 Shared on Offensive Language Detection.
Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Wiegand, Michael, 2019, "Lexicon of Abusive Words (EN)", https://doi.org/10.11588/data/MKPEYV, heiDATA, V1
This goldstandard contains a bootstrapped lexicon of abusive words. The lexicon comprises a large set of English negative polar expressions annotated as either abusive or not.
Add Data

Sign up or log in to create a dataverse or add a dataset.

Share Dataverse

Share this dataverse on your favorite social media networks.

Link Dataverse
Reset Modifications

Are you sure you want to reset the selected metadata fields? If you do this, any customizations (hidden, required, optional) you have done will no longer appear.

Contact heiDATA Support

heiDATA Support

Please fill this out to prove you are not a robot.

+ =