11 to 20 of 25 Results
Jun 13, 2020 - Statistical Natural Language Processing Group
Beilharz, Benjamin; Sun, Xin, 2019, "LibriVoxDeEn - A Corpus for German-to-English Speech Translation and Speech Recognition", https://doi.org/10.11588/data/TMEDTX, heiDATA, V2
This dataset is a corpus of sentence-aligned triples of German audio, German text, and English translation, based on German audio books. The corpus consists of over 100 hours of audio material and over 50k parallel sentences. The speech data are low in disfluencies because of the...
Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Wiegand, Michael, 2019, "Lexicon of Abusive Words (EN)", https://doi.org/10.11588/data/MKPEYV, heiDATA, V1
This gold standard contains a bootstrapped lexicon of abusive words. The lexicon comprises a large set of English negative polar expressions annotated as either abusive or not.
Aug 19, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Kotnis, Bhushan, 2019, "KGE Algorithms", https://doi.org/10.11588/data/CSXYSS, heiDATA, V1
An updated method for link prediction that uses a regularization factor that models relation argument types. Abstract (Kotnis and Nastase, 2017): Learning relations based on evidence from knowledge repositories relies on processing the available relation instances. Knowledge repos...
Mar 26, 2021 - IWR Computer Graphics
Mara, Hubert, 2019, "HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection", https://doi.org/10.11588/data/IE8CCN, heiDATA, V2
The number of known cuneiform tablets is assumed to be in the hundreds of thousands. A fraction has been published by printing photographs and manual tracings in books, which is collected by the online Cuneiform Digital Library Initiative (CDLI) catalog including some of these im...
Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Wiegand, Michael, 2019, "GermEval-2018 Corpus (DE)", https://doi.org/10.11588/data/0B5VML, heiDATA, V1
This dataset comprises the training and test data (German tweets) from the GermEval 2018 Shared Task on Offensive Language Detection.
Dec 10, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Becker, Maria, 2019, "GER_SET: Situation Entity Type labelled corpus for German", https://doi.org/10.11588/data/BBQYD0, heiDATA, V1
Semantic clause types, also called Situation Entity (SE) types (Smith, 2003), are linguistic characterizations of aspectual properties shown to be useful for tasks like argumentation structure analysis (Becker et al., 2016), genre characterization (Palmer and Friedrich, 2014), and...
Oct 22, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Becker, Maria, 2019, "Genre-sensitive Neural Situation Entity classifier (DE, EN)", https://doi.org/10.11588/data/XXKWU0, heiDATA, V1
This is a classifier for situation entity types as described in Becker et al., 2017. These clause types depend on a combination of syntactic-semantic and contextual features. We explore this task in a deep learning framework, where tuned word representations capture lexical, synta...
Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Nastase, Vivi; Fritz, Devon; Frank, Anette, 2019, "DeModify", https://doi.org/10.11588/data/KIWEMF, heiDATA, V1
deModify consists of 3631 instances, each with three annotations obtained through CrowdFlower. An instance is a short story in which a modifier is annotated with respect to its impact on the information in the story, assessed through its deletion from the context: crucial, not-cr...
Oct 22, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Becker, Maria, 2019, "COREC – A neural multi-label COmmonsense RElation Classification system", https://doi.org/10.11588/data/E5EHBV, heiDATA, V1
We examine the learnability of commonsense knowledge relations as represented in CONCEPTNET. We develop a neural open world multi-label classification system that focuses on the evaluation of classification accuracy for individual relations. Based on an in-depth study of the spec...
Feb 6, 2019 - AIPHES
Heinzerling, Benjamin, 2019, "BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)", https://doi.org/10.11588/data/V9CXPR, heiDATA, V1
BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed, BPEmb performs competitively, and for some languages better than alternative subword approaches, while r...