1 to 10 of 74 Results
Feb 17, 2021 - Empirical Linguistics and Computational Language Modeling (LiMo)
Daza, Angel, 2021, "X-SRL Dataset and mBERT Word Aligner", heiDATA, V1
This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source annotations (for example labeled English sentences) into the target side (for example a German translation of th...
Aug 13, 2014 - Database Systems Research Group
Strötgen, Jannik; Gertz, Michael, 2014, "WikiWarsDE Corpus", heiDATA, V1
The WikiWarsDE corpus is a German corpus containing Wikipedia articles with annotations of temporal expressions. Its creation was motivated by the English WikiWars corpus (Mazur & Dale 2010). WikiWarsDE was developed to support research on temporal information extraction and norm...
Jun 18, 2014 - Statistical Natural Language Processing Group
Hieber, Felix; Schamoni, Shigehiko; Sokolov, Artem; Riezler, Stefan, 2014, "WikiCLIR: A Cross-Lingual Retrieval Dataset from Wikipedia", heiDATA, V1
WikiCLIR is a large-scale (German-English) retrieval data set for Cross-Language Information Retrieval (CLIR). It contains a total of 245,294 German single-sentence queries with 3,200,393 automatically extracted relevance judgments for 1,226,741 English Wikipedia articles as docu...
Aug 23, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
van den Berg, Esther; Korfhage, Katharina; Ruppenhofer, Josef; Wiegand, Michael; Markert, Katja, 2019, "Twitter Titling Corpus", heiDATA, V1, UNF:6:+F3lLKziwMvjy+xyktkilw== [fileUNF]
The Twitter Titling Corpus contains 4002 stance-annotated tweets collected between 20 June 2017 and 30 August 2017 mentioning 6 presidents. Each tweet is annotated for the naming form used to refer to the president, for the purpose of a study on the relation between naming variat...
Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)
Rehbein, Ines; Ruppenhofer, Josef; Do, Bich-Ngoc, 2020, "tweeDe", heiDATA, V1
A German UD Twitter treebank with more than 12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework.
Oct 7, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Marasović, Ana; Zhou, Mengfei; Frank, Anette, 2019, "The MSC Data Set", heiDATA, V1
From this page you can download resources we created for modal sense classification as reported in Zhou et al. (2015), Marasović et al. (2016) and Marasović and Frank (2015) (see "Related Publication" below): Heuristically sense-annotated training data acquired from EUROPARL and...
Nov 2, 2016 - Perspektive Bibliothek
Drees, Bastian, 2016, "Text und Data Mining an wissenschaftlichen Repositorien und Publikationsservern in Deutschland - Zusammenfassung der Ergebnisse einer Umfrage im Februar und März 2016", heiDATA, V2
The contact persons listed on the homepages of scientific repositories and publication servers in Germany were surveyed about their experiences with text and data mining. The survey was conducted by e-mail between 22 and 26 February 2016. Contact persons ...
Statistical Natural Language Processing Group (Heidelberg University - Department of Computational Linguistics)
May 21, 2014
The Statistical Natural Language Processing Group is part of the Department of Computational Linguistics. Our research addresses various aspects of the problem of the confusion of languages, by means of statistical learning techniques. Research topics include the following: Stati...
Feb 4, 2019 - AIPHES
Marasovic, Ana, 2019, "SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning With Semantic Role Labeling [Source Code]", heiDATA, V1
This repository contains code for reproducing experiments done in Marasovic and Frank (2018). Paper abstract: For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question "Who expressed what kind of sentiment towar...
Feb 14, 2017 - PhD related Material - Faculty of Mathematics and Computer Science
Merten, Thorsten, 2017, "Source Code, Data and Additional Material for the Thesis: 'Identification of Software Features in Issue Tracking System Data'", heiDATA, V2
This dataset provides the code and the data sets used in the PhD thesis "Identification of Software Features in Issue Tracking System Data" as well as the files that represent the results measured in experiments. For problem studies (e.g. chapters 10 and 11) the folders include t...