Skip to main content
Metrics
29,409 Downloads
Featured Dataverses

In order to use this feature you must have at least one published dataverse.

Publish Dataverse

Are you sure you want to publish your dataverse? Once you do so it must remain published.

Publish Dataverse

This dataverse cannot be published because the dataverse it is in has not been published.

Delete Dataverse

Are you sure you want to delete your dataverse? You cannot undelete this dataverse.

Find Advanced Search

1 to 10 of 62 Results
Jan 15, 2020 - GIScience and 3D Spatial Data Processing
Herfort, Benjamin; Anders, Katharina; Marx, Sabrina; Eberlein, Stefan; Höfle, Bernhard, 2020, "3D Micro-Mapping of Subsidence Stations [Source Code and Data]", https://doi.org/10.11588/data/OU8YA1, heiDATA, V1
This dataset comprises the source code to reproduce the 3D micro-mapping tool for plane adjustment at subsidence stations. In this project, users adjust a plane (height and orientation) at the positions of fixed poles, so-called subsidence stations, to provide information on the...
Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)
Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor, 2020, "A harmonised testsuite for social media POS tagging (DE)", https://doi.org/10.11588/data/KXLMHN, heiDATA, V1
A harmonised POS testsuite of web data, CMC and Twitter microtext, with word forms and STTS pos tags (+ some additional CMC-specific tags). UD pos tags have been automatically converted, based on the STTS pos tags. The data does not contain (manually corrected) lemma information....
Feb 4, 2019 - AIPHES
Marasovic, Ana, 2019, "Abstract Anaphora Resolution [Source Code]", https://doi.org/10.11588/data/UDMPY5, heiDATA, V1
Abstract Anaphora Resolution (AAR) aims to find the interpretation of nominal expressions (e.g., this result, those two actions) and pronominal expressions (e.g., this, that, it) that refer to abstract-object-antecedents such as facts, events, plans, actions, or situations. The f...
Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Nastase, Vivi; Kotnis, Bhushan, 2019, "Abstract graphs, abstract paths, grounded paths for Freebase and NELL", https://doi.org/10.11588/data/AVLFPZ, heiDATA, V1
We describe a method for representing knowledge graphs that capture an intensional representation of the original extensional information. This representation is very compact, and it abstracts away from individual links, allowing us to find better path candidates, as shown by the...
Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Nastase, Vivi; Hitschler, Julian, 2019, "ACL word segmentation correction", https://doi.org/10.11588/data/VK99LU, heiDATA, V1
The data in this collection consists of two parallel directories, one ("raw") containing the raw text of 18850 articles from the ACL 2013/02 collection, the other ("re-segmented") the word-resegmented version of these articles, obtained using nematus, a seq2seq neural model used...
Oct 8, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Ruppenhofer, Josef, 2019, "Affixoid Dataset (DE)", https://doi.org/10.11588/data/QKF4LT, heiDATA, V1, UNF:6:+MGK9lTPTXx7Rclu1BpPnw== [fileUNF]
The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert. 1788 complex words containing one of 7 German suffixoid candidates (e.g. -hai, -go...
AIPHES(Heidelberg University, Technical University of Darmstadt, HITS)
Jan 31, 2019
Data publications from the DFG-funded research training group on Adaptive Information Processing from Heterogeneous Sources (AIPHES) at the CS Department at the Technical University of Darmstadt, the Institute for Computational Linguistics at the University of Heidelberg and the...
Jul 12, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)
Opitz, Juri, 2019, "AMR parse quality prediction [Source Code]", https://doi.org/10.11588/data/STHBGW, heiDATA, V1
Accuracy prediction for AMR parsing predicts 33 accuracy metrics for a given sentence and its (automatic) AMR parse Abstract (Opitz and Frank, 2019): Semantic proto-role labeling (SPRL) is an alternative to semantic role labeling (SRL) that moves beyond a categorical definition o...
Jun 16, 2014 - Statistical Natural Language Processing Group
Sokolov, Artem; Jehl Laura; Hieber Felix; Ruppert, Eugen; Riezler, Stefan, 2014, "BoostCLIR: JP-EN Relevance Marked Patent Corpus", https://doi.org/10.11588/data/10001, heiDATA, V1
BoostCLIR is a bilingual (Japanese-English) corpus of patent abstracts, extracted from the MAREC patent data, and the data from the NTCIR PatentMT workshop collections, accompanied with relevance judgements for the task of patent prior-art search. Important: The English side of t...
Feb 6, 2019 - AIPHES
Heinzerling, Benjamin, 2019, "BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)", https://doi.org/10.11588/data/V9CXPR, heiDATA, V1
BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed, BPEmb performs competitively, and for some languages better than alternative subword approaches, while r...
Add Data

Sign up or log in to create a dataverse or add a dataset.

Share Dataverse

Share this dataverse on your favorite social media networks.

Link Dataverse
Reset Modifications

Are you sure you want to reset the selected metadata fields? If you do this, any customizations (hidden, required, optional) you have done will no longer appear.

Contact heiDATA Support

heiDATA Support

Please fill this out to prove you are not a robot.

+ =