Empirical Linguistics and Computational Language Modeling (LiMo)

Data publications of the Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling”

The Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling” (LiMo) is a cooperative research project between the Leibniz Institute for the German Language (Leibniz-Institut für Deutsche Sprache, IDS) in Mannheim and the Department of Computational Linguistics (ICL) at Heidelberg University. The project aims to develop new methods, models, and tools for automatically compiling and analysing large German text corpora that cover different domains, genres, and language varieties.

The project is supported by funds from the Baden-Württemberg Ministry of Science, Research and the Arts and the Leibniz Association together with funds provided by the Leibniz Institute for the German Language and Heidelberg University.

Funding Period: 2015 – 2020

1 to 10 of 185 Results

AMR parse quality prediction [Source Code]

Jul 12, 2019

Opitz, Juri, 2019, "AMR parse quality prediction [Source Code]", https://doi.org/10.11588/data/STHBGW, heiDATA, V1

Accuracy prediction for AMR parsing predicts 33 accuracy metrics for a given sentence and its (automatic) AMR parse. Abstract (Opitz and Frank, 2019): Semantic proto-role labeling (SPRL) is an alternative to semantic role labeling (SRL) that moves beyond a categorical definition o...

quamr.zip - Jul 12, 2019 - ZIP Archive - 12.7 MB - MD5: 7057006601db4c004d0f5e041e508e08 - Code, Data

DeModify

Jul 15, 2019

Nastase, Vivi; Fritz, Devon; Frank, Anette, 2019, "DeModify", https://doi.org/10.11588/data/KIWEMF, heiDATA, V1

deModify consists of 3631 instances, each with three annotations obtained through CrowdFlower. An instance is a short story in which a modifier is annotated with respect to its impact on the information in the story, assessed through its deletion from the context: crucial, not-cr...

demodify.data_split.tsv - Jul 15, 2019 - Tab-Separated Values - 112.2 KB - MD5: 9859efc83ee0b6a30af19448be4d6f0b - Data

demodify.tsv - Jul 15, 2019 - Tab-Separated Values - 5.1 MB - MD5: 12bab5c05a384c4fbe64c9afd81f9c6d - Data

README - Jul 15, 2019 - Plain Text - 2.7 KB - MD5: f4d3244cd7ed0511b580c40dda38fa26 - Documentation

ACL word segmentation correction

Jul 15, 2019

Nastase, Vivi; Hitschler, Julian, 2019, "ACL word segmentation correction", https://doi.org/10.11588/data/VK99LU, heiDATA, V1

The data in this collection consists of two parallel directories, one ("raw") containing the raw text of 18850 articles from the ACL 2013/02 collection, the other ("re-segmented") the word-resegmented version of these articles, obtained using nematus, a seq2seq neural model used...

acl-201302_word-resegmented.tar.gz - Jul 15, 2019 - Gzip Archive - 371.1 MB - MD5: 96d089771cde56bb9ac5296189fb403b - Data, text files

README - Jul 15, 2019 - Plain Text - 782 B - MD5: b305fd3ce016837f601aa137fd8ecf63 - Documentation

Abstract graphs, abstract paths, grounded paths for Freebase and NELL

Jul 15, 2019

Nastase, Vivi; Kotnis, Bhushan, 2019, "Abstract graphs, abstract paths, grounded paths for Freebase and NELL", https://doi.org/10.11588/data/AVLFPZ, heiDATA, V1

We describe a method for representing knowledge graphs that capture an intensional representation of the original extensional information. This representation is very compact, and it abstracts away from individual links, allowing us to find better path candidates, as shown by the...
