Data publications of the Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling”

The Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling” (LiMo) is a cooperative research project between the Leibniz Institute for the German Language (Leibniz-Institut für Deutsche Sprache, IDS) in Mannheim and the Department of Computational Linguistics at Heidelberg University (ICL). The general aims of the project are to develop new methods, models, and tools for compiling and analysing automatically large German textual corpora covering different domains, genres and language varieties.

The project is supported by funds from the Baden-Württemberg Ministry of Science, Research and the Arts and the Leibniz Association together with funds provided by the Leibniz Institute for the German Language and Heidelberg University.

Funding Period: 2015 – 2020

Featured Dataverses

In order to use this feature you must have at least one published dataverse.

Publish Dataverse

Are you sure you want to publish your dataverse? Once you do so it must remain published.

Publish Dataverse

This dataverse cannot be published because the dataverse it is in has not been published.

Delete Dataverse

Are you sure you want to delete your dataverse? You cannot undelete this dataverse.

Advanced Search

1 to 10 of 185 Results
ZIP Archive - 37.7 KB - MD5: 6b35c476556dfdb2b9b25a7a1cdc755d
Code
Feb 17, 2021
Daza, Angel, 2021, "X-SRL Dataset and mBERT Word Aligner", https://doi.org/10.11588/data/HVXXIJ, heiDATA, V1
This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source annotations (for example labeled English sentences) into the target side (for example a German translation of th...
Aug 23, 2019 - Twitter Titling Corpus
Tabular Data - 219.0 KB - 5 Variables, 4002 Observations - UNF:6:+F3lLKziwMvjy+xyktkilw==
Data
Aug 23, 2019
van den Berg, Esther; Korfhage, Katharina; Ruppenhofer, Josef; Wiegand, Michael; Markert, Katja, 2019, "Twitter Titling Corpus", https://doi.org/10.11588/data/IOHXDF, heiDATA, V1, UNF:6:+F3lLKziwMvjy+xyktkilw== [fileUNF]
The Twitter Titling Corpus contains 4002 stance-annotated tweets collected between 20 June 2017 and 30 August 2017 mentioning 6 presidents. Each tweet is annotated for the naming form used to refer to the president, for the purpose of a study on the relation between naming variat...
Mar 26, 2020 - tweeDe
Unknown - 945.9 KB - MD5: 32d20db78b577a921d9fd4bc3868770e
Data
Mar 26, 2020
Rehbein, Ines; Ruppenhofer, Josef; Do, Bich-Ngoc, 2020, "tweeDe", https://doi.org/10.11588/data/S90S35, heiDATA, V1
A German UD Twitter treebank, with >12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework
Gzip Archive - 1.7 MB - MD5: b2d04463fd249e1a19e641a99c65e70d
Data
Gzip Archive - 4.3 MB - MD5: b37e0268b451b32e52948e47baf80603
Data
ZIP Archive - 32.4 KB - MD5: 3bf4fe4ba2daaade0ae9c765233145c3
Code
Nov 13, 2023 - Neural Techniques for German Dependency Parsing
Do, Bich-Ngoc; Rehbein, Ines, 2023, "Topological Field Labeler for German", https://doi.org/10.11588/data/YYNQFF, heiDATA, V1
This resource contains the code of the topological labeler used in the paper: Do and Rehbein (2020). "Parsers Know Best: German PP Attachment Revisited". For this tool, labeling topological field is formulated as a sequence labeling task. We also include in this resource two pre-...
Add Data

Sign up or log in to create a dataverse or add a dataset.

Share Dataverse

Share this dataverse on your favorite social media networks.

Link Dataverse
Reset Modifications

Are you sure you want to reset the selected metadata fields? If you do this, any customizations (hidden, required, optional) you have done will no longer appear.