Data publications of the Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling”

The Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling” (LiMo) is a cooperative research project between the Leibniz Institute for the German Language (Leibniz-Institut für Deutsche Sprache, IDS) in Mannheim and the Department of Computational Linguistics at Heidelberg University (ICL). The project aims to develop new methods, models, and tools for automatically compiling and analysing large German text corpora covering different domains, genres, and language varieties.

The project is supported by funds from the Baden-Württemberg Ministry of Science, Research and the Arts and the Leibniz Association together with funds provided by the Leibniz Institute for the German Language and Heidelberg University.

Funding Period: 2015 – 2020


Tabular Data - 19.7 KB - 5 Variables, 296 Observations - UNF:6:e8JLFj0rmt8hCbrLS38QTg==
Data
Markdown Text - 1.2 KB - MD5: 2fb7128786b3a52452273bb4546963c5
Documentation
Feb 17, 2021
Daza, Angel, 2021, "X-SRL Dataset and mBERT Word Aligner", https://doi.org/10.11588/data/HVXXIJ, heiDATA, V1
This code contains a method to automatically align words from parallel sentences using pre-trained multilingual BERT embeddings. This can be used to transfer source annotations (for example labeled English sentences) to the target side (for example a German translation of th...
Markdown Text - 6.0 KB - MD5: 00d9aab1a8323bf228abd46cd51a666b
Documentation
ZIP Archive - 37.7 KB - MD5: 6b35c476556dfdb2b9b25a7a1cdc755d
Code
Nov 13, 2023 - Neural Techniques for German Dependency Parsing
Do, Bich-Ngoc; Rehbein, Ines, 2023, "Topological Field Labeler for German", https://doi.org/10.11588/data/YYNQFF, heiDATA, V1
This resource contains the code of the topological field labeler used in the paper: Do and Rehbein (2020). "Parsers Know Best: German PP Attachment Revisited". For this tool, labeling topological fields is formulated as a sequence labeling task. We also include in this resource two pre-...
Gzip Archive - 67.1 MB - MD5: 34a0bcd15baaa3d6588e908e89b986a7
Data
Gzip Archive - 66.3 MB - MD5: b55949b5530dec3e4933c3efedc63600
Data
Gzip Archive - 40.5 MB - MD5: 2256cc7718eb340cdf0941dd8e41db9e
Data
Markdown Text - 82 B - MD5: 6f6c9146cb4bb5767db27672dcc6103f
Documentation