Empirical Linguistics and Computational Language Modeling (LiMo)

Data publications of the Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling”

The Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling” (LiMo) is a cooperative research project between the Leibniz Institute for the German Language (Leibniz-Institut für Deutsche Sprache, IDS) in Mannheim and the Department of Computational Linguistics (ICL) at Heidelberg University. The project aims to develop new methods, models, and tools for automatically compiling and analysing large German text corpora that cover different domains, genres, and language varieties.

The project is funded by the Baden-Württemberg Ministry of Science, Research and the Arts and by the Leibniz Association, with additional funds provided by the Leibniz Institute for the German Language and Heidelberg University.

Funding Period: 2015 – 2020


81 to 90 of 184 Results

GTTC_addendum.tab
Jan 20, 2021 - German Twitter Titling Corpus
Tabular Data - 19.7 KB - 5 Variables, 296 Observations - UNF:6:e8JLFj0rmt8hCbrLS38QTg==
Data

README.MD
Jan 20, 2021 - German Twitter Titling Corpus
Markdown Text - 1.2 KB - MD5: 2fb7128786b3a52452273bb4546963c5
Documentation

X-SRL Dataset and mBERT Word Aligner
Feb 17, 2021
Daza, Angel, 2021, "X-SRL Dataset and mBERT Word Aligner", https://doi.org/10.11588/data/HVXXIJ, heiDATA, V1
This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source annotations (for example labeled English sentences) into the target side (for example a German translation of th...

README.md
Feb 17, 2021 - X-SRL Dataset and mBERT Word Aligner
Markdown Text - 6.0 KB - MD5: 00d9aab1a8323bf228abd46cd51a666b
Documentation

xsrl_mbert_aligner.zip
Feb 17, 2021 - X-SRL Dataset and mBERT Word Aligner
ZIP Archive - 37.7 KB - MD5: 6b35c476556dfdb2b9b25a7a1cdc755d
Code

Topological Field Labeler for German
Nov 13, 2023 - Neural Techniques for German Dependency Parsing
Do, Bich-Ngoc; Rehbein, Ines, 2023, "Topological Field Labeler for German", https://doi.org/10.11588/data/YYNQFF, heiDATA, V1
This resource contains the code of the topological labeler used in the paper: Do and Rehbein (2020). "Parsers Know Best: German PP Attachment Revisited". For this tool, labeling topological field is formulated as a sequence labeling task. We also include in this resource two pre-...

baseline-marmot.tar.gz
Nov 13, 2023 - Topological Field Labeler for German
Gzip Archive - 67.1 MB - MD5: 34a0bcd15baaa3d6588e908e89b986a7
Data

baseline.tar.gz
Nov 13, 2023 - Topological Field Labeler for German
Gzip Archive - 66.3 MB - MD5: b55949b5530dec3e4933c3efedc63600
Data

embeddings.tar.gz
Nov 13, 2023 - Topological Field Labeler for German
Gzip Archive - 40.5 MB - MD5: 2256cc7718eb340cdf0941dd8e41db9e
Data

README.md
Nov 13, 2023 - Topological Field Labeler for German
Markdown Text - 82 B - MD5: 6f6c9146cb4bb5767db27672dcc6103f
Documentation
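The MD5 values shown for most files in this listing can be used to check that a downloaded copy is intact. The following is a minimal sketch using only Python's standard library; the function names `md5_of_file` and `matches_listing` and the local file path are assumptions made for illustration, not part of heiDATA or of any listed dataset. It does not cover the UNF fingerprint shown for the tabular file, which is computed differently.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Compare a local file's MD5 with a value copied from the listing."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; the checksum is the one listed for
    # xsrl_mbert_aligner.zip.
    local_copy = Path("xsrl_mbert_aligner.zip")
    if local_copy.exists():
        ok = matches_listing(local_copy, "6b35c476556dfdb2b9b25a7a1cdc755d")
        print("checksum ok" if ok else "checksum mismatch")
```

A mismatch usually points to an incomplete or corrupted download, so fetching the file again is a reasonable first step.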

