X-SRL Dataset and mBERT Word Aligner (doi:10.11588/data/HVXXIJ)


Document Description

Citation

Title:

X-SRL Dataset and mBERT Word Aligner

Identification Number:

doi:10.11588/data/HVXXIJ

Distributor:

heiDATA

Date of Distribution:

2021-02-17

Version:

1

Bibliographic Citation:

Daza, Angel, 2021, "X-SRL Dataset and mBERT Word Aligner", https://doi.org/10.11588/data/HVXXIJ, heiDATA, V1

Study Description

Citation


Authoring Entity:

Daza, Angel (Leibniz Institute for the German Language / Department of Computational Linguistics, Heidelberg University)

Date of Production:

2020

Distributor:

heiDATA

Access Authority:

Daza, Angel

Holdings Information:

https://doi.org/10.11588/data/HVXXIJ

Study Scope

Keywords:

Arts and Humanities, Computer and Information Science, word alignment, annotation projection, multilingual semantic role labeling, SRL, multilingual BERT

Topic Classification:

Semantic Role Labeling

Abstract:

This code implements a method for automatically aligning words in parallel sentences using pre-trained multilingual BERT (mBERT) embeddings. The alignments can be used to project source-side annotations (for example, labels on English sentences) onto the target side (for example, a German translation of each sentence) by transferring each label to its best-aligned target word. The resulting labeled data can then be used to train multilingual state-of-the-art (SOTA) models, with the aim of improving performance, especially for lower-resource languages.
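The align-then-project idea described in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not the code shipped in xsrl_mbert_aligner.zip, and the abstract does not specify the exact alignment rule. The sketch assumes word-level embeddings are already available (in practice they would come from mBERT, for example by pooling subword vectors, which is itself an assumption), replaces them with toy 3-dimensional vectors, and uses cosine similarity with mutual best-match filtering as a hypothetical alignment rule. The names `cosine`, `align`, and `project_labels`, the vectors, and the example labels are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(src_vecs: Sequence[Vector], tgt_vecs: Sequence[Vector]) -> Dict[int, int]:
    """Hypothetical aligner: map source word i to target word j only when
    j is i's most similar target AND i is j's most similar source."""
    if not src_vecs or not tgt_vecs:
        return {}
    # Similarity matrix: rows are source words, columns are target words.
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    src_best = [max(range(len(tgt_vecs)), key=lambda j: row[j]) for row in sim]
    tgt_best = [
        max(range(len(src_vecs)), key=lambda i: sim[i][j])
        for j in range(len(tgt_vecs))
    ]
    # Keep only mutual best matches.
    return {i: j for i, j in enumerate(src_best) if tgt_best[j] == i}


def project_labels(
    src_labels: Sequence[str],
    alignment: Dict[int, int],
    n_tgt: int,
    default: str = "O",
) -> List[str]:
    """Copy each non-default source label onto its aligned target word;
    unaligned target words keep the default label."""
    tgt_labels = [default] * n_tgt
    for i, label in enumerate(src_labels):
        if label != default and i in alignment:
            tgt_labels[alignment[i]] = label
    return tgt_labels


# Toy vectors standing in for mBERT word embeddings (purely illustrative).
src_words = ["The", "cat", "sleeps"]
tgt_words = ["Die", "Katze", "schläft"]
src_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.95]]
src_labels = ["O", "A0", "V"]  # hypothetical SRL labels for the English side

alignment = align(src_vecs, tgt_vecs)
print(alignment)  # {0: 0, 1: 1, 2: 2}
print(project_labels(src_labels, alignment, len(tgt_words)))  # ['O', 'A0', 'V']
```

Keeping only mutual best matches trades recall for precision: a source word whose best target is claimed more strongly by another source word is left unaligned, so its label is simply not projected rather than being projected onto the wrong target word.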

Kind of Data:

program source code

Other Study Description Materials

Related Publications

Citation

Title:

X-SRL: A Parallel Cross-lingual Semantic Role Labeling Dataset

Identification Number:

arXiv:2010.01998

Bibliographic Citation:

Daza, Angel and Frank, Anette (2020). X-SRL: A Parallel Cross-lingual Semantic Role Labeling Dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, November 16-20, 2020, Online.

Other Study-Related Materials

Label:

README.md

Notes:

text/markdown

Label:

xsrl_mbert_aligner.zip

Notes:

application/zip