Feb 17, 2021
Daza, Angel, 2021, "X-SRL Dataset and mBERT Word Aligner", https://doi.org/10.11588/data/HVXXIJ, heiDATA, V1
This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source annotations (for example labeled English sentences) into the target side (for example a German translation of th... |
Jan 20, 2021
van den Berg, Esther; Korfhage, Katharina; Ruppenhofer, Josef; Wiegand, Michael; Markert, Katja, 2020, "German Twitter Titling Corpus", https://doi.org/10.11588/data/AOSUY6, heiDATA, V2, UNF:6:14BxjwJS7Q3mfI6ei7iBBw== [fileUNF]
The German Twitter Titling Corpus consists of 1904 stance-annotated tweets collected in June/July 2018 mentioning 24 German politicians with a doctoral degree. The Addendum contains an additional 296 stance-annotated tweets from each month of 2018 mentioning 10 politicians with a... |
Mar 26, 2020
Rehbein, Ines; Ruppenhofer, Josef; Do, Bich-Ngoc, 2020, "tweeDe", https://doi.org/10.11588/data/S90S35, heiDATA, V1
A German UD Twitter treebank, with >12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework |
Mar 26, 2020
Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor, 2020, "Pre-trained POS tagging models for German social media", https://doi.org/10.11588/data/W3JBV4, heiDATA, V1
Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007), the biLSTM-char-CRF tagger (Reimers & Gurevych 2017), and Online-Flors (Yin et al. 2015). References: Halácsy, P., Kornai, A., and Oravecz, C. (2007). HunPos: An open source trigram tagger. In Proceedings of th... |
Mar 26, 2020
Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor, 2020, "A harmonised testsuite for social media POS tagging (DE)", https://doi.org/10.11588/data/KXLMHN, heiDATA, V1
A harmonised POS testsuite of web data, CMC and Twitter microtext, with word forms and STTS POS tags (plus some additional CMC-specific tags). The UD POS tags were derived automatically from the STTS POS tags. The data does not contain (manually corrected) lemma information.... |
Mar 26, 2020
Rehbein, Ines; Steen, Julius; Do, Bich-Ngoc; Frank, Anette, 2020, "Converter for content-to-head style syntactic dependencies", https://doi.org/10.11588/data/HE3BAZ, heiDATA, V1
A set of Python scripts that convert function-head style encodings in dependency treebanks into content-head style encodings (as used in the UD treebanks) and vice versa (for adpositions, copula and coordination). For more information, see Rehbein, Steen, Do & Frank (2017). |
Mar 26, 2020
Rehbein, Ines; Ruppenhofer, Josef, 2020, "MACE-AL-TREE", https://doi.org/10.11588/data/THPEBR, heiDATA, V1
A method for detecting noise in automatically annotated dependency parse trees, combining MACE (Hovy et al. 2013) with Active Learning. |
Mar 26, 2020
Rehbein, Ines; Ruppenhofer, Josef; Steen, Julius, 2020, "MACE-AL", https://doi.org/10.11588/data/C2OQN4, heiDATA, V1
A method for detecting noise in automatically annotated sequence-labelled data, combining MACE (Hovy et al. 2013) with Active Learning. |
Mar 26, 2020
Rehbein, Ines; Ruppenhofer, Josef, 2020, "German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)", https://doi.org/10.11588/data/ZHI94V, heiDATA, V1
Annotations of causal verbs, nouns and prepositions in context, together with a lexicon file for causal verbs, nouns and prepositions. |
Jan 23, 2020
Daza, Angel, 2020, "Encoder-Decoder Model for Semantic Role Labeling", https://doi.org/10.11588/data/TOI9NQ, heiDATA, V1
Abstract (Daza & Frank 2019): We propose a Cross-lingual Encoder-Decoder model that simultaneously translates and generates sentences with Semantic Role Labeling annotations in a resource-poor target language. Unlike annotation projection techniques, our model does not need paral... |