21 to 30 of 51 Results
Mar 26, 2021 - IWR Computer Graphics
Mara, Hubert, 2019, "HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection", https://doi.org/10.11588/data/IE8CCN, heiDATA, V2
The number of known cuneiform tablets is assumed to be in the hundreds of thousands. A fraction has been published by printing photographs and manual tracings in books, which is collected by the online Cuneiform Digital Library Initiative (CDLI) catalog including some of these im... |
Feb 17, 2021 - Empirical Linguistics and Computational Language Modeling (LiMo)
Daza, Angel, 2021, "X-SRL Dataset and mBERT Word Aligner", https://doi.org/10.11588/data/HVXXIJ, heiDATA, V1
This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source annotations (for example labeled English sentences) into the target side (for example a German translation of th... |
Jan 20, 2021 - Empirical Linguistics and Computational Language Modeling (LiMo)
van den Berg, Esther; Korfhage, Katharina; Ruppenhofer, Josef; Wiegand, Michael; Markert, Katja, 2020, "German Twitter Titling Corpus", https://doi.org/10.11588/data/AOSUY6, heiDATA, V2, UNF:6:14BxjwJS7Q3mfI6ei7iBBw== [fileUNF]
The German Titling Twitter Corpus consists of 1904 stance-annotated tweets collected in June/July 2018 mentioning 24 German politicians with a doctoral degree. The Addendum contains an additional 296 stance-annotated tweets from each month of 2018 mentioning 10 politicians with a... |
Oct 26, 2020 - OwnReality. To Each His Own Reality
Schepp, Moritz, 2020, "OwnReality API-only web application", https://doi.org/10.11588/data/KZHLS8, heiDATA, V1
This dataset contains the data platform for the research project "OwnReality. To Each His Own Reality". During the course of the project, data was gathered and entered into a database. In general, this platform allows the integration of that data into web based systems such as co... |
Jul 31, 2020 - Cluster of Excellence - Asia and Europe in a Global Context
Arnold, Matthias; Dober, Agnes, 2020, "Cataloging Cultural Objects (CCO) – The CCO Commons examples in VRA Core 4 XML", https://doi.org/10.11588/data/KKTC9G, heiDATA, V1
“Cataloging Cultural Objects - a Guide to Describing Cultural Works and Their Images” (CCO) provides a data content standard for catalogers of cultural heritage. It is a guidebook for how to populate data elements and where to apply controlled vocabulary standards. The guide is f... |
Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)
Rehbein, Ines; Ruppenhofer, Josef; Do, Bich-Ngoc, 2020, "tweeDe", https://doi.org/10.11588/data/S90S35, heiDATA, V1
A German UD Twitter treebank, with >12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework |
Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)
Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor, 2020, "Pre-trained POS tagging models for German social media", https://doi.org/10.11588/data/W3JBV4, heiDATA, V1
Pre-trained POS tagging models for the HunPos tagger (Halácsy et al. 2007) the biLSTM-char-CRF tagger (Reimers & Gurevych 2017) Online-Flors (Yin et al. 2015). References: Halácsy, P., Kornai, A., and Oravecz, C. (2007). HunPos: An open source trigram tagger. In Proceedings of th... |
Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)
Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor, 2020, "A harmonised testsuite for social media POS tagging (DE)", https://doi.org/10.11588/data/KXLMHN, heiDATA, V1
A harmonised POS testsuite of web data, CMC and Twitter microtext, with word forms and STTS pos tags (+ some additional CMC-specific tags). UD pos tags have been automatically converted, based on the STTS pos tags. The data does not contain (manually corrected) lemma information.... |
Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)
Rehbein, Ines; Steen, Julius; Do, Bich-Ngoc; Frank, Anette, 2020, "Converter for content-to-head style syntactic dependencies", https://doi.org/10.11588/data/HE3BAZ, heiDATA, V1
A set of Python scripts that convert function-head style encodings in dependency treebanks in a content-head style encoding (as used in the UD treebanks) and vice versa (for adpositions, copula and coordination). For more information, see (Rehbein, Steen, Do & Frank 2017). |
Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)
Rehbein, Ines; Ruppenhofer, Josef, 2020, "MACE-AL-TREE", https://doi.org/10.11588/data/THPEBR, heiDATA, V1
An method for detecting noise in automatically annotated dependency parse trees, combining MACE (Hovy et al. 2013) with Active Learning. |