91 to 93 of 93 Results
Jun 18, 2014 - Statistical Natural Language Processing Group
Hieber, Felix; Schamoni, Shigehiko; Sokolov, Artem; Riezler, Stefan, 2014, "WikiCLIR: A Cross-Lingual Retrieval Dataset from Wikipedia", https://doi.org/10.11588/data/10003, heiDATA, V1
WikiCLIR is a large-scale (German-English) retrieval data set for Cross-Language Information Retrieval (CLIR). It contains a total of 245,294 German single-sentence queries with 3,200,393 automatically extracted relevance judgments for 1,226,741 English Wikipedia articles as docu... |
Aug 13, 2014 - Database Systems Research Group
Strötgen, Jannik; Gertz, Michael, 2014, "WikiWarsDE Corpus", https://doi.org/10.11588/data/10026, heiDATA, V1
The WikiWarsDE corpus is a German corpus containing Wikipedia articles with annotations of temporal expressions. Its creation was motivated by the English WikiWars corpus (Mazur & Dale 2010). WikiWarsDE was developed to support research on temporal information extraction and norm... |
Feb 17, 2021 - Empirical Linguistics and Computational Language Modeling (LiMo)
Daza, Angel, 2021, "X-SRL Dataset and mBERT Word Aligner", https://doi.org/10.11588/data/HVXXIJ, heiDATA, V1
This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source annotations (for example labeled English sentences) into the target side (for example a German translation of th... |