Jun 18, 2014
Hieber, Felix; Schamoni, Shigehiko; Sokolov, Artem; Riezler, Stefan, 2014, "WikiCLIR: A Cross-Lingual Retrieval Dataset from Wikipedia", https://doi.org/10.11588/data/10003, heiDATA, V1
WikiCLIR is a large-scale (German-English) retrieval data set for Cross-Language Information Retrieval (CLIR). It contains a total of 245,294 German single-sentence queries with 3,200,393 automatically extracted relevance judgments for 1,226,741 English Wikipedia articles as docu...
Plain Text - 1.8 KB - MD5: f2d15639b962977ea19a20308bccbfc4
README
Gzip Archive - 846.8 MB - MD5: 8f51894ff1c6ba2987d07dde62b3143d
data
data set
Jun 16, 2014
Sokolov, Artem; Jehl, Laura; Hieber, Felix; Ruppert, Eugen; Riezler, Stefan, 2014, "BoostCLIR: JP-EN Relevance Marked Patent Corpus", https://doi.org/10.11588/data/10001, heiDATA, V1
BoostCLIR is a bilingual (Japanese-English) corpus of patent abstracts, extracted from the MAREC patent data and the NTCIR PatentMT workshop collections, accompanied by relevance judgements for the task of patent prior-art search. Important: The English side of t...
Gzip Archive - 241.8 MB - MD5: 35fde8d24e6e80bf932490549c991a3f
data
data set
Plain Text - 1.5 KB - MD5: 544fa4db045f692d07a7d4596da99741
README
README
Jun 16, 2014
Wäschle, Katharina; Riezler, Stefan, 2014, "PatTR: Patent Translation Resource", https://doi.org/10.11588/data/10002, heiDATA, V3
PatTR is a sentence-parallel corpus extracted from the MAREC patent collection. The current version contains more than 22 million German-English and 18 million French-English parallel sentences collected from all patent text sections as well as 5 million German-French sentence pa...
Gzip Archive - 234.3 MB - MD5: 3bd140f68ab0eefe239e3e893012c991
de-en
data set de-en, Part 1/3 (License information: see part 1)
Gzip Archive - 1.3 GB - MD5: 2d1336fe8eecd100c01488f5e3e9bc97
de-en
data set de-en, Part 2/3
Gzip Archive - 1.3 GB - MD5: b838211b8ddc04001d79f7e1e2e066cb
de-en
data set de-en, Part 2/3 (License information: see part 1)
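Each file in the listing is given with an MD5 checksum. After downloading, a file can be checked against its listed value. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` and the local file name are hypothetical, since the listing does not give file names.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Chunked reading keeps memory use flat, which matters for the
    multi-gigabyte archives in this listing.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a local file's MD5 with a checksum copied from the listing."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local name for the downloaded WikiCLIR Gzip archive.
    archive = Path("wikiclir_data.gz")
    # MD5 listed above for the WikiCLIR Gzip archive (846.8 MB).
    expected = "8f51894ff1c6ba2987d07dde62b3143d"
    if archive.exists():
        print("OK" if verify_download(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, so the file should be fetched again before it is unpacked.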