heiDATA

Subject: Computer and Information Science

1 to 10 of 84 Results

BoostCLIR: JP-EN Relevance Marked Patent Corpus

Jun 16, 2014 - Statistical Natural Language Processing Group

Sokolov, Artem; Jehl, Laura; Hieber, Felix; Ruppert, Eugen; Riezler, Stefan, 2014, "BoostCLIR: JP-EN Relevance Marked Patent Corpus", https://doi.org/10.11588/data/10001, heiDATA, V1

BoostCLIR is a bilingual (Japanese-English) corpus of patent abstracts, extracted from the MAREC patent data and the NTCIR PatentMT workshop collections, accompanied by relevance judgements for the task of patent prior-art search. Important: The English side of t...

PatTR: Patent Translation Resource

Jun 16, 2014 - Statistical Natural Language Processing Group

Wäschle, Katharina; Riezler, Stefan, 2014, "PatTR: Patent Translation Resource", https://doi.org/10.11588/data/10002, heiDATA, V3

PatTR is a sentence-parallel corpus extracted from the MAREC patent collection. The current version contains more than 22 million German-English and 18 million French-English parallel sentences collected from all patent text sections as well as 5 million German-French sentence pa...

WikiCLIR: A Cross-Lingual Retrieval Dataset from Wikipedia

Jun 18, 2014 - Statistical Natural Language Processing Group

Hieber, Felix; Schamoni, Shigehiko; Sokolov, Artem; Riezler, Stefan, 2014, "WikiCLIR: A Cross-Lingual Retrieval Dataset from Wikipedia", https://doi.org/10.11588/data/10003, heiDATA, V1

WikiCLIR is a large-scale (German-English) retrieval data set for Cross-Language Information Retrieval (CLIR). It contains a total of 245,294 German single-sentence queries with 3,200,393 automatically extracted relevance judgments for 1,226,741 English Wikipedia articles as docu...

WikiWarsDE Corpus

Aug 13, 2014 - Database Systems Research Group

Strötgen, Jannik; Gertz, Michael, 2014, "WikiWarsDE Corpus", https://doi.org/10.11588/data/10026, heiDATA, V1

The WikiWarsDE corpus is a German corpus containing Wikipedia articles with annotations of temporal expressions. Its creation was motivated by the English WikiWars corpus (Mazur & Dale 2010). WikiWarsDE was developed to support research on temporal information extraction and norm...

MARC21-MARCXML-Konverter

Nov 2, 2016 - Perspektive Bibliothek

Boiger, Wolfgang, 2016, "MARC21-MARCXML-Konverter", https://doi.org/10.11588/data/10091, heiDATA, V1

Source code for a Perl implementation of a MARC21-to-MARCXML converter.

Text und Data Mining an wissenschaftlichen Repositorien und Publikationsservern in Deutschland - Zusammenfassung der Ergebnisse einer Umfrage im Februar und März 2016

Nov 2, 2016 - Perspektive Bibliothek

Drees, Bastian, 2016, "Text und Data Mining an wissenschaftlichen Repositorien und Publikationsservern in Deutschland - Zusammenfassung der Ergebnisse einer Umfrage im Februar und März 2016", https://doi.org/10.11588/data/10090, heiDATA, V2

The contact persons listed on the homepages of academic repositories and publication servers in Germany were surveyed about their experiences with text and data mining. The survey was conducted by e-mail between 22 and 26 February 2016. Contact persons ...

Source Code, Data and Additional Material for the Thesis: "Identification of Software Features in Issue Tracking System Data"

Feb 14, 2017 - PhD related Material - Faculty of Mathematics and Computer Science

Merten, Thorsten, 2017, "Source Code, Data and Additional Material for the Thesis: "Identification of Software Features in Issue Tracking System Data"", https://doi.org/10.11588/data/10089, heiDATA, V2

This dataset provides the code and the data sets used in the PhD thesis "Identification of Software Features in Issue Tracking System Data" as well as the files that represent the results measured in experiments. For problem studies (e.g. chapters 10 and 11) the folders include t...

Selectional Preference Embeddings (EMNLP 2017)

Jan 31, 2019 - AIPHES

Heinzerling, Benjamin, 2019, "Selectional Preference Embeddings (EMNLP 2017)", https://doi.org/10.11588/data/FJQ4XL, heiDATA, V1

Joint embeddings of selectional preferences, words, and fine-grained entity types. The vocabulary consists of: verbs and their dependency relation separated by "@", e.g. "sink@nsubj" or "elect@dobj" words and short noun phrases, e.g. "Titanic" fine-grained entity types using the...

SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning With Semantic Role Labeling [Source Code]

Feb 4, 2019 - AIPHES

Marasovic, Ana, 2019, "SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning With Semantic Role Labeling [Source Code]", https://doi.org/10.11588/data/LWN9XE, heiDATA, V1

This repository contains code for reproducing experiments done in Marasovic and Frank (2018). Paper abstract: For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question "Who expressed what kind of sentiment towar...

Abstract Anaphora Resolution [Source Code]

Feb 4, 2019 - AIPHES

Marasovic, Ana, 2019, "Abstract Anaphora Resolution [Source Code]", https://doi.org/10.11588/data/UDMPY5, heiDATA, V1

Abstract Anaphora Resolution (AAR) aims to find the interpretation of nominal expressions (e.g., this result, those two actions) and pronominal expressions (e.g., this, that, it) that refer to abstract-object-antecedents such as facts, events, plans, actions, or situations. The f...
