heiDATA



Publication Year: 2019 Subject: Computer and Information Science Subject: Arts and Humanities

1 to 10 of 20 Results


Twitter Titling Corpus

Aug 23, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

van den Berg, Esther; Korfhage, Katharina; Ruppenhofer, Josef; Wiegand, Michael; Markert, Katja, 2019, "Twitter Titling Corpus", https://doi.org/10.11588/data/IOHXDF, heiDATA, V1, UNF:6:+F3lLKziwMvjy+xyktkilw== [fileUNF]

The Twitter Titling Corpus contains 4002 stance-annotated tweets collected between 20 June 2017 and 30 August 2017 mentioning 6 presidents. Each tweet is annotated for the naming form used to refer to the president, for the purpose of a study on the relation between naming variat...

The MSC Data Set

Oct 7, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Marasović, Ana; Zhou, Mengfei; Frank, Anette, 2019, "The MSC Data Set", https://doi.org/10.11588/data/JEESIQ, heiDATA, V1

From this page you can download resources we created for modal sense classification as reported in Zhou et al. (2015), Marasović et al. (2016) and Marasović and Frank (2015) (see "Related Publication" below): Heuristically sense-annotated training data acquired from EUROPARL and...

Sentiment View Lexicon (EN)

Sep 5, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Wiegand, Michael; Ruppenhofer, Josef; Schulder, Marc, 2019, "Sentiment View Lexicon (EN)", https://doi.org/10.11588/data/2JK48O, heiDATA, V1

This gold standard contains sentiment expressions (verbs, nouns and adjectives) that have been annotated according to their (prior) sentiment view. Each sentiment expression is labelled either as actor or speaker view.

Sentiment Compound Data (DE)

Sep 5, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Wiegand, Michael; Bocionek, Christine; Ruppenhofer, Josef, 2019, "Sentiment Compound Data (DE)", https://doi.org/10.11588/data/LSTRK3, heiDATA, V1

This dataset contains gold standards that are required for building a classifier that automatically extracts opinion (noun) compounds.

Opinion role extractor

Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Wiegand, Michael, 2019, "Opinion role extractor", https://doi.org/10.11588/data/3W7AQP, heiDATA, V1

System for the Extraction of Subjective Expressions, Sentiment Sources and Sentiment Targets from German Text

Negative Sampling for Learning Knowledge Graph Embeddings

Aug 19, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Kotnis, Bhushan, 2019, "Negative Sampling for Learning Knowledge Graph Embeddings", https://doi.org/10.11588/data/YYULL2, heiDATA, V1

Reimplementation of four KG factorization methods and six negative sampling methods. Abstract: Knowledge graphs are large, useful, but incomplete knowledge repositories. They encode knowledge through entities and relations which define each other through the connective structure o...

Multilingual Modal Sense Classification using a Convolutional Neural Network [Source Code]

Oct 7, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Marasović, Ana, 2019, "Multilingual Modal Sense Classification using a Convolutional Neural Network [Source Code]", https://doi.org/10.11588/data/ERDJDI, heiDATA, V1

Abstract: Modal sense classification (MSC) is a special WSD task that depends on the meaning of the proposition in the modal’s scope. We explore a CNN architecture for classifying modal sense in English and German. We show that CNNs are superior to manually designed feature-based cl...

Lexicon of Abusive Words (EN)

Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Wiegand, Michael, 2019, "Lexicon of Abusive Words (EN)", https://doi.org/10.11588/data/MKPEYV, heiDATA, V1

This gold standard contains a bootstrapped lexicon of abusive words. The lexicon comprises a large set of English negative polar expressions annotated as either abusive or not.

KGE Algorithms

Aug 19, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Kotnis, Bhushan, 2019, "KGE Algorithms", https://doi.org/10.11588/data/CSXYSS, heiDATA, V1

An updated method for link prediction that uses a regularization factor that models relation argument types. Abstract (Kotnis and Nastase, 2017): Learning relations based on evidence from knowledge repositories relies on processing the available relation instances. Knowledge repos...

HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection

Mar 26, 2021 - IWR Computer Graphics

Mara, Hubert, 2019, "HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection", https://doi.org/10.11588/data/IE8CCN, heiDATA, V2

The number of known cuneiform tablets is assumed to be in the hundreds of thousands. A fraction has been published by printing photographs and manual tracings in books, which is collected by the online Cuneiform Digital Library Initiative (CDLI) catalog including some of these im...
