heiDATA

Metrics

193,550 Downloads

Subject: Computer and Information Science

81 to 90 of 93 Results

Cataloging Cultural Objects (CCO) – The CCO Commons examples in VRA Core 4 XML

Jul 31, 2020 - Cluster of Excellence - Asia and Europe in a Global Context

Arnold, Matthias; Dober, Agnes, 2020, "Cataloging Cultural Objects (CCO) – The CCO Commons examples in VRA Core 4 XML", https://doi.org/10.11588/data/KKTC9G, heiDATA, V1

“Cataloging Cultural Objects - a Guide to Describing Cultural Works and Their Images” (CCO) provides a data content standard for catalogers of cultural heritage. It is a guidebook for how to populate data elements and where to apply controlled vocabulary standards. The guide is f...

BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)

Feb 6, 2019 - AIPHES

Heinzerling, Benjamin, 2019, "BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)", https://doi.org/10.11588/data/V9CXPR, heiDATA, V1

BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed, BPEmb performs competitively, and for some languages better than alternative subword approaches, while r...

BoostCLIR: JP-EN Relevance Marked Patent Corpus

Jun 16, 2014 - Statistical Natural Language Processing Group

Sokolov, Artem; Jehl, Laura; Hieber, Felix; Ruppert, Eugen; Riezler, Stefan, 2014, "BoostCLIR: JP-EN Relevance Marked Patent Corpus", https://doi.org/10.11588/data/10001, heiDATA, V1

BoostCLIR is a bilingual (Japanese-English) corpus of patent abstracts, extracted from the MAREC patent data, and the data from the NTCIR PatentMT workshop collections, accompanied with relevance judgements for the task of patent prior-art search. Important: The English side of t...

AMR parse quality prediction [Source Code]

Jul 12, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Opitz, Juri, 2019, "AMR parse quality prediction [Source Code]", https://doi.org/10.11588/data/STHBGW, heiDATA, V1

Accuracy prediction for AMR parsing predicts 33 accuracy metrics for a given sentence and its (automatic) AMR parse. Abstract (Opitz and Frank, 2019): Semantic proto-role labeling (SPRL) is an alternative to semantic role labeling (SRL) that moves beyond a categorical definition o...

Affixoid Dataset (DE)

Oct 8, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Ruppenhofer, Josef, 2019, "Affixoid Dataset (DE)", https://doi.org/10.11588/data/QKF4LT, heiDATA, V1, UNF:6:+MGK9lTPTXx7Rclu1BpPnw== [fileUNF]

The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert. 1788 complex words containing one of 7 German suffixoid candidates (e.g. -hai, -go...

ACL word segmentation correction

Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Nastase, Vivi; Hitschler, Julian, 2019, "ACL word segmentation correction", https://doi.org/10.11588/data/VK99LU, heiDATA, V1

The data in this collection consists of two parallel directories, one ("raw") containing the raw text of 18850 articles from the ACL 2013/02 collection, the other ("re-segmented") the word-resegmented version of these articles, obtained using nematus, a seq2seq neural model used...

Accompanying Code for Chapter 4 of the PhD Thesis "Global Inference and Local Syntax Representations for Event Extraction"

Nov 24, 2021 - PhD related material - Faculty of Modern Languages

Judea, Alex, 2021, "Accompanying Code for Chapter 4 of the PhD Thesis "Global Inference and Local Syntax Representations for Event Extraction"", https://doi.org/10.11588/data/CZZEKX, heiDATA, V1

This release contains the source code used for Chapter 4 of the PhD thesis "Global Inference and Local Syntax Representations for Event Extraction". The code served as a testbed for the assumption that dependency graphs can be an important information source for event extraction....

Accompanying Code and Models for Chapter 5 of the PhD Thesis "Global Inference and Local Syntax Representations for Event Extraction"

Nov 24, 2021 - PhD related material - Faculty of Modern Languages

Judea, Alex, 2021, "Accompanying Code and Models for Chapter 5 of the PhD Thesis "Global Inference and Local Syntax Representations for Event Extraction"", https://doi.org/10.11588/data/Z1RKOI, heiDATA, V1

This release contains the source code used for Chapter 5 of the PhD thesis "Global Inference and Local Syntax Representations for Event Extraction". It contains the implementation of the modular, graph-based event extractor described there. Furthermore, the release contains five...

Abstract graphs, abstract paths, grounded paths for Freebase and NELL

Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Nastase, Vivi; Kotnis, Bhushan, 2019, "Abstract graphs, abstract paths, grounded paths for Freebase and NELL", https://doi.org/10.11588/data/AVLFPZ, heiDATA, V1

We describe a method for representing knowledge graphs that captures an intensional representation of the original extensional information. This representation is very compact, and it abstracts away from individual links, allowing us to find better path candidates, as shown by the...

Abstract Anaphora Resolution [Source Code]

Feb 4, 2019 - AIPHES

Marasovic, Ana, 2019, "Abstract Anaphora Resolution [Source Code]", https://doi.org/10.11588/data/UDMPY5, heiDATA, V1

Abstract Anaphora Resolution (AAR) aims to find the interpretation of nominal expressions (e.g., this result, those two actions) and pronominal expressions (e.g., this, that, it) that refer to abstract-object-antecedents such as facts, events, plans, actions, or situations. The f...
