heiDATA

Metrics

189,636 Downloads

Subject: Arts and Humanities; Computer and Information Science

1 to 10 of 51 Results

A harmonised testsuite for social media POS tagging (DE)

Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)

Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor, 2020, "A harmonised testsuite for social media POS tagging (DE)", https://doi.org/10.11588/data/KXLMHN, heiDATA, V1

A harmonised POS testsuite of web data, CMC and Twitter microtext, with word forms and STTS POS tags (plus some additional CMC-specific tags). UD POS tags have been automatically converted from the STTS POS tags. The data does not contain (manually corrected) lemma information....

Abstract graphs, abstract paths, grounded paths for Freebase and NELL

Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Nastase, Vivi; Kotnis, Bhushan, 2019, "Abstract graphs, abstract paths, grounded paths for Freebase and NELL", https://doi.org/10.11588/data/AVLFPZ, heiDATA, V1

We describe a method for representing knowledge graphs that captures an intensional representation of the original extensional information. This representation is very compact, and it abstracts away from individual links, allowing us to find better path candidates, as shown by the...

ACL word segmentation correction

Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Nastase, Vivi; Hitschler, Julian, 2019, "ACL word segmentation correction", https://doi.org/10.11588/data/VK99LU, heiDATA, V1

The data in this collection consists of two parallel directories, one ("raw") containing the raw text of 18850 articles from the ACL 2013/02 collection, the other ("re-segmented") the word-resegmented version of these articles, obtained using nematus, a seq2seq neural model used...

Affixoid Dataset (DE)

Oct 8, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Ruppenhofer, Josef, 2019, "Affixoid Dataset (DE)", https://doi.org/10.11588/data/QKF4LT, heiDATA, V1, UNF:6:+MGK9lTPTXx7Rclu1BpPnw== [fileUNF]

The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert. 1788 complex words containing one of 7 German suffixoid candidates (e.g. -hai, -go...

AMR parse quality prediction [Source Code]

Jul 12, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Opitz, Juri, 2019, "AMR parse quality prediction [Source Code]", https://doi.org/10.11588/data/STHBGW, heiDATA, V1

Accuracy prediction for AMR parsing predicts 33 accuracy metrics for a given sentence and its (automatic) AMR parse. Abstract (Opitz and Frank, 2019): Semantic proto-role labeling (SPRL) is an alternative to semantic role labeling (SRL) that moves beyond a categorical definition o...

Cataloging Cultural Objects (CCO) – The CCO Commons examples in VRA Core 4 XML

Jul 31, 2020 - Cluster of Excellence - Asia and Europe in a Global Context

Arnold, Matthias; Dober, Agnes, 2020, "Cataloging Cultural Objects (CCO) – The CCO Commons examples in VRA Core 4 XML", https://doi.org/10.11588/data/KKTC9G, heiDATA, V1

“Cataloging Cultural Objects - a Guide to Describing Cultural Works and Their Images” (CCO) provides a data content standard for catalogers of cultural heritage. It is a guidebook for how to populate data elements and where to apply controlled vocabulary standards. The guide is f...

CITADEL: Computational Investigation of the Topographical and Architectural Designs in an Evolving Landscape (Research Data)

Jul 20, 2023 - arthistoricum.net@heiDATA

Pattee, Aaron, 2023, "CITADEL: Computational Investigation of the Topographical and Architectural Designs in an Evolving Landscape (Research Data)", https://doi.org/10.11588/data/ZDOC7O, heiDATA, V1

The data found in this repository contain the basis for the historical, architectural, and geo-spatial analyses discussed in the dissertation entitled: CITADEL – Computational Investigation of the Topographical and Architectural Designs in an Evolving Landscape. These data include...

CO-NNECT

Feb 26, 2024 - RATIO_EXPLAIN

Becker, Maria, 2024, "CO-NNECT", https://doi.org/10.11588/data/SAJAD3, heiDATA, V1

This repository contains our path generation framework Co-NNECT, in which we combine two models for establishing knowledge relations and paths between concepts from sentences, as a form of explicitation of implicit knowledge: COREC-LM (COmmonsense knowledge RElation Classificatio...

CoCo-Ex

Feb 26, 2024 - RATIO_EXPLAIN

Becker, Maria, 2024, "CoCo-Ex", https://doi.org/10.11588/data/K8MCIW, heiDATA, V1

CoCo-Ex extracts meaningful concepts from natural language texts and maps them to conjunct concept nodes in ConceptNet, utilizing the maximum of relational information stored in the ConceptNet knowledge graph.

Converter for content-to-head style syntactic dependencies

Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)

Rehbein, Ines; Steen, Julius; Do, Bich-Ngoc; Frank, Anette, 2020, "Converter for content-to-head style syntactic dependencies", https://doi.org/10.11588/data/HE3BAZ, heiDATA, V1

A set of Python scripts that convert function-head style encodings in dependency treebanks into a content-head style encoding (as used in the UD treebanks) and vice versa (for adpositions, copula and coordination). For more information, see Rehbein, Steen, Do & Frank (2017).
