heiDATA

Metrics

201,384 Downloads

Subject: Computer and Information Science

11 to 20 of 93 Results

BoostCLIR: JP-EN Relevance Marked Patent Corpus

Jun 16, 2014 - Statistical Natural Language Processing Group

Sokolov, Artem; Jehl, Laura; Hieber, Felix; Ruppert, Eugen; Riezler, Stefan, 2014, "BoostCLIR: JP-EN Relevance Marked Patent Corpus", https://doi.org/10.11588/data/10001, heiDATA, V1

BoostCLIR is a bilingual (Japanese-English) corpus of patent abstracts, extracted from the MAREC patent data and from the NTCIR PatentMT workshop collections, accompanied by relevance judgements for the task of patent prior-art search. Important: The English side of t...

BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)

Feb 6, 2019 - AIPHES

Heinzerling, Benjamin, 2019, "BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)", https://doi.org/10.11588/data/V9CXPR, heiDATA, V1

BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed, BPEmb performs competitively, and for some languages better than alternative subword approaches, while r...

Cataloging Cultural Objects (CCO) – The CCO Commons examples in VRA Core 4 XML

Jul 31, 2020 - Cluster of Excellence - Asia and Europe in a Global Context

Arnold, Matthias; Dober, Agnes, 2020, "Cataloging Cultural Objects (CCO) – The CCO Commons examples in VRA Core 4 XML", https://doi.org/10.11588/data/KKTC9G, heiDATA, V1

“Cataloging Cultural Objects - a Guide to Describing Cultural Works and Their Images” (CCO) provides a data content standard for catalogers of cultural heritage. It is a guidebook for how to populate data elements and where to apply controlled vocabulary standards. The guide is f...

CITADEL: Computational Investigation of the Topographical and Architectural Designs in an Evolving Landscape (Research Data)

Jul 20, 2023 - arthistoricum.net@heiDATA

Pattee, Aaron, 2023, "CITADEL: Computational Investigation of the Topographical and Architectural Designs in an Evolving Landscape (Research Data)", https://doi.org/10.11588/data/ZDOC7O, heiDATA, V1

The data found in this repository contain the basis for the historical, architectural, and geo-spatial analyses discussed in the dissertation entitled: CITADEL – Computational Investigation of the Topographical and Architectural Designs in an Evolving Landscape. These data include...

CO-NNECT

Feb 26, 2024 - RATIO_EXPLAIN

Becker, Maria, 2024, "CO-NNECT", https://doi.org/10.11588/data/SAJAD3, heiDATA, V1

This repository contains our path generation framework Co-NNECT, in which we combine two models for establishing knowledge relations and paths between concepts from sentences, as a form of explicitation of implicit knowledge: COREC-LM (COmmonsense knowledge RElation Classificatio...

CoCo-Ex

Feb 26, 2024 - RATIO_EXPLAIN

Becker, Maria, 2024, "CoCo-Ex", https://doi.org/10.11588/data/K8MCIW, heiDATA, V1

CoCo-Ex extracts meaningful concepts from natural language texts and maps them to conjunct concept nodes in ConceptNet, utilizing the maximum of relational information stored in the ConceptNet knowledge graph.

Collagen breaks at weak sacrificial bonds taming its mechanoradicals [Data]

Mar 24, 2023 - HITS MBM

Rennekamp, Benedikt; Gräter, Frauke, 2023, "Collagen breaks at weak sacrificial bonds taming its mechanoradicals [Data]", https://doi.org/10.11588/data/HJ6SVM, heiDATA, V1, UNF:6:Da6omTKjrSvNcDwyJxuHOg== [fileUNF]

This dataset contains input files for MD simulations, derived breakage counts from these simulations that are used to generate the figures in the publication, and the experimental, uncropped SDS-PAGE gels of the presented results in the related publication. Abstract of related pu...

Converter for content-to-head style syntactic dependencies

Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)

Rehbein, Ines; Steen, Julius; Do, Bich-Ngoc; Frank, Anette, 2020, "Converter for content-to-head style syntactic dependencies", https://doi.org/10.11588/data/HE3BAZ, heiDATA, V1

A set of Python scripts that convert function-head style encodings in dependency treebanks into a content-head style encoding (as used in the UD treebanks) and vice versa (for adpositions, copula and coordination). For more information, see Rehbein, Steen, Do & Frank (2017).

COREC – A neural multi-label COmmonsense RElation Classification system

Oct 22, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Becker, Maria, 2019, "COREC – A neural multi-label COmmonsense RElation Classification system", https://doi.org/10.11588/data/E5EHBV, heiDATA, V1

We examine the learnability of Commonsense knowledge relations as represented in CONCEPTNET. We develop a neural open world multi-label classification system that focuses on the evaluation of classification accuracy for individual relations. Based on an in-depth study of the spec...

Datasets for Dependency Tree Reranking

Nov 13, 2023 - Neural Techniques for German Dependency Parsing

Do, Bich-Ngoc; Rehbein, Ines, 2023, "Datasets for Dependency Tree Reranking", https://doi.org/10.11588/data/E5NOYH, heiDATA, V1

This resource contains the datasets for dependency tree reranking in 3 languages: English, German and Czech. The creation, analysis and experiment results of the datasets are described in the paper: Do and Rehbein (2020). "Neural Reranking for Dependency Parsing: An Evaluation".
