heiDATA

Metrics

197,264 Downloads


1 to 10 of 71 Results

A filled-up phonological buffer does not block conceptualization: evidence from a reaction time paradigm [Research Data]

Oct 14, 2022 - Heidelberg University Language and Cognition Lab

Gerwien, Johannes; Stutterheim, Christiane v.; Rummel, Jan, 2022, "A filled-up phonological buffer does not block conceptualization: evidence from a reaction time paradigm [Research Data]", https://doi.org/10.11588/data/6USPPW, heiDATA, V1

This dataset was published as part of the supplementary material for the research article: Gerwien, J., Stutterheim, C. v., & Rummel, J. (2022). What is the interference in "verbal interference"? Acta Psychologica (230). We provide reaction time data from two language production...

A harmonised testsuite for social media POS tagging (DE)

Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)

Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor, 2020, "A harmonised testsuite for social media POS tagging (DE)", https://doi.org/10.11588/data/KXLMHN, heiDATA, V1

A harmonised POS testsuite of web data, CMC and Twitter microtext, with word forms and STTS POS tags (plus some additional CMC-specific tags). UD POS tags were converted automatically from the STTS POS tags. The data does not contain (manually corrected) lemma information...

A Series of Soluble Thieno-fused Coronene Nanoribbons of Precise Lengths [Data]

Nov 23, 2022 - Institute of Organic Chemistry - AK Mastalerz

Yang, Xuan; Elbert, Sven Michael; Rominger, Frank; Mastalerz, Michael, 2022, "A Series of Soluble Thieno-fused Coronene Nanoribbons of Precise Lengths [Data]", https://doi.org/10.11588/data/30AOAT, heiDATA, V1

Among graphene nanoribbons (GNRs), reports on coronene-based GNRs have been very rare, despite the unique optoelectronic properties of coronene. Herein, the synthesis of a series of structurally precise and soluble thieno-fused coronene nanoribbons (CR-1 to CR-4) with up to four coron...

Abstract Anaphora Resolution [Source Code]

Feb 4, 2019 - AIPHES

Marasovic, Ana, 2019, "Abstract Anaphora Resolution [Source Code]", https://doi.org/10.11588/data/UDMPY5, heiDATA, V1

Abstract Anaphora Resolution (AAR) aims to find the interpretation of nominal expressions (e.g., this result, those two actions) and pronominal expressions (e.g., this, that, it) that refer to abstract-object antecedents such as facts, events, plans, actions, or situations. The f...

Abstract graphs, abstract paths, grounded paths for Freebase and NELL

Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Nastase, Vivi; Kotnis, Bhushan, 2019, "Abstract graphs, abstract paths, grounded paths for Freebase and NELL", https://doi.org/10.11588/data/AVLFPZ, heiDATA, V1

We describe a method for representing knowledge graphs that captures an intensional representation of the original extensional information. This representation is very compact, and it abstracts away from individual links, allowing us to find better path candidates, as shown by the...

ACL word segmentation correction

Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Nastase, Vivi; Hitschler, Julian, 2019, "ACL word segmentation correction", https://doi.org/10.11588/data/VK99LU, heiDATA, V1

The data in this collection consists of two parallel directories: one ("raw") containing the raw text of 18,850 articles from the ACL 2013/02 collection, the other ("re-segmented") containing the word-resegmented version of these articles, obtained using Nematus, a seq2seq neural model used...

Affixoid Dataset (DE)

Oct 8, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Ruppenhofer, Josef, 2019, "Affixoid Dataset (DE)", https://doi.org/10.11588/data/QKF4LT, heiDATA, V1, UNF:6:+MGK9lTPTXx7Rclu1BpPnw== [fileUNF]

The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert. 1788 complex words containing one of 7 German suffixoid candidates (e.g. -hai, -go...

BoostCLIR: JP-EN Relevance Marked Patent Corpus

Jun 16, 2014 - Statistical Natural Language Processing Group

Sokolov, Artem; Jehl, Laura; Hieber, Felix; Ruppert, Eugen; Riezler, Stefan, 2014, "BoostCLIR: JP-EN Relevance Marked Patent Corpus", https://doi.org/10.11588/data/10001, heiDATA, V1

BoostCLIR is a bilingual (Japanese-English) corpus of patent abstracts, extracted from the MAREC patent data and the NTCIR PatentMT workshop collections, accompanied by relevance judgements for the task of patent prior-art search. Important: The English side of t...

BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)

Feb 6, 2019 - AIPHES

Heinzerling, Benjamin, 2019, "BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)", https://doi.org/10.11588/data/V9CXPR, heiDATA, V1

BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as a testbed, BPEmb performs competitively, and for some languages better than alternative subword approaches, while r...

CARDIO:DE [V1.01]

Dec 20, 2023 - NLU anonymized

Dieterich, Christoph, 2022, "CARDIO:DE [V1.01]", https://doi.org/10.11588/data/AFYQDY, heiDATA, V7

Version information (CARDIO:DE 1.01): resolved minor annotation bugs; updated TSV format (importable to INCEpTION release > 29.7); added UIMA CAS corpus version; added annotation layer information files for INCEpTION import (cardiode_[exp]_dependencies_inception.zip). Abstract: We pres...
