heiDATA

Metrics

193,550 Downloads

Subject: Computer and Information Science Subject: Arts and Humanities

31 to 40 of 51 Results

Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates

Mar 21, 2023 - Ground truth data for HTR on South Asian Scripts

Derrick, Tom; British Library, 2023, "Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates", https://doi.org/10.11588/data/AIQSXL, heiDATA, V1

This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transc...

GermEval-2018 Corpus (DE)

Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Wiegand, Michael, 2019, "GermEval-2018 Corpus (DE)", https://doi.org/10.11588/data/0B5VML, heiDATA, V1

This dataset comprises the training and test data (German tweets) from the GermEval 2018 Shared Task on Offensive Language Detection.

German Twitter Titling Corpus

Jan 20, 2021 - Empirical Linguistics and Computational Language Modeling (LiMo)

van den Berg, Esther; Korfhage, Katharina; Ruppenhofer, Josef; Wiegand, Michael; Markert, Katja, 2020, "German Twitter Titling Corpus", https://doi.org/10.11588/data/AOSUY6, heiDATA, V2, UNF:6:14BxjwJS7Q3mfI6ei7iBBw== [fileUNF]

The German Titling Twitter Corpus consists of 1904 stance-annotated tweets collected in June/July 2018 mentioning 24 German politicians with a doctoral degree. The Addendum contains an additional 296 stance-annotated tweets from each month of 2018 mentioning 10 politicians with a...

German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)

Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)

Rehbein, Ines; Ruppenhofer, Josef, 2020, "German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)", https://doi.org/10.11588/data/ZHI94V, heiDATA, V1

Annotations of causal verbs, nouns and prepositions in context and lexicon file for causal verbs, nouns and prepositions.

GER_SET: Situation Entity Type labelled corpus for German

Dec 10, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Becker, Maria, 2019, "GER_SET: Situation Entity Type labelled corpus for German", https://doi.org/10.11588/data/BBQYD0, heiDATA, V1

Semantic clause types, also called Situation Entity (SE) types (Smith, 2003) are linguistic characterizations of aspectual properties shown to be useful for tasks like argumentation structure analysis (Becker et al., 2016), genre characterization (Palmer and Friedrich, 2014), and...

Genre-sensitive Neural Situation Entity classifier (DE, EN)

Oct 22, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Becker, Maria, 2019, "Genre-sensitive Neural Situation Entity classifier (DE, EN)", https://doi.org/10.11588/data/XXKWU0, heiDATA, V1

This is a classifier for situation entity types as described in Becker et al., 2017. These clause types depend on a combination of syntactic-semantic and contextual features. We explore this task in a deep learning framework, where tuned word representations capture lexical, synta...

ErKon3D - Quantifying Deformation in Aegean Sealing Practices [Dataset]

Jun 7, 2023 - IWR Computer Graphics

Mara, Hubert, 2023, "ErKon3D - Quantifying Deformation in Aegean Sealing Practices [Dataset]", https://doi.org/10.11588/data/UMJXI0, heiDATA, V1

In Bronze Age Aegean society, seals played an important role by authenticating, securing and marking. The study of the seals and their engraved motifs provides valuable insight into the social and political organization and administration of Aegean societies. A key research question...

Encoder-Decoder Model for Semantic Role Labeling

Jan 23, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)

Daza, Angel, 2020, "Encoder-Decoder Model for Semantic Role Labeling", https://doi.org/10.11588/data/TOI9NQ, heiDATA, V1

Abstract (Daza & Frank 2019): We propose a Cross-lingual Encoder-Decoder model that simultaneously translates and generates sentences with Semantic Role Labeling annotations in a resource-poor target language. Unlike annotation projection techniques, our model does not need paral...

DeModify

Jul 15, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Nastase, Vivi; Fritz, Devon; Frank, Anette, 2019, "DeModify", https://doi.org/10.11588/data/KIWEMF, heiDATA, V1

deModify consists of 3631 instances, each with three annotations obtained through CrowdFlower. An instance is a short story in which a modifier is annotated with respect to its impact on the information in the story, assessed through its deletion from the context: crucial, not-cr...

Datasets for Dependency Tree Reranking

Nov 13, 2023 - Neural Techniques for German Dependency Parsing

Do, Bich-Ngoc; Rehbein, Ines, 2023, "Datasets for Dependency Tree Reranking", https://doi.org/10.11588/data/E5NOYH, heiDATA, V1

This resource contains the datasets for dependency tree reranking in 3 languages: English, German and Czech. The creation, analysis and experiment results of the datasets are described in the paper: Do and Rehbein (2020). "Neural Reranking for Dependency Parsing: An Evaluation".
