heiDATA

Metrics

197,264 Downloads

Featured Dataverses

In order to use this feature you must have at least one published dataverse.

Publish Dataverse

Are you sure you want to publish your dataverse? Once you do so it must remain published.

Publish Dataverse

This dataverse cannot be published because the dataverse it is in has not been published.

Delete Dataverse

Are you sure you want to delete your dataverse? You cannot undelete this dataverse.

Subject: Arts and Humanities Subject: Computer and Information Science

21 to 30 of 51 Results

Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates Mar 21, 2023 - Ground truth data for HTR on South Asian Scripts Derrick, Tom; British Library, 2023, "Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates", https://doi.org/10.11588/data/AIQSXL, heiDATA, V1 This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transc...
Head Selection Parsers and LSTM Labelers Nov 13, 2023 - Neural Techniques for German Dependency Parsing Do, Bich-Ngoc; Rehbein, Ines; Frank, Anette, 2023, "Head Selection Parsers and LSTM Labelers", https://doi.org/10.11588/data/BPWWJL, heiDATA, V1 This resource contains code, data and pre-trained models for various types of neural dependency parsers and LSTM labelers used in the papers: Do et al. (2017). "What Do We Need to Know About an Unknown Word When Parsing German" Do and Rehbein (2017). "Evaluating LSTM Models for G...
HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection Mar 26, 2021 - IWR Computer Graphics Mara, Hubert, 2019, "HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection", https://doi.org/10.11588/data/IE8CCN, heiDATA, V2 The number of known cuneiform tablets is assumed to be in the hundreds of thousands. A fraction has been published by printing photographs and manual tracings in books, which is collected by the online Cuneiform Digital Library Initiative (CDLI) catalog including some of these im...
IKAT-DE Feb 26, 2024 - RATIO_EXPLAIN Becker, Maria, 2024, "IKAT-DE", https://doi.org/10.11588/data/4BA5LY, heiDATA, V1 A corpus consisting of high-quality human annotations of missing and implied information in argumentative texts (German version). The data is further annotated with semantic clause types and commonsense knowledge relations.
IKAT-EN Feb 26, 2024 - RATIO_EXPLAIN Becker, Maria, 2024, "IKAT-EN", https://doi.org/10.11588/data/RUBM2E, heiDATA, V1, UNF:6:To3aHa8xO8P28fzpCz1Qvw== [fileUNF] A corpus consisting of high-quality human annotations of missing and implied information in argumentative texts (English version). The data is further annotated with semantic clause types and commonsense knowledge relations.
Jing bao ground truth – text block crops and annotations Nov 2, 2023 - Heidelberg Centre for Transcultural Studies (HCTS) Henke, Konstantin; Arnold, Matthias, 2023, "Jing bao ground truth – text block crops and annotations", https://doi.org/10.11588/data/PVYWKB, heiDATA, V1 This is the data set related to the paper "Language Model Assisted OCR Classification for Republican Chinese Newspaper Text", JDADH 11/2023. In this work, we present methods to obtain a neural optical character recognition (OCR) tool for article blocks in a Republican Chinese new...
KGE Algorithms Aug 19, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo) Kotnis, Bhushan, 2019, "KGE Algorithms", https://doi.org/10.11588/data/CSXYSS, heiDATA, V1 An updated method for link prediction that uses a regularization factor that models relation argument types Abstract (Kotnis and Nastase, 2017): Learning relations based on evidence from knowledge repositories relies on processing the available relation instances. Knowledge repos...
Lexicon of Abusive Words (EN) Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo) Wiegand, Michael, 2019, "Lexicon of Abusive Words (EN)", https://doi.org/10.11588/data/MKPEYV, heiDATA, V1 This goldstandard contains a bootstrapped lexicon of abusive words. The lexicon comprises a large set of English negative polar expressions annotated as either abusive or not.
LIDO-Handbuch für die Erfassung und Publikation von Metadaten zu kulturellen Objekten - Band 2: Malerei und Skulptur [Anwendungsbeispiele] Aug 16, 2023 - arthistoricum.net@heiDATA Knaus, Gudrun; Kailus, Angela; Stein, Regine, 2022, "LIDO-Handbuch für die Erfassung und Publikation von Metadaten zu kulturellen Objekten - Band 2: Malerei und Skulptur [Anwendungsbeispiele]", https://doi.org/10.11588/data/CHEPS6, heiDATA, V3 LIDO (Lightweight Information Describing Objects) ist ein XML-Schema für die standardkonforme Bereitstellung von Metadaten über kulturelle Objekte in einer Vielzahl von digitalen Kontexten. Basierend auf diesem internationalen Standard dient das "LIDO-Handbuch für die Erfassung u...
LLMs4Implicit-Knowledge-Generation Public Feb 26, 2024 - RATIO_EXPLAIN Becker, Maria, 2024, "LLMs4Implicit-Knowledge-Generation Public", https://doi.org/10.11588/data/5VTJ26, heiDATA, V1 Code for equipping pretrained language models (BART, GPT-2, XLNet) with commonsense knowledge for generating implicit knowledge statements between two sentences, by (i) finetuning the models on corpora enriched with implicit information; and by (ii) constraining models with key c...

Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates

Mar 21, 2023 - Ground truth data for HTR on South Asian Scripts

Derrick, Tom; British Library, 2023, "Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates", https://doi.org/10.11588/data/AIQSXL, heiDATA, V1

This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transc...

Head Selection Parsers and LSTM Labelers

Nov 13, 2023 - Neural Techniques for German Dependency Parsing

Do, Bich-Ngoc; Rehbein, Ines; Frank, Anette, 2023, "Head Selection Parsers and LSTM Labelers", https://doi.org/10.11588/data/BPWWJL, heiDATA, V1

This resource contains code, data and pre-trained models for various types of neural dependency parsers and LSTM labelers used in the papers: Do et al. (2017). "What Do We Need to Know About an Unknown Word When Parsing German" Do and Rehbein (2017). "Evaluating LSTM Models for G...

HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection

Mar 26, 2021 - IWR Computer Graphics

Mara, Hubert, 2019, "HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection", https://doi.org/10.11588/data/IE8CCN, heiDATA, V2

The number of known cuneiform tablets is assumed to be in the hundreds of thousands. A fraction has been published by printing photographs and manual tracings in books, which is collected by the online Cuneiform Digital Library Initiative (CDLI) catalog including some of these im...

IKAT-DE

Feb 26, 2024 - RATIO_EXPLAIN

Becker, Maria, 2024, "IKAT-DE", https://doi.org/10.11588/data/4BA5LY, heiDATA, V1

A corpus consisting of high-quality human annotations of missing and implied information in argumentative texts (German version). The data is further annotated with semantic clause types and commonsense knowledge relations.

IKAT-EN

Feb 26, 2024 - RATIO_EXPLAIN

Becker, Maria, 2024, "IKAT-EN", https://doi.org/10.11588/data/RUBM2E, heiDATA, V1, UNF:6:To3aHa8xO8P28fzpCz1Qvw== [fileUNF]

A corpus consisting of high-quality human annotations of missing and implied information in argumentative texts (English version). The data is further annotated with semantic clause types and commonsense knowledge relations.

Jing bao ground truth – text block crops and annotations

Nov 2, 2023 - Heidelberg Centre for Transcultural Studies (HCTS)

Henke, Konstantin; Arnold, Matthias, 2023, "Jing bao ground truth – text block crops and annotations", https://doi.org/10.11588/data/PVYWKB, heiDATA, V1

This is the data set related to the paper "Language Model Assisted OCR Classification for Republican Chinese Newspaper Text", JDADH 11/2023. In this work, we present methods to obtain a neural optical character recognition (OCR) tool for article blocks in a Republican Chinese new...

KGE Algorithms

Aug 19, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Kotnis, Bhushan, 2019, "KGE Algorithms", https://doi.org/10.11588/data/CSXYSS, heiDATA, V1

An updated method for link prediction that uses a regularization factor that models relation argument types Abstract (Kotnis and Nastase, 2017): Learning relations based on evidence from knowledge repositories relies on processing the available relation instances. Knowledge repos...

Lexicon of Abusive Words (EN)

Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Wiegand, Michael, 2019, "Lexicon of Abusive Words (EN)", https://doi.org/10.11588/data/MKPEYV, heiDATA, V1

This goldstandard contains a bootstrapped lexicon of abusive words. The lexicon comprises a large set of English negative polar expressions annotated as either abusive or not.

LIDO-Handbuch für die Erfassung und Publikation von Metadaten zu kulturellen Objekten - Band 2: Malerei und Skulptur [Anwendungsbeispiele]

Aug 16, 2023 - arthistoricum.net@heiDATA

Knaus, Gudrun; Kailus, Angela; Stein, Regine, 2022, "LIDO-Handbuch für die Erfassung und Publikation von Metadaten zu kulturellen Objekten - Band 2: Malerei und Skulptur [Anwendungsbeispiele]", https://doi.org/10.11588/data/CHEPS6, heiDATA, V3

LIDO (Lightweight Information Describing Objects) ist ein XML-Schema für die standardkonforme Bereitstellung von Metadaten über kulturelle Objekte in einer Vielzahl von digitalen Kontexten. Basierend auf diesem internationalen Standard dient das "LIDO-Handbuch für die Erfassung u...

LLMs4Implicit-Knowledge-Generation Public

Feb 26, 2024 - RATIO_EXPLAIN

Becker, Maria, 2024, "LLMs4Implicit-Knowledge-Generation Public", https://doi.org/10.11588/data/5VTJ26, heiDATA, V1

Code for equipping pretrained language models (BART, GPT-2, XLNet) with commonsense knowledge for generating implicit knowledge statements between two sentences, by (i) finetuning the models on corpora enriched with implicit information; and by (ii) constraining models with key c...

Add Data

Share Dataverse

Link Dataverse

Reset Modifications