heiDATA

Subject: Arts and Humanities

71 to 80 of 167 Results

Georeferencing of the "Lorscher Codex"

Dec 17, 2015 - Universitätsbibliothek Heidelberg

Antretter, Marlene; Eller, Dirk; Elstermann, Hannes; Geissler, Stefan; Grüning, Simon; Horn, Sebastian; Kohler, Matthias; Kuck, Kevin; Lingnau, Anna; Nozik, Alexandra; Odenwald, Jakob; Rieger, Felix; Schubert, Christopher; Wenze, Felix; Zimmermann, Karin, 2015, "Georeferencing of the "Lorscher Codex"", https://doi.org/10.11588/data/10063, heiDATA, V1

Collection of historic place names from the Lorscher Codex (1176-1200). Most of the (deserted) villages and cities named in the codex were in the possession of Lorsch Abbey; many of them are mentioned for the first time here. In the project the property should be visualised in mo...

GER_SET: Situation Entity Type labelled corpus for German

Dec 10, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Becker, Maria, 2019, "GER_SET: Situation Entity Type labelled corpus for German", https://doi.org/10.11588/data/BBQYD0, heiDATA, V1

Semantic clause types, also called Situation Entity (SE) types (Smith, 2003), are linguistic characterizations of aspectual properties shown to be useful for tasks like argumentation structure analysis (Becker et al., 2016), genre characterization (Palmer and Friedrich, 2014), and...

German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)

Mar 26, 2020 - Empirical Linguistics and Computational Language Modeling (LiMo)

Rehbein, Ines; Ruppenhofer, Josef, 2020, "German causal language annotations and lexicon (verbs, nouns, prepositions) (DE)", https://doi.org/10.11588/data/ZHI94V, heiDATA, V1

Annotations of causal verbs, nouns and prepositions in context and lexicon file for causal verbs, nouns and prepositions.

German Twitter Titling Corpus

Jan 20, 2021 - Empirical Linguistics and Computational Language Modeling (LiMo)

van den Berg, Esther; Korfhage, Katharina; Ruppenhofer, Josef; Wiegand, Michael; Markert, Katja, 2020, "German Twitter Titling Corpus", https://doi.org/10.11588/data/AOSUY6, heiDATA, V2, UNF:6:14BxjwJS7Q3mfI6ei7iBBw== [fileUNF]

The German Twitter Titling Corpus consists of 1904 stance-annotated tweets collected in June/July 2018 mentioning 24 German politicians with a doctoral degree. The Addendum contains an additional 296 stance-annotated tweets from each month of 2018 mentioning 10 politicians with a...

GermEval-2018 Corpus (DE)

Sep 2, 2019 - Empirical Linguistics and Computational Language Modeling (LiMo)

Wiegand, Michael, 2019, "GermEval-2018 Corpus (DE)", https://doi.org/10.11588/data/0B5VML, heiDATA, V1

This dataset comprises the training and test data (German tweets) from the GermEval 2018 Shared Task on Offensive Language Detection.

Ground Truth data for printed Devanagari

Oct 26, 2022 - Ground truth data for HTR on South Asian Scripts

Merkel-Hilf, Nicole, 2022, "Ground Truth data for printed Devanagari", https://doi.org/10.11588/data/EGOKEI, heiDATA, V1

Ground truth (GT) data (JPG and ALTO XML files) for an OCR model that recognizes printed text in Devanagari script. The GT data was trained on Transkribus with the HTR+ engine. The training was performed on approx. 220 pages with approx. 27,000 words. The validation set was 10% of th...

Ground Truth data for printed Malayalam

Feb 24, 2023 - Ground truth data for HTR on South Asian Scripts

Tübingen University Library, 2023, "Ground Truth data for printed Malayalam", https://doi.org/10.11588/data/L2KRZO, heiDATA, V1

Ground Truth (GT) data (JPG, PAGE and ALTO XML files) which can be used to train OCR models that recognize printed text in Malayalam script. The training material is gathered from 19th- and 20th-century prints. The GT data was trained in Transkribus with the HTR+ and the PyLaia...

Ground Truth Model for Pracalit for Sanskrit and Newar MSS 16th to 19th C.

Dec 8, 2022 - Ground truth data for HTR on South Asian Scripts

O'Neill, Alexander, 2022, "Ground Truth Model for Pracalit for Sanskrit and Newar MSS 16th to 19th C.", https://doi.org/10.11588/data/WI9184, heiDATA, V1

Ground truth data for an OCR model. Will be continually updated. Originally trained on Transkribus with a PyLaia model created from ground truth data based on transcripts into Pracalit Unicode of four Nepalese manuscripts. The manuscripts used to create this model are Staatsbib...

Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates

Mar 21, 2023 - Ground truth data for HTR on South Asian Scripts

Derrick, Tom; British Library, 2023, "Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates", https://doi.org/10.11588/data/AIQSXL, heiDATA, V1

This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transc...

Head Selection Parsers and LSTM Labelers

Nov 13, 2023 - Neural Techniques for German Dependency Parsing

Do, Bich-Ngoc; Rehbein, Ines; Frank, Anette, 2023, "Head Selection Parsers and LSTM Labelers", https://doi.org/10.11588/data/BPWWJL, heiDATA, V1

This resource contains code, data and pre-trained models for various types of neural dependency parsers and LSTM labelers used in the papers: Do et al. (2017). "What Do We Need to Know About an Unknown Word When Parsing German"; Do and Rehbein (2017). "Evaluating LSTM Models for G...
