FID4SA@heiDATA

Data publications of the FID4SA – Specialized Information Service South Asia.

1 to 10 of 31 Results

Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates

Mar 21, 2023 - Ground truth data for HTR on South Asian Scripts

Derrick, Tom; British Library, 2023, "Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates", https://doi.org/10.11588/data/AIQSXL, heiDATA, V1

This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transc...

REID2019.zip - Mar 21, 2023 - ZIP Archive - 1002.2 MB - MD5: 2e97b3f935d9b834d057e9d423be1b30
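Each file entry publishes an MD5 checksum, which is worth checking after downloading an archive as large as REID2019.zip. Below is a minimal Python sketch, assuming the archive has been saved locally as `REID2019.zip` (a hypothetical path). The helpers `md5_of_file` and `verify_md5` are illustrative names, not part of heiDATA or Dataverse. The file is read in 1 MiB chunks so the whole archive never has to sit in memory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's MD5 with a published checksum (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; the checksum is the one listed for REID2019.zip.
    archive = Path("REID2019.zip")
    if archive.exists():
        ok = verify_md5(archive, "2e97b3f935d9b834d057e9d423be1b30")
        print("checksum OK" if ok else "checksum MISMATCH")
```

The same two functions apply unchanged to the other archives in this listing; only the path and the expected checksum differ.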
Ground Truth data for printed Malayalam

Feb 24, 2023 - Ground truth data for HTR on South Asian Scripts

Tübingen University Library, 2023, "Ground Truth data for printed Malayalam", https://doi.org/10.11588/data/L2KRZO, heiDATA, V1

Ground Truth (GT) data (JPG, PAGE and ALTO XML files) that can be used to train OCR models to recognize printed text in Malayalam script. The training material was gathered from 19th- and 20th-century prints. The GT data was used for training in Transkribus with the HTR+ and the PyLaia...

39A8599.zip - Feb 24, 2023 - ZIP Archive - 6.6 MB - MD5: a5eabde1cb44fb2ad2be83228e534b41

CiXIV130_1874.zip - Feb 24, 2023 - ZIP Archive - 11.2 MB - MD5: a82c90b56669a1a829ad754bffb871cf

CiXIV131-4_1877.zip - Feb 24, 2023 - ZIP Archive - 12.3 MB - MD5: 1d0c81551baa135228be4cf9b63f6648

CiXIV270.zip - Feb 24, 2023 - ZIP Archive - 9.8 MB - MD5: 887edc8349eb421a04fa71dacf4dfdf8

CiXIV285_1850.zip - Feb 24, 2023 - ZIP Archive - 16.9 MB - MD5: 87c28600177975b0964ad9457147af51
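The Malayalam ground truth ships as PAGE and ALTO XML alongside the images. A minimal Python sketch for pulling line-level transcriptions and polygons out of a PAGE file might look like the following. It assumes the usual PAGE nesting (`TextRegion` > `TextLine` > `Coords` and `TextEquiv/Unicode`) and the `points` attribute form of `Coords` used by recent PAGE schema versions; older files that use `Point` child elements would yield an empty polygon. The function names are hypothetical, and the exact contents of these archives have not been checked here. Namespaces are stripped so the same code tolerates different PAGE schema versions.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so tags compare equal across PAGE versions."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """Return the first direct child with the given local name, or None."""
    for c in elem:
        if local_name(c.tag) == name:
            return c
    return None


def page_lines(xml_text):
    """Yield (line_id, polygon, text) for every TextLine in a PAGE XML document.

    The polygon is parsed from the line's Coords/@points ("x1,y1 x2,y2 ...").
    The text is taken from the line-level TextEquiv/Unicode, not from Word children.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        points = []
        coords = child(elem, "Coords")
        if coords is not None and coords.get("points"):
            for pair in coords.get("points").split():
                x, y = pair.split(",")
                points.append((int(x), int(y)))
        text = ""
        equiv = child(elem, "TextEquiv")
        if equiv is not None:
            uni = child(equiv, "Unicode")
            if uni is not None and uni.text:
                text = uni.text
        yield elem.get("id"), points, text
```

For a file on disk, pass its raw bytes, e.g. `page_lines(open(path, "rb").read())`, so that ElementTree honours the encoding declared in the file.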
Ground Truth Model for Pracalit for Sanskrit and Newar MSS 16th to 19th C.

Dec 8, 2022 - Ground truth data for HTR on South Asian Scripts

O'Neill, Alexander, 2022, "Ground Truth Model for Pracalit for Sanskrit and Newar MSS 16th to 19th C.", https://doi.org/10.11588/data/WI9184, heiDATA, V1

Ground truth data for an OCR model. The dataset will be continually updated. It was originally trained in Transkribus as a PyLaia model, using ground truth based on Pracalit Unicode transcriptions of four Nepalese manuscripts. The manuscripts used to create this model are Staatsbib...

HTR_Train_Set_Pracalit_for_Sanskrit_and_Newar_MSS_16th_to_19th_C.zip - Dec 8, 2022 - ZIP Archive - 479.7 MB - MD5: 56e2cc32f0d0081fe109b596166f215f
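OCR/HTR training pipelines generally need each page image paired with its transcription file. The sketch below does that pairing for a downloaded archive such as the Pracalit training set. It rests on an assumption that has not been verified against these ZIP files: a Transkribus-style export layout in which the PAGE XML for `folder/img.jpg` sits at `folder/page/img.xml`. `pair_images_with_page` and `pair_archive` are hypothetical helpers; adjust them if the archive is organised differently.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Image extensions to look for (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def pair_images_with_page(names):
    """Map each image among the archive member names to its PAGE XML member.

    Assumes (hypothetically) that the XML for 'folder/img.jpg' is stored at
    'folder/page/img.xml'. Images with no matching XML are mapped to None.
    """
    members = set(names)
    pairs = {}
    for name in names:
        path = PurePosixPath(name)  # ZIP member names always use '/' separators
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        xml = str(path.parent / "page" / (path.stem + ".xml"))
        pairs[name] = xml if xml in members else None
    return pairs


def pair_archive(zip_path):
    """Open a ZIP archive and pair its images with PAGE XML members."""
    with zipfile.ZipFile(zip_path) as zf:
        return pair_images_with_page(zf.namelist())


if __name__ == "__main__":
    # Hypothetical local copy of the archive listed above.
    archive = Path("HTR_Train_Set_Pracalit_for_Sanskrit_and_Newar_MSS_16th_to_19th_C.zip")
    if archive.exists():
        for image, xml in sorted(pair_archive(archive).items()):
            print(image, "->", xml)
```

Images mapped to `None` are worth inspecting before training, since they either lack a transcription or follow a different naming scheme.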
