A collection of Ground Truth Data for Handwritten Text Recognition on South Asian scripts provided by FID4SA - Specialized Information Service South Asia.

1 to 2 of 2 Results
Dec 8, 2022
O'Neill, Alexander, 2022, "Ground Truth Model for Pracalit for Sanskrit and Newar MSS 16th to 19th C.", https://doi.org/10.11588/data/WI9184, heiDATA, V1
Ground truth data for an OCR model. Will be continually updated. Originally trained on Transkribus with a PyLaia model created from ground truth data based on transcripts into Pracalit Unicode of four Nepalese manuscripts. The manuscripts used to create this model are Staatsbib...
Oct 26, 2022
Merkel-Hilf, Nicole, 2022, "Ground Truth data for printed Devanagari", https://doi.org/10.11588/data/EGOKEI, heiDATA, V1
Ground truth (GT) data (JPG and ALTO XML files) for an OCR model that recognizes printed text in Devanagari script. The model was trained on the GT data in Transkribus with the HTR+ engine. The training was performed on approx. 220 pages with approx. 27,000 words. The validation set was 10% of th...