Dec 8, 2022 - Ground truth data for HTR on South Asian Scripts
O'Neill, Alexander, 2022, "Ground Truth Model for Pracalit for Sanskrit and Newar MSS 16th to 19th C.", https://doi.org/10.11588/data/WI9184, heiDATA, V1
Ground truth data for an OCR model. Will be continually updated. Originally trained on Transkribus with a PyLaia model created from ground truth data based on transcripts into Pracalit Unicode of four Nepalese manuscripts. The manuscripts used to create this model are Staatsbib...
Oct 26, 2022 - Ground truth data for HTR on South Asian Scripts
Merkel-Hilf, Nicole, 2022, "Ground Truth data for printed Devanagari", https://doi.org/10.11588/data/EGOKEI, heiDATA, V1
Ground truth (GT) data (JPG and ALTO XML files) for an OCR model that recognizes printed text in Devanagari script. The model was trained on the GT data in Transkribus with the HTR+ engine. The training was performed on approx. 220 pages with approx. 27,000 words. The validation set was 10% of th...
Oct 26, 2022
A collection of ground truth data for handwritten and printed text recognition for South Asian scripts, provided by FID4SA - Specialized Information Service South Asia. Interested researchers can download the data archived here and use it as training data for their own text recogn...