Feb 24, 2023 - Ground truth data for HTR on South Asian Scripts
Tübingen University Library, 2023, "Ground Truth data for printed Malayalam", https://doi.org/10.11588/data/L2KRZO, heiDATA, V1
Ground truth (GT) data (JPG, PAGE and ALTO XML files) that can be used to train OCR models to recognize printed text in Malayalam script. The training material is drawn from 19th- and 20th-century prints. Models were trained on the GT data in Transkribus with the HTR+ and the PyLaia...
Oct 26, 2022 - Ground truth data for HTR on South Asian Scripts
Merkel-Hilf, Nicole, 2022, "Ground Truth data for printed Devanagari", https://doi.org/10.11588/data/EGOKEI, heiDATA, V1
Ground truth (GT) data (JPG and ALTO XML files) for an OCR model that recognizes printed text in Devanagari script. The model was trained on the GT data in Transkribus with the HTR+ engine. Training used approximately 220 pages containing approximately 27,000 words. The validation set was 10% of th...