HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection

Version 1.2

Mara, Hubert, 2019, "HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection", https://doi.org/10.11588/data/IE8CCN, heiDATA, V1

Dataset Metrics

1,728 Downloads

Description	The number of known cuneiform tablets is assumed to be in the hundreds of thousands. A fraction of them has been published as photographs and manual tracings printed in books. The online catalog of the Cuneiform Digital Library Initiative (CDLI) collects these publications, includes some of the images, and provides metadata for more than 100,000 tablets. While 3D acquisition is the most modern way to document tablets, the number of 3D datasets is relatively small and they are often not openly accessible. However, the Hilprecht Archive Online (HAO) provides 1977 high-resolution 3D scans of tablets under an Open Access license. Although both the HAO and the CDLI are publicly accessible, large-scale machine learning and pattern recognition on cuneiform tablets remain elusive, because the data is only accessible by navigating web pages, the tablet identifiers are inconsistent between collections, and the 3D data is unprepared and challenging for automated processing. We enable large-scale analysis of cuneiform tablets with this HeiCuBeDa Hilprecht assembly, a cross-referenced benchmark dataset of processed cuneiform tablets: (i) frontally aligned 3D tablets with pre-computed high-dimensional surface features, (ii) six-view raster images for off-the-shelf image processing, and (iii) metadata, transcriptions, and transliterations for a subset of 707 tablets, for learning alignments between 3D data, images, and linguistic expression. This is the first dataset of its kind, and of its size, in cuneiform research. The benchmark dataset is prepared for ease of use and immediate availability for computational research, lowering the barrier to experimenting with and applying standard methods of analysis. A Python script is provided to retrieve and compute an updated JSON database of the CDLI’s metadata and raster images. (2019-03-12)
Subject	Arts and Humanities; Computer and Information Science
Related Publication	GigaMesh and Gilgamesh - 3D Multiscale Integral Invariant Cuneiform Character Extraction doi: 10.2312/VAST/VAST10/131-138
License/Data Use Agreement	Custom Dataset Terms
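
The description mentions a JSON database of tablet metadata, but this page does not document its schema. The following is a minimal sketch for a first look at such a file without assuming any field names: it reports the top-level type, the number of entries, and the keys that occur most often among object-valued entries. The function name `summarize_json` is hypothetical and not part of the dataset's tooling.

```python
import json
from collections import Counter


def summarize_json(path, top_n=10):
    """Return (top-level type name, entry count, most common entry keys).

    Makes no assumption about the schema: entries are the values of a
    top-level object or the items of a top-level array.
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)

    if isinstance(data, dict):
        entries = list(data.values())
    elif isinstance(data, list):
        entries = data
    else:
        entries = []  # a bare scalar has no entries to inspect

    # Count how often each key appears across entries that are objects.
    key_counts = Counter()
    for entry in entries:
        if isinstance(entry, dict):
            key_counts.update(entry.keys())

    return type(data).__name__, len(entries), key_counts.most_common(top_n)


if __name__ == "__main__":
    # File name taken from the file listing of this dataset.
    print(summarize_json("HeiCuBeDa_B_Hilprecht_Database_190318.json"))
```

Counting keys rather than printing a single entry gives a quick picture of which fields are present for all tablets and which only for a subset.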

	1 to 10 of 46 Files
	HeiCuBeDa_00_Supplementary_Documentation.pdf Adobe PDF - 27.7 MB Published Jun 6, 2019 150 Downloads MD5: 90a53c69242d42a731f1f88d6349eaba Supplementary Documentation about the contents of the HeiCuBeDa and HeiCu3Da bundles.
	HeiCuBeDa_01_Logo_1977.pdf Adobe PDF - 9.5 KB Published Jun 6, 2019 47 Downloads MD5: 498c292e654de630ef9e72da5444c1c7 Logo for the HeiCuBeDa Hilprecht dataset consisting of 1977 cuneiform tablets.
	HeiCuBeDa_A1_Images_Sideviews_MSII_Filter.zip ZIP Archive - 8.6 GB Published Jun 6, 2019 105 Downloads MD5: d8d2dcd5f3d750f0df0ac7053c669305 A complete set of six side views for each of the 1977 3D-datasets using the MSII filter response to highlight surface details, i.e., cuneiform script and sealings. Recommended for learning tasks. The images are stored as PNGs.
	HeiCuBeDa_A2_Images_Sideviews_VirtualLight.zip ZIP Archive - 9.9 GB Published Jun 6, 2019 53 Downloads MD5: ca9e3ed757f63d8fa642ab0d0a973686 Complete set of six side views of the 3D-models, rendered using a virtual light source and a metallic surface to mimic the illumination setup of photographs. The images are stored as PNGs.
	HeiCuBeDa_B_Hilprecht_Database_190318.json JSON - 13.6 MB Published Jun 6, 2019 40 Downloads MD5: 692f27ec83744d4b9965f4d4ec924b04 Data collection of the properties of the 3D-datasets, changes made during the cleaning and computing processes, and metadata retrieved from the CDLI. As the CDLI is constantly updated, a Python script is available to fetch the latest metadata as well as images and line tracings.
	HeiCuBeDa_B_Scrape_CDLI_190318.py Python Source Code - 34.6 KB Published Jun 6, 2019 32 Downloads MD5: 3021eabde716a572276000a5bdb3e3c3 Python script to retrieve the latest metadata from the CDLI for the Hilprecht Archive Online.
	HeiCuBeDa_C_3DData_with_MSII_and_FunctionValue_part01.zip ZIP Archive - 19.2 GB Published Jun 6, 2019 83 Downloads MD5: 3692a9bd9c17ca2800a127bca1db8fcd Stanford Polygon (PLY) files including the feature vectors computed using the volume-based integral invariant.
	HeiCuBeDa_C_3DData_with_MSII_and_FunctionValue_part02.zip ZIP Archive - 5.2 GB Published Jun 6, 2019 46 Downloads MD5: fc498eab59f9a3a1eea8baf3415114dc Stanford Polygon (PLY) files including the feature vectors computed using the volume-based integral invariant.
	HeiCuBeDa_C_3DData_with_MSII_and_FunctionValue_part03.zip ZIP Archive - 7.3 GB Published Jun 6, 2019 36 Downloads MD5: 5b464a897753677d87b7ab0463d3f021 Stanford Polygon (PLY) files including the feature vectors computed using the volume-based integral invariant.
	HeiCuBeDa_C_3DData_with_MSII_and_FunctionValue_part04.zip ZIP Archive - 8.7 GB Published Jun 6, 2019 35 Downloads MD5: 2f6451aa99046c9a79301dc733ff41f9 Stanford Polygon (PLY) files including the feature vectors computed using the volume-based integral invariant.
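
Each file in the listing above carries an MD5 checksum, which can be used to confirm that a multi-gigabyte download arrived intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listing` are hypothetical. The file is read in chunks so that the large archives do not need to fit into memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Compare a local file against the checksum shown in the file listing."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # File name and checksum copied from the listing above.
    ok = matches_listing(
        "HeiCuBeDa_B_Scrape_CDLI_190318.py",
        "3021eabde716a572276000a5bdb3e3c3",
    )
    print("checksum matches" if ok else "checksum mismatch, download again")
```

MD5 is adequate here for detecting transfer errors; it is not a safeguard against deliberate tampering.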

Citation Metadata

Persistent Identifier	doi:10.11588/data/IE8CCN
Publication Date	2019-06-06
Title	HeiCuBeDa Hilprecht - Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection
Alternative URL	https://gigamesh.eu/heicubeda
Author	Mara, Hubert (IWR, Heidelberg University)
Point of Contact	Mara, Hubert (IWR, Heidelberg University)
Related Publication	GigaMesh and Gilgamesh - 3D Multiscale Integral Invariant Cuneiform Character Extraction, doi: 10.2312/VAST/VAST10/131-138, https://doi.org/10.2312/VAST/VAST10/131-138 ; Multi-Scale Integral Invariants for Robust Character Extraction from Irregular Polygon Mesh Data, doi: 10.11588/heidok.00013890, http://www.ub.uni-heidelberg.de/archiv/13890
Language	English
Producer	Hubert Mara (IWR, Heidelberg University) (HMara), https://www.iwr.uni-heidelberg.de/groups/forensicgl/?page=people&person=hmara ; Bartosz Bogacz (IWR, Heidelberg University) (BBogacz), https://www.iwr.uni-heidelberg.de/groups/forensicgl/?page=people&person=bbogacz
Production Date	2019-03-11
Production Location	Heidelberg, Germany
Contributor	Project Member : Bayer, Paul Victor
Deposit Date	2019-02-25
Date of Collection	Start Date: 2018-07-24, End Date: 2018-08-22 ; Start Date: 2019-03-01, End Date: 2019-03-11
Data Type	Cuneiform tablets; 3D Measurement data
Software	GigaMesh Software Framework, Version: 181100 to 190300
Related Dataset	Heidelberg Cuneiform 3D Database (HeiCu3Da) for the Hilprecht Collection: https://doi.org/10.11588/heidicon.hilprecht
Origin of Historical Sources	Hilprecht Sammlung, Jena, Germany, https://hilprecht.mpiwg-berlin.mpg.de/ ; Cuneiform Digital Library Initiative (CDLI), https://cdli.ucla.edu/
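
The 3D measurement data is distributed as Stanford Polygon (PLY) files. A PLY file begins with a plain-text header that declares its encoding and the elements and properties it stores, so the header is a safe first thing to inspect before choosing a mesh library. The following is a minimal sketch of a header reader; the function name `read_ply_header` is hypothetical. This page does not state which properties the HeiCuBeDa files declare, or how the feature vectors and function values are encoded, so the sketch only reports what it finds and assumes nothing about the content.

```python
def read_ply_header(path):
    """Return (format name, list of elements) parsed from a PLY header.

    Each element is a dict with its name, its item count, and the property
    declarations exactly as written in the header.
    """
    fmt = None
    elements = []
    with open(path, "rb") as fh:
        if fh.readline().strip() != b"ply":
            raise ValueError("not a PLY file: missing 'ply' magic line")
        for raw in fh:
            tokens = raw.decode("ascii", errors="replace").split()
            if not tokens:
                continue
            keyword = tokens[0]
            if keyword == "end_header":
                break  # everything after this line is the (possibly binary) body
            if keyword == "format":
                fmt = tokens[1]
            elif keyword == "element":
                elements.append(
                    {"name": tokens[1], "count": int(tokens[2]), "properties": []}
                )
            elif keyword == "property" and elements:
                # Properties belong to the most recently declared element.
                elements[-1]["properties"].append(" ".join(tokens[1:]))
    return fmt, elements


if __name__ == "__main__":
    import sys

    fmt, elements = read_ply_header(sys.argv[1])
    print("format:", fmt)
    for element in elements:
        print(element["name"], element["count"], element["properties"])
```

Reading only the header keeps the check fast even for very large meshes, because the loop stops at `end_header` and never touches the vertex or face data.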

Dataset Terms

License/Data Use Agreement

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation shown on the dataset page.

Custom Dataset Terms — the following Custom Dataset Terms have been defined for this dataset.

Licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
