Persistent Identifier
|
doi:10.11588/data/Z3J0DV |
Publication Date
|
2022-07-27 |
Title
| Early Chinese Periodicals Online (ECPO) [Metadata] |
Alternative URL
| https://uni-heidelberg.de/ecpo |
Other Identifier
| Heidelberg Centre for Transcultural Studies: https://d-nb.info/gnd/1168398932
Centre for Asian and Transcultural Studies: https://d-nb.info/gnd/1148669841 |
Author
| Arnold, Matthias (Heidelberg Research Architecture, Heidelberg University) - ORCID: 0000-0003-0876-6177 |
Point of Contact
|
Arnold, Matthias (Heidelberg Research Architecture, Heidelberg University) |
Description
| ECPO joins several important digital collections of the early Chinese press and puts them into a single overarching framework. In the first phase, several databases on early women’s periodicals and entertainment publishing were created: “Chinese Women’s Magazines in the Late Qing and Early Republican Period” (WoMag), “Chinese Entertainment Newspapers” (Xiaobao), and databases hosted at the Academia Sinica in Taiwan. These systems approach the material in two ways: in the intensive approach we record all articles, images, advertisements, and related agents and assign them to a complete set of scanned pages, while in the extensive approach we record the main characteristic features of publications. ECPO differs from other existing databases of Chinese periodicals in that it not only provides image scans but also preserves materials often excluded from reprint, microfilm, or digital (even full-text) editions, such as advertising inserts and illustrations. In addition, it aims to incorporate metadata in both English and Chinese, including keywords and biographical information on editors, authors, and individuals represented in illustrations and advertisements in the journals. As the material basis of the database consists mostly of image scans, the project has been running experiments on one Republican newspaper to explore approaches toward full-text generation. Computer-aided processing of image scans of historical periodicals is still challenging with the current state of technology, in particular because processing standards for Latin-script newspapers do not apply to the Chinese context. Only with new approaches in machine learning has it become possible to process material that was inaccessible just a few years ago. However, many challenges remain. Extremely complex layouts make reliable automatic page segmentation difficult, which has prevented full-text generation for these newspapers even within China.
The application of artificial intelligence requires a ground truth data set. This error-free, manually corrected text with structural information is used for evaluation and training of software models for text and layout recognition. In the fall of 2021, the project successfully implemented OCR on a sample of the newspaper 晶報 Jing bao (The Crystal) with a character error rate below 3% (Henke 2021). On that basis, the project is now expanding and generalizing its approach. With additional funding recently received from the Research Council Cultural Dynamics in Globalized Worlds for the first half of 2022, the project is currently producing a new data set. The project’s aim is to offer a solution for automatically producing full text from Republican newspapers using neural networks and machine learning. The project’s current work will further develop its original aims and contribute to the field of research as a whole. With the disclosure of the project’s network models and data sets, its results can be reproduced and evaluated, and others can adopt its approaches in the field. Although processing non-Latin scripts is still a challenge in many cases, the project hopes its work may serve as a good-practice example for such initiatives. The data set provides a first and complete extract of all metadata edited by the project so far. Future versions will also incorporate the full text produced in our OCR pipeline. |
Subject
| Arts and Humanities; Other |
Keyword
| Datensatz / data set (GND) http://d-nb.info/gnd/4011133-7
Multilingual computing (LCSH) http://id.loc.gov/authorities/subjects/sh99004311
Zeitschrifteninhaltsauswertung (GND) https://d-nb.info/gnd/7504432-8
Zeitschrift (GND) https://d-nb.info/gnd/4067488-5 |
Topic Classification
| Multilingual computing (LCSH) http://id.loc.gov/authorities/subjects/sh99004311
Library metadata (LCSH) http://id.loc.gov/authorities/subjects/sh2017004510 |
Related Publication
| Arnold, Matthias. "Multilingual research projects: Challenges for making use of standards, authority files, and character recognition." Digital Studies / Le champ numérique (forthcoming; preprint). doi: 10.11588/heidok.00030918 https://doi.org/10.11588/heidok.00030918
Arnold, Matthias, and Henrike Rudolph. "Network Data in the Early Chinese Periodicals Online Database (ECPO)." In: Journal of Historical Network Research 5.1 (2021): 288-302. [Special issue "Beyond guanxi: Chinese Historical Networks," ed. Song CHEN and Henrike Rudolph.] doi: 10.25517/jhnr.v5i1.118 https://doi.org/10.25517/jhnr.v5i1.118
Arnold, Matthias, and Lena Hessel. "Transforming data silos into knowledge: Early Chinese Periodicals Online (ECPO)." In: E-Science-Tage 2019: Data to Knowledge. Ed. by Vincent Heuveline, Fabian Gebhard, and Nina Mohammadianbisheh. Heidelberg: heiBOOKS, 2020, 95-109. doi: 10.11588/heibooks.598.c8420 https://www.doi.org/10.11588/heibooks.598.c8420
Sung, Doris, Liying Sun and Matthias Arnold. "The Birth of a Database of Historical Periodicals: Chinese Women’s Magazines in the Late Qing and Early Republican Period." In Tulsa Studies in Women's Literature 33, no. 2 (2014): 227-37 https://www.muse.jhu.edu/article/564237 |
Language
| Chinese; English |
Producer
| Heidelberg Centre for Transcultural Studies (HCTS) https://www.asia-europe.uni-heidelberg.de/ 
Arnold, Matthias (Heidelberg Research Architecture (HRA), Heidelberg Centre for Transcultural Studies (HCTS)) https://www.asia-europe.uni-heidelberg.de/en/people/person/persdetail/arnold.html |
Production Date
| 2022 |
Production Location
| Heidelberg Centre for Transcultural Studies, University of Heidelberg |
Contributor
| Project Leader : Mittler, Barbara
Project Leader : Judge, Joan
Project Leader : Yu, Chien-ming
Project Manager : Sun, Liying
Project Manager : Sung, Doris
Project Manager : Arnold, Matthias
Project Manager : Lien, Lingling
Project Member : Torkler, Jörg
Data Curator : Hessel, Lena
Data Curator : Xie, Jia
Data Curator : Yip, Suk Man
Project Member : Henke, Konstantin
Data Curator : Paterson, Duncan
Data Collector : Zhang, Shimin
Data Collector : Ewendt, Kevin
Data Collector : Kuo, Chih-wen
Data Collector : Kuo, Teng-feng
Data Collector : Liu, Ming
Data Collector : Pi, Chenying
Data Collector : Zheng, Xinyu
Data Collector : Zuo, Ruyi |
Funding Information
| DFG: EXC 270
Chiang Ching-kuo Foundation for International Scholarly Exchange: Research Grant (2011)
Field of Focus 3 of the Excellence Strategy: ExU 3.2
Research Council Cultural Dynamics in Globalised Worlds: ExU 3.34 |
Depositor
| Heidelberg Research Architecture |
Deposit Date
| 2022-07-22 |
Time Period
| Start Date: 1830 ; End Date: 1955 |
Date of Collection
| Start Date: 2012 ; End Date: 2022-07-22 |
Data Type
| Metadata records describing periodicals, their publishing history, the individual items (article, image, ad), and the related agents/names. |
Related Dataset
| GitHub Repo "Early Chinese Periodicals Online (ECPO)" https://github.com/exc-asia-and-europe/ecpo |