Early Chinese Periodicals Online (ECPO) [Metadata] (doi:10.11588/data/Z3J0DV)


Document Description

Citation

Title:

Early Chinese Periodicals Online (ECPO) [Metadata]

Identification Number:

doi:10.11588/data/Z3J0DV

Distributor:

heiDATA

Date of Distribution:

2022-07-27

Version:

1

Bibliographic Citation:

Arnold, Matthias, 2022, "Early Chinese Periodicals Online (ECPO) [Metadata]", https://doi.org/10.11588/data/Z3J0DV, heiDATA, V1

Study Description

Citation

Title:

Early Chinese Periodicals Online (ECPO) [Metadata]

Identification Number:

doi:10.11588/data/Z3J0DV

Identification Number:

https://d-nb.info/gnd/1168398932

Identification Number:

https://d-nb.info/gnd/1148669841

Authoring Entity:

Arnold, Matthias (Heidelberg Research Architecture, Heidelberg University)

Other identifications and acknowledgements:

Mittler, Barbara

Other identifications and acknowledgements:

Judge, Joan

Other identifications and acknowledgements:

Yu, Chien-ming

Other identifications and acknowledgements:

Sun, Liying

Other identifications and acknowledgements:

Sung, Doris

Other identifications and acknowledgements:

Arnold, Matthias

Other identifications and acknowledgements:

Lien, Lingling

Other identifications and acknowledgements:

Torkler, Jörg

Other identifications and acknowledgements:

Hessel, Lena

Other identifications and acknowledgements:

Xie, Jia

Other identifications and acknowledgements:

Yip, Suk Man

Other identifications and acknowledgements:

Henke, Konstantin

Other identifications and acknowledgements:

Paterson, Duncan

Other identifications and acknowledgements:

Zhang, Shimin

Other identifications and acknowledgements:

Ewendt, Kevin

Other identifications and acknowledgements:

Kuo, Chih-wen

Other identifications and acknowledgements:

Kuo, Teng-feng

Other identifications and acknowledgements:

Liu, Ming

Other identifications and acknowledgements:

Pi, Chenying

Other identifications and acknowledgements:

Zheng, Xinyu

Other identifications and acknowledgements:

Zuo, Ruyi

Producer:

Heidelberg Centre for Transcultural Studies

Arnold, Matthias

Date of Production:

2022

Grant Number:

EXC 270

Grant Number:

Research Grant (2011)

Grant Number:

ExU 3.2

Grant Number:

ExU 3.34

Distributor:

heiDATA

Access Authority:

Arnold, Matthias

Depositor:

Heidelberg Research Architecture

Date of Deposit:

2022-07-22

Holdings Information:

https://doi.org/10.11588/data/Z3J0DV

Study Scope

Keywords:

Arts and Humanities, Other, Datensatz / data set, Multilingual computing, Zeitschrifteninhaltsauswertung, Zeitschrift

Topic Classification:

Multilingual computing, Library metadata

Abstract:

ECPO joins several important digital collections of the early Chinese press and puts them into a single overarching framework. In the first phase, several databases on early women’s periodicals and entertainment publishing were created: “Chinese Women’s Magazines in the Late Qing and Early Republican Period” (WoMag), “Chinese Entertainment Newspapers” (Xiaobao), and databases hosted at the Academia Sinica in Taiwan. These systems approach the material in two ways: in the intensive approach we record all articles, images, advertisements, and related agents and assign them to a complete set of scanned pages, while in the extensive approach we record the main characteristic features of publications. ECPO is distinguished from other existing databases of Chinese periodicals in that it not only provides image scans but also preserves materials often excluded in reprint, microfilm, or digital (even full-text) editions, such as advertising inserts and illustrations. In addition, it aims at incorporating metadata in both English and Chinese, including keywords and biographical information on editors, authors, and individuals represented in illustrations and advertisements in the journals.
As the material basis of the database consists mostly of image scans, the project has been running experiments on one Republican newspaper to explore approaches toward full-text generation. Computer-aided processing of image scans of historical periodicals is still challenging with the current state of technology, in particular because processing standards for Latin-script newspapers do not apply to the Chinese context. Only with new approaches in machine learning has it become possible to process material that was inaccessible just a few years ago. However, many challenges remain. Extremely complex layouts make reliable automatic page segmentation difficult and have prevented full-text generation for these newspapers even within China.
The application of artificial intelligence requires a ground truth data set. This error-free, manually corrected text with structural information is used for evaluation and training of software models for text and layout recognition. In the fall of 2021, the project successfully implemented OCR on a sample of the newspaper 晶報 Jing bao (The Crystal) with a character error rate below 3% (Henke 2021). On that basis, the project is now expanding and generalizing its approach. With additional funding recently received from the Research Council Cultural Dynamics in Globalized Worlds for the first half of 2022, the project is currently producing a new data set. The project’s aim is to offer a solution to automatically produce full text from Republican newspapers using neural networks and machine learning. The project’s current work will further develop its original aims and contribute to the field of research as a whole. With the disclosure of the project’s network models and data sets, its results can be reproduced and evaluated, and others can adopt its approaches in the field. Although processing non-Latin scripts is still a challenge in many cases, the project hopes its work may serve as a good-practice example for such initiatives. The data set provides a first and complete extract of all metadata edited by the project so far. Future versions will also incorporate the full text produced in our OCR pipeline.
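
The character error rate mentioned above is conventionally computed as the edit distance between the OCR output and the ground truth, divided by the length of the ground truth. The following Python sketch illustrates that conventional definition only. It is not taken from the project's pipeline, and the function names are assumptions introduced here for illustration.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hypothesis` into `reference`."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference (ground truth) must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical two-character example: one substituted character.
    print(character_error_rate("晶報", "晶报"))
```

Under this definition, a rate below 3% corresponds to fewer than three character edits per hundred ground-truth characters.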

Time Period:

1830-1955

Date of Collection:

2012 to 2022-07-22

Kind of Data:

Metadata records describing periodicals, their publishing history, the individual items (article, image, ad), and the related agents/names.

Methodology and Processing

Sources Statement

Data Access

Other Study Description Materials

Related Studies

GitHub repository "Early Chinese Periodicals Online (ECPO)": https://github.com/exc-asia-and-europe/ecpo

Related Publications

Citation

Title:

Arnold, Matthias. "Multilingual research projects: Challenges for making use of standards, authority files, and character recognition." Digital Studies / Le champ numérique. (forthcoming) DOI: 10.11588/heidok.00030918 (preprint).

Identification Number:

10.11588/heidok.00030918

Bibliographic Citation:

Arnold, Matthias. "Multilingual research projects: Challenges for making use of standards, authority files, and character recognition." Digital Studies / Le champ numérique. (forthcoming) DOI: 10.11588/heidok.00030918 (preprint).

Citation

Title:

Arnold, Matthias, and Henrike Rudolph. "Network Data in the Early Chinese Periodicals Online Database (ECPO)." In: Journal of Historical Network Research. 5.1 (2021): 288-302. [Special issue "Beyond guanxi: Chinese Historical Networks," ed. Song CHEN and Henrike Rudolph.] DOI 10.25517/jhnr.v5i1.118.

Identification Number:

10.25517/jhnr.v5i1.118

Bibliographic Citation:

Arnold, Matthias, and Henrike Rudolph. "Network Data in the Early Chinese Periodicals Online Database (ECPO)." In: Journal of Historical Network Research. 5.1 (2021): 288-302. [Special issue "Beyond guanxi: Chinese Historical Networks," ed. Song CHEN and Henrike Rudolph.] DOI 10.25517/jhnr.v5i1.118.

Citation

Title:

Arnold, Matthias, and Lena Hessel. "Transforming data silos into knowledge: Early Chinese Periodicals Online (ECPO)." In: E-Science-Tage 2019: Data to Knowledge. Ed. by Vincent Heuveline, Fabian Gebhard, and Nina Mohammadianbisheh. Heidelberg: heiBOOKS, 2020, 95-109. DOI: 10.11588/heibooks.598.c8420

Identification Number:

10.11588/heibooks.598.c8420

Bibliographic Citation:

Arnold, Matthias, and Lena Hessel. "Transforming data silos into knowledge: Early Chinese Periodicals Online (ECPO)." In: E-Science-Tage 2019: Data to Knowledge. Ed. by Vincent Heuveline, Fabian Gebhard, and Nina Mohammadianbisheh. Heidelberg: heiBOOKS, 2020, 95-109. DOI: 10.11588/heibooks.598.c8420

Citation

Title:

Sung, Doris, Liying Sun, and Matthias Arnold. "The Birth of a Database of Historical Periodicals: Chinese Women’s Magazines in the Late Qing and Early Republican Period." In: Tulsa Studies in Women's Literature 33, no. 2 (2014): 227-37.

Bibliographic Citation:

Sung, Doris, Liying Sun, and Matthias Arnold. "The Birth of a Database of Historical Periodicals: Chinese Women’s Magazines in the Late Qing and Early Republican Period." In: Tulsa Studies in Women's Literature 33, no. 2 (2014): 227-37.

Other Study-Related Materials

Label:

agents_20220721_131236.csv

Text:

complete metadata: agents

Notes:

text/csv

Other Study-Related Materials

Label:

agents_notes20220721_131256.csv

Text:

complete metadata: agent notes

Notes:

text/csv

Other Study-Related Materials

Label:

ECPO_doi-export_2019-05-01.xml

Text:

ECPO, basic metadata of all periodicals, as of 2019-05-01

Notes:

text/xml

Other Study-Related Materials

Label:

ECPO_mods-export_2019-05-01.xml

Text:

all ECPO publications, MODS XML, 2019-05-01

Notes:

text/xml

Other Study-Related Materials

Label:

EXPLANATION-of-csv-fields_2022-07-22.xlsx

Text:

short explanation of all csv fields provided

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

mag-1-ads_20220721_131219.csv

Text:

complete metadata: advertisement

Notes:

text/csv

Other Study-Related Materials

Label:

mag-1-articles_20220721_131209.csv

Text:

complete metadata: articles

Notes:

text/csv

Other Study-Related Materials

Label:

mag-1-images_20220721_131216.csv

Text:

complete metadata: images

Notes:

text/csv

Other Study-Related Materials

Label:

mag-all-issues_20220721_131205.csv

Text:

complete metadata: issues

Notes:

text/csv

Other Study-Related Materials

Label:

mag-all_20220721_131200.csv

Text:

complete metadata: publications

Notes:

text/csv
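
The CSV files listed above can be inspected before use. The following Python sketch reads the header row of one file and counts its data rows. The comma delimiter and the UTF-8 encoding are assumptions, not facts stated in this record, and `summarize_csv` is a hypothetical helper name. The field names themselves are documented in EXPLANATION-of-csv-fields_2022-07-22.xlsx and are deliberately not guessed here.

```python
import csv
from pathlib import Path


def summarize_csv(path, encoding="utf-8-sig", delimiter=","):
    """Return (header, row_count) for a delimited text file.

    The encoding and delimiter defaults are assumptions; adjust them
    if the downloaded files turn out to use different settings.
    """
    with Path(path).open(newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])          # first row holds the field names
        row_count = sum(1 for _ in reader)  # remaining rows are data records
    return header, row_count


if __name__ == "__main__":
    # File name taken from the list above; it must be downloaded first.
    target = Path("mag-all_20220721_131200.csv")
    if target.exists():
        header, rows = summarize_csv(target)
        print(f"{target.name}: {len(header)} fields, {rows} records")
        print(header)
    else:
        print(f"{target.name} not found in the current directory")
```

The csv module is used rather than a plain split on the delimiter so that quoted fields containing commas or line breaks are parsed as single values.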