NLP in Diagnostic Texts from Nephropathology [Research Data] (doi:10.11588/data/KS5W0H)

Document Description

Citation

Title:

NLP in Diagnostic Texts from Nephropathology [Research Data]

Identification Number:

doi:10.11588/data/KS5W0H

Distributor:

heiDATA

Date of Distribution:

2022-06-02

Version:

1

Bibliographic Citation:

Legnar, Maximilian; Daumke, Philipp; Hesser, Jürgen; Porubsky, Stefan; Popovic, Zoran; Bindzus, Jan Niklas; Siemoneit, Joern-Helge; Weis, Cleo-Aron, 2022, "NLP in Diagnostic Texts from Nephropathology [Research Data]", https://doi.org/10.11588/data/KS5W0H, heiDATA, V1

Study Description

Citation

Title:

NLP in Diagnostic Texts from Nephropathology [Research Data]

Identification Number:

doi:10.11588/data/KS5W0H

Authoring Entity:

Legnar, Maximilian (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University, Germany)

Daumke, Philipp (Averbis GmbH, Freiburg, Germany)

Hesser, Jürgen (Data Analysis and Modeling, MIISM, Medical School, Interdisciplinary Center for Scientific Computing (IWR), Central Institute for Computer Engineering (ZITI), CZS Heidelberg Center for Model-Based AI, Heidelberg University)

Porubsky, Stefan (Institute of Pathology, Medical Faculty Mainz, University Hospital Mainz, Mainz, Germany)

Popovic, Zoran (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University)

Bindzus, Jan Niklas (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University)

Siemoneit, Joern-Helge (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University)

Weis, Cleo-Aron (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University)

Other identifications and acknowledgements:

Legnar, Maximilian

Other identifications and acknowledgements:

Weis, Cleo-Aron

Other identifications and acknowledgements:

Bindzus, Jan Niklas

Producer:

Institute of Pathology, Medical Faculty Mannheim, Heidelberg University

Date of Production:

2022-05-30

Distributor:

heiDATA

Access Authority:

Legnar, Maximilian

Access Authority:

Weis, Cleo-Aron

Holdings Information:

https://doi.org/10.11588/data/KS5W0H

Study Scope

Keywords:

Medicine, Health and Life Sciences, NLP, text analysis, nephropathology, pathology reports, text classification, topic modelling

Abstract:

This data set contains all annotated topic word tables from the work "NLP in Diagnostic Texts from Nephropathology", as well as all pre-processed and tf-idf-vectorized text files. The raw texts (i.e., the descriptive and diagnostic sections of the reports) are deliberately not made available, because it cannot be ruled out that the patient or the person who wrote the report could be identified from them. This decision is in accordance with our local ethics committee.

Please note: This data set is not yet complete and will be completed soon.

Please refer to Section 3.1.2 of our paper to learn how to interpret the annotated topic word tables.

The associated GitLab project http://gitlab.medma.uni-heidelberg.de/mlegnar/nlp-in-diagnostic-texts-from-nephropathology contains some examples of how the .pkl files can be opened and used with Python.
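The abstract states that the texts were tf-idf-vectorized, but this record does not give the settings the authors used. Purely to illustrate what a tf-idf weight expresses, the sketch below assumes the smoothed formula that scikit-learn's `TfidfVectorizer` applies by default: raw term count times idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number of documents containing term t, followed by L2 normalisation per document. The function name `tfidf` and the toy tokens are hypothetical, and the input is assumed to be already tokenized.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse tf-idf vector (term -> weight) per tokenized document.

    Assumed weighting (scikit-learn's default, not necessarily the authors'):
    raw term count times smoothed idf, then L2 normalisation per document.
    """
    n = len(docs)
    # document frequency: in how many documents each term occurs at least once
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # empty documents stay empty instead of dividing by zero
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


# Toy, hypothetical token lists (not taken from the data set)
reports = [
    ["glomerulonephritis", "iga", "mesangial"],
    ["glomerulonephritis", "membranous"],
]
for vec in tfidf(reports):
    print({t: round(w, 3) for t, w in vec.items()})
```

A term that occurs in every document (here "glomerulonephritis") receives the minimum idf of 1, so it is down-weighted relative to terms that distinguish one report from the others.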

Methodology and Processing

Sources Statement

Data Access

Other Study Description Materials

Related Materials

http://gitlab.medma.uni-heidelberg.de/mlegnar/nlp-in-diagnostic-texts-from-nephropathology
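The GitLab project listed above contains the authors' own examples for opening the .pkl files. As a minimal, hedged stand-in, the sketch below loads one file with Python's standard `pickle` module and returns a short summary of what it contains. The internal structure of the files is not documented in this record, so the code assumes nothing about it; the helper names `load_pkl` and `describe` are hypothetical. If the pickles hold scikit-learn, SciPy or pandas objects (for example sparse tf-idf matrices), those libraries must be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path
from typing import Any, Union


def load_pkl(path: Union[str, Path]) -> Any:
    """Load a pickled data file and return the stored Python object.

    pickle can execute arbitrary code while loading, so only open files
    from a trusted source (here: the heiDATA download).
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def describe(obj: Any) -> str:
    """Summarise an object by type and, where available, shape or length."""
    shape = getattr(obj, "shape", None)  # e.g. NumPy arrays, SciPy sparse matrices, DataFrames
    if shape is not None:
        return f"{type(obj).__name__} with shape {tuple(shape)}"
    try:
        return f"{type(obj).__name__} with {len(obj)} entries"
    except TypeError:  # objects without a length
        return type(obj).__name__


# Hypothetical usage after downloading a file from heiDATA:
# data = load_pkl("diagnosis_texts_tfidf_vectorized_bow_preprocessed.pkl")
# print(describe(data))
```

Checking the type and shape first is a cheap way to see whether a file holds a matrix, a list of documents or a fitted model before writing any further analysis code.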

Other Study-Related Materials

Label:

description_texts_tfidf_vectorized_bow_preprocessed.pkl

Notes:

application/octet-stream

Other Study-Related Materials

Label:

description_texts_tfidf_vectorized_DR_preprocessed.pkl

Notes:

application/octet-stream

Other Study-Related Materials

Label:

diagnosis_texts_tfidf_vectorized_bow_preprocessed.pkl

Notes:

application/octet-stream

Other Study-Related Materials

Label:

diagnosis_texts_tfidf_vectorized_DR_preprocessed.pkl

Notes:

application/octet-stream

Other Study-Related Materials

Label:

WordsPerCluster_German_BERT.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

WordsPerCluster_GSDPMM.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

WordsPerCluster_HDBSCAN.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

WordsPerCluster_kmeans.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

WordsPerCluster_LDA.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

WordsPerCluster_Patho_BERT.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

WordsPerCluster_top2vec.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet