Part 1: Document Description
|
Citation |
|
Title: |
NLP in Diagnostic Texts from Nephropathology [Research Data] |
Identification Number: |
doi:10.11588/data/KS5W0H |
Distributor: |
heiDATA |
Date of Distribution: |
2022-06-02 |
Version: |
1 |
Bibliographic Citation: |
Legnar, Maximilian; Daumke, Philipp; Hesser, Jürgen; Porubsky, Stefan; Popovic, Zoran; Bindzus, Jan Niklas; Siemoneit, Joern-Helge; Weis, Cleo-Aron, 2022, "NLP in Diagnostic Texts from Nephropathology [Research Data]", https://doi.org/10.11588/data/KS5W0H, heiDATA, V1 |
Part 2: Study Description
|
Citation |
|
Title: |
NLP in Diagnostic Texts from Nephropathology [Research Data] |
Identification Number: |
doi:10.11588/data/KS5W0H |
Authoring Entity: |
Legnar, Maximilian (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University, Germany) |
Daumke, Philipp (Averbis GmbH, Freiburg, Germany) |
Hesser, Jürgen (Data Analysis and Modeling, MIISM, Medical School, Interdisciplinary Center for Scientific Computing (IWR), Central Institute for Computer Engineering (ZITI), CZS Heidelberg Center for Model-Based AI, Heidelberg University) |
Porubsky, Stefan (Institute of Pathology, Medical Faculty Mainz, University Hospital Mainz, Mainz, Germany) |
Popovic, Zoran (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University) |
Bindzus, Jan Niklas (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University) |
Siemoneit, Joern-Helge (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University) |
Weis, Cleo-Aron (Institute of Pathology, Medical Faculty Mannheim, Heidelberg University) |
|
Other identifications and acknowledgements: |
Legnar, Maximilian |
Other identifications and acknowledgements: |
Weis, Cleo-Aron |
Other identifications and acknowledgements: |
Bindzus, Jan Niklas |
Producer: |
Institute of Pathology, Medical Faculty Mannheim, Heidelberg University |
Date of Production: |
2022-05-30 |
Distributor: |
heiDATA |
Access Authority: |
Legnar, Maximilian |
Access Authority: |
Weis, Cleo-Aron |
Holdings Information: |
https://doi.org/10.11588/data/KS5W0H |
Study Scope |
|
Keywords: |
Medicine, Health and Life Sciences, NLP, text analysis, nephropathology, pathology reports, text classification, topic modelling |
Abstract: |
This data set contains all annotated topic word tables from the work "NLP in Diagnostic Texts from Nephropathology", as well as all pre-processed and TF-IDF-vectorized text files. The raw texts (i.e., the description and diagnosis sections of the reports) are deliberately not made available, because it cannot be ruled out that the patient or the reporting pathologist could be identified from them. This is in accordance with the requirements of our local ethics committee.

Please note: This data set is not yet complete and will be completed soon.
Please refer to Section 3.1.2 of our paper to learn how to interpret the annotated topic word tables.

The associated GitLab project (http://gitlab.medma.uni-heidelberg.de/mlegnar/nlp-in-diagnostic-texts-from-nephropathology) contains examples of how to open and use the .pkl files in Python. |
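The abstract notes that the .pkl files can be opened with Python. The following is a minimal sketch of loading one and summarizing what it contains, assuming the files are standard Python pickles. Their internal structure (for example a sparse matrix, a DataFrame, or a tuple of objects) is not described on this page, so the GitLab project remains the authoritative source of usage examples. The helper names `load_pkl` and `describe` are hypothetical.

```python
import pickle
from pathlib import Path


def load_pkl(path):
    """Load one of the pre-processed, TF-IDF-vectorized .pkl files.

    Assumption: the file is a standard Python pickle. Unpickling requires
    the libraries that created the stored objects (e.g. scipy or
    scikit-learn, if used) to be installed in the current environment.
    """
    with Path(path).open("rb") as fh:
        # Only unpickle files from a trusted source: pickle can execute code.
        return pickle.load(fh)


def describe(obj):
    """Return a short, library-agnostic summary of a loaded object."""
    summary = {"type": type(obj).__name__}
    shape = getattr(obj, "shape", None)
    if shape is not None:
        # Matrix-like objects (NumPy/SciPy arrays, DataFrames) expose .shape.
        summary["shape"] = tuple(shape)
    elif hasattr(obj, "__len__"):
        # Containers such as lists, tuples, or dicts report their length.
        summary["len"] = len(obj)
    return summary


# Example (requires the downloaded file in the working directory):
# data = load_pkl("description_texts_tfidf_vectorized_bow_preprocessed.pkl")
# print(describe(data))
```

Summarizing the loaded object first, rather than assuming a particular type, avoids guessing how the vectorized texts were stored.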
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Other Study Description Materials |
|
Related Materials |
|
http://gitlab.medma.uni-heidelberg.de/mlegnar/nlp-in-diagnostic-texts-from-nephropathology |
|
Label: |
description_texts_tfidf_vectorized_bow_preprocessed.pkl |
Notes: |
application/octet-stream |
Label: |
description_texts_tfidf_vectorized_DR_preprocessed.pkl |
Notes: |
application/octet-stream |
Label: |
diagnosis_texts_tfidf_vectorized_bow_preprocessed.pkl |
Notes: |
application/octet-stream |
Label: |
diagnosis_texts_tfidf_vectorized_DR_preprocessed.pkl |
Notes: |
application/octet-stream |
Label: |
WordsPerCluster_German_BERT.xlsx |
Notes: |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
Label: |
WordsPerCluster_GSDPMM.xlsx |
Notes: |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
Label: |
WordsPerCluster_HDBSCAN.xlsx |
Notes: |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
Label: |
WordsPerCluster_kmeans.xlsx |
Notes: |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
Label: |
WordsPerCluster_LDA.xlsx |
Notes: |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
Label: |
WordsPerCluster_Patho_BERT.xlsx |
Notes: |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
Label: |
WordsPerCluster_top2vec.xlsx |
Notes: |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |