A harmonised testsuite for social media POS tagging (DE) (ICPSR doi:10.11588/data/KXLMHN)

Document Description

Citation

Title:

A harmonised testsuite for social media POS tagging (DE)

Identification Number:

doi:10.11588/data/KXLMHN

Distributor:

heiDATA

Date of Distribution:

2020-03-26

Version:

1

Bibliographic Citation:

Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor, 2020, "A harmonised testsuite for social media POS tagging (DE)", https://doi.org/10.11588/data/KXLMHN, heiDATA, V1

Study Description

Citation

Title:

A harmonised testsuite for social media POS tagging (DE)

Identification Number:

doi:10.11588/data/KXLMHN

Authoring Entity:

Rehbein, Ines (Leibniz Institute for the German Language)

Ruppenhofer, Josef (Leibniz Institute for the German Language)

Zimmermann, Victor (Department of Computational Linguistics, Heidelberg University)

Date of Production:

2018

Distributor:

heiDATA

Date of Distribution:

2020-03-26

Study Scope

Keywords:

Arts and Humanities, Computer and Information Science, POS tagging, German, Tweets, German web data

Topic Classification:

Social media data, POS tagging

Abstract:

A harmonised POS testsuite of web data, computer-mediated communication (CMC) and Twitter microtext, with word forms and STTS POS tags (plus some additional CMC-specific tags). The UD POS tags were converted automatically from the STTS POS tags. The data does not contain (manually corrected) lemma information. The original data comes from three different sources: a Twitter dataset with 21,181 tokens, and two datasets from the EmpiriST 2015 shared task, namely web data (12,718 tokens) and computer-mediated communication (10,505 tokens).

Kind of Data:

archived tab-separated format (CoNLL-U)
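
Since the data is distributed in CoNLL-U, a short reader can make the layout concrete. The following is a minimal sketch, not the authors' tooling. It assumes the standard ten-column CoNLL-U layout, with the UD tag in the fourth column (UPOS) and the STTS tag in the fifth column (XPOS); the record above does not state which column holds which tagset, so check the file before relying on this. The function names and the sample sentence are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str, str]  # (word form, UPOS, XPOS)


def read_conllu(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of (form, upos, xpos) tuples.

    Assumption: standard CoNLL-U columns, i.e. FORM in column 2,
    UPOS in column 4 and XPOS in column 5 (1-based).
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment or metadata line
        cols = line.split("\t")
        if len(cols) < 5:
            continue  # malformed line, skipped in this sketch
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword token range or empty node
        sentence.append((cols[1], cols[3], cols[4]))
    if sentence:
        yield sentence


def tag_counts(sentences: Iterable[List[Token]]) -> Tuple[int, Counter, Counter]:
    """Return the token count plus UPOS and XPOS frequency tables."""
    n_tokens = 0
    upos: Counter = Counter()
    xpos: Counter = Counter()
    for sent in sentences:
        for _form, u, x in sent:
            n_tokens += 1
            upos[u] += 1
            xpos[x] += 1
    return n_tokens, upos, xpos


# Hypothetical sample in the assumed layout; the lemma column is "_",
# in line with the statement that no lemma information is provided.
SAMPLE = (
    "# hypothetical example, not taken from the dataset\n"
    "1\tIch\t_\tPRON\tPPER\t_\t_\t_\t_\t_\n"
    "2\tlache\t_\tVERB\tVVFIN\t_\t_\t_\t_\t_\n"
    "3\t:)\t_\tSYM\tEMOASC\t_\t_\t_\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    n, upos, xpos = tag_counts(read_conllu(SAMPLE.splitlines()))
    print(n, dict(upos), dict(xpos))
```

To run the same counts on the distributed file, pass an open file handle instead of the sample, for example `read_conllu(open("social-media-POS-testsuite.conllu", encoding="utf-8"))`. The UTF-8 encoding is what the CoNLL-U format prescribes, but it is not stated in this record.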

Methodology and Processing

Other Study-Related Materials

Label:

social-media-POS-testsuite.conllu

Notes:

application/octet-stream