Affixoid Dataset (DE) (ICPSR doi:10.11588/data/QKF4LT)

Document Description

Citation

Title:

Affixoid Dataset (DE)

Identification Number:

doi:10.11588/data/QKF4LT

Distributor:

heiDATA

Date of Distribution:

2019-10-08

Version:

1

Bibliographic Citation:

Ruppenhofer, Josef, 2019, "Affixoid Dataset (DE)", https://doi.org/10.11588/data/QKF4LT, heiDATA, V1, UNF:6:+MGK9lTPTXx7Rclu1BpPnw== [fileUNF]

Study Description

Citation

Title:

Affixoid Dataset (DE)

Identification Number:

doi:10.11588/data/QKF4LT

Authoring Entity:

Ruppenhofer, Josef (Leibniz Institute for the German Language)

Date of Production:

2018

Distributor:

heiDATA

Date of Distribution:

2019-10-08

Study Scope

Keywords:

Arts and Humanities, Computer and Information Science, morphology, sentiment analysis, compound, affixoid, German

Topic Classification:

sentiment, affixoid classification

Abstract:

The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert. A total of 1788 complex words, each containing one of 7 German suffixoid candidates (e.g. -hai, -gott), were manually annotated as either regular compounds or affixoid formations. The main experiments in the paper use automatically extracted features of the complex forms to make this distinction. Additionally, the words were labeled for five properties related to any intensifying and evaluative meaning potentially associated with the whole word and its components. These manual feature annotations were used to establish the upper-bound performance of a classifier trained to distinguish affixoid formations from regular compounds.

Kind of Data:

textual data, CSV text file format

Methodology and Processing

File Description--f3065

File: dataset_annotations.tab

  • Number of cases: 1787

  • No. of variables per record: 1

  • Type of File: text/tab-separated-values

Notes:

UNF:6:+MGK9lTPTXx7Rclu1BpPnw==

Variable Description

List of Variables:

Variables

Achaimenidenkönig N N N N neu N

f3065 Location:

Variable Format: character

Notes: UNF:6:+MGK9lTPTXx7Rclu1BpPnw==
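
Reading the file:

The file description reports 1787 cases and a single variable whose name, "Achaimenidenkönig N N N N neu N", looks like a data row rather than a header. One possible reading (an assumption, not confirmed by this codebook) is that the file has no header line and that the repository ingest consumed the first row as the variable name, which would also account for the difference from the 1788 words mentioned in the abstract. Under that assumption, a minimal Python sketch that treats every line as data might look as follows. The function name read_annotations is hypothetical, the whitespace splitting is a guess based on the one row shown here, and the meaning of the individual fields is not documented in this codebook (README.txt may describe it).

```python
import io


def read_annotations(handle, delimiter=None):
    """Read every non-empty line as a data row; no header row is assumed.

    delimiter=None splits on any run of whitespace (an assumption);
    pass "\t" if the fields turn out to be strictly tab-separated.
    """
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        rows.append(line.split(delimiter))
    return rows


# Demonstration on the single row visible in this codebook.
sample = io.StringIO("Achaimenidenkönig N N N N neu N\n")
rows = read_annotations(sample)
print(rows[0][0], len(rows[0]))
```

For the downloaded file, the same function could be called on open("dataset_annotations.tab", encoding="utf-8"); the UTF-8 encoding is likewise an assumption. Comparing len(rows) against the 1788 words stated in the abstract would be one way to check whether the first line is in fact data.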

Other Study-Related Materials

Label:

README.txt

Notes:

text/plain