The MSC Data Set
https://doi.org/10.11588/data/JEESIQ
Marasović, Ana; Zhou, Mengfei; Frank, Anette
heiDATA
2019-10-07
<p>From this page you can download the resources we created for <strong>modal sense classification</strong>, as reported in Zhou et al. (2015), Marasović et al. (2016) and Marasović and Frank (2016) (see "Related Publication" below):</p>
<ul>
<li>Heuristically sense-annotated training data acquired from EUROPARL and OpenSubtitles (<strong>EPOS_E</strong>, English). The dataset was used for:
<ul>
<li>the EMNLP 2015 Workshop submission "Semantically enriched models for modal sense classification" by Mengfei Zhou, Anette Frank, Annemarie Friedrich, and Alexis Palmer</li>
<li>the LiLT submission "Modal Sense Classification At Large: Paraphrase-Driven Sense Projection, Semantically Enriched Classification Models and Cross-Genre Evaluations" by Ana Marasović, Mengfei Zhou, Alexis Palmer, and Anette Frank</li>
<li>the RepL4NLP submission "Multilingual Modal Sense Classification using a Convolutional Neural Network" by Ana Marasović and Anette Frank.</li>
</ul>
</li>
<li>Composition of the training and test sets used for the classification experiments. The dataset was used for:
<ul>
<li>the EMNLP 2015 Workshop submission "Semantically enriched models for modal sense classification" by Mengfei Zhou, Anette Frank, Annemarie Friedrich, and Alexis Palmer</li>
<li>the RepL4NLP submission "Multilingual Modal Sense Classification using a Convolutional Neural Network" by Ana Marasović and Anette Frank.</li>
</ul>
</li>
<li>Manually annotated subsection of <strong>MASC</strong> (English). The dataset was used for the LiLT submission "Modal Sense Classification At Large: Paraphrase-Driven Sense Projection, Semantically Enriched Classification Models and Cross-Genre Evaluations" by Ana Marasović, Mengfei Zhou, Alexis Palmer, and Anette Frank.</li>
<li>Heuristically modal sense annotated training data and manually annotated test data from EUROPARL and OpenSubtitles (<strong>EPOS_G</strong>, German). The dataset was used for the RepL4NLP submission "Multilingual Modal Sense Classification using a Convolutional Neural Network" by Ana Marasović and Anette Frank.</li>
</ul>
<p> </p>
Arts and Humanities; Computer and Information Science
modal sense classification, semantics, machine learning, annotation, modality
<p>Zhou, M., Frank, A., Friedrich, A., and Palmer, A. (2015). Semantically enriched models for modal sense classification. In <em>Proceedings of the EMNLP 2015 Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics</em>, pages 44–53, 18 September 2015, Lisboa, Portugal.</p>
https://www.aclweb.org/anthology/W15-2705
<p>Marasović, A., Zhou, M., Palmer, A., and Frank, A. (2016). Modal sense classification at large: Paraphrase-driven sense projection, semantically enriched classification models and cross-genre evaluations. In <em>Linguistic Issues in Language Technology, Special issue on Modality in Natural Language Understanding</em>, volume 14 (2), Stanford, CA. CSLI Publications.</p>
http://csli-lilt.stanford.edu/ojs/index.php/LiLT/article/view/65/65
<p>Marasović, A. and Frank, A. (2016). Multilingual modal sense classification using a convolutional neural network. In <em>Proceedings of the 1st Workshop on Representation Learning for NLP</em>, pages 111–120, August 11, 2016, Berlin, Germany. Association for Computational Linguistics.</p>
https://www.aclweb.org/anthology/W16-1613
2015
textual data
<p>The <strong>MSC Data Set</strong> is licensed under a <a href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). <img src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" alt="CC by sa" /></a></p>
<p> </p>
<p>Please note that the dataset contains data from several corpora. This data must be used in accordance with the license conditions of the EUROPARL, OpenSubtitles and MASC corpora:</p>
<ul>
<li>EPOS_E and EPOS_D contain data from EUROPARL (licensed under <a href="https://creativecommons.org/share-your-work/public-domain/cc0/"><span class="st">Creative Commons</span> Zero (CC0)</a> <img src="https://licensebuttons.net/p/zero/1.0/88x31.png" />) and from the OpenSubtitles corpus (Open For Reuse With Restrictions, see <a href="https://www.opensubtitles.org/en/disclaimer">https://www.opensubtitles.org/en/disclaimer</a>)</li>
<li>MASC contains data from the MASC corpus (licensed under a <a href="https://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>)</li>
</ul>