{"dcterms:modified":"2024-01-18","dcterms:creator":"heiDATA","@type":"ore:ResourceMap","schema:additionalType":"Dataverse OREMap Format v1.0.0","dvcore:generatedBy":{"@type":"schema:SoftwareApplication","schema:name":"Dataverse","schema:version":"6.1 build 1590-f5d1299","schema:url":"https://github.com/iqss/dataverse"},"@id":"https://heidata.uni-heidelberg.de/api/datasets/export?exporter=OAI_ORE&persistentId=https://doi.org/10.11588/data/10002","ore:describes":{"citation:dsDescription":{"citation:dsDescriptionValue":"PatTR is a sentence-parallel corpus extracted from the MAREC patent collection. The current version contains more than 22 million German-English and 18 million French-English parallel sentences collected from all patent text sections as well as 5 million German-French sentence pairs from patent titles, abstracts and claims.
The corpus is sorted by language pairs and by text sections of a patent document, namely title, abstract, claims and description. Parallel data from title, abstract and claims sections were extracted from documents belonging to the European Patent Office (EPO) and the World Intellectual Property Organization (WIPO) corpora in MAREC. Both resources feature multilingual documents that contain for example both an English and a German abstract.
Since there are no multilingual descriptions, data from this section were collected by exploiting patent families to align German and French documents from the EPO corpus to English documents from the United States Patent and Trademark Office (USPTO) corpus, following Utiyama, Masao and Isahara, Hitoshi: A Japanese-English patent parallel corpus. MT summit XI (2007), 475--482.
All sections were sentence-aligned using the Gargantua aligner. Preprocessing was done automatically. Sentence boundaries were detected using the Europarl processing tools.
For a detailed description of the corpus construction process, please see the publications above."},"timePeriodCovered":{"citation:timePeriodCoveredStart":"1976","citation:timePeriodCoveredEnd":"2008-06"},"citation:producer":{"citation:producerName":"Wäschle, Katharina","citation:producerAffiliation":"Department of Computational Linguistics"},"author":[{"citation:authorName":"Wäschle, Katharina","citation:authorAffiliation":"Department of Computational Linguistics"},{"citation:authorName":"Riezler, Stefan","citation:authorAffiliation":"Department of Computational Linguistics"}],"citation:datasetContact":{"citation:datasetContactName":"Prof. Dr. Stefan Riezler","citation:datasetContactAffiliation":"Department of Computational Linguistics","citation:datasetContactEmail":"riezler@cl.uni-heidelberg.de"},"publication":[{"publicationCitation":"Wäschle, K. and Riezler, S. (2012b). Analyzing Parallelism and Domain Similarities in the MAREC Patent Corpus. Multidisciplinary Information Retrieval, pp. 12-27.","publicationURL":"http://www.cl.uni-heidelberg.de/~riezler/publications/papers/IRF2012.pdf"},{"publicationCitation":"Wäschle, K. and Riezler, S. (2012a). Structural and Topical Dimensions in Multi-Task Patent Translation. 
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France.","publicationURL":"http://aclweb.org/anthology//E/E12/E12-1083.pdf"}],"title":"PatTR: Patent Translation Resource","citation:productionPlace":"Heidelberg, Germany","citation:relatedMaterial":"MAREC patent collection: http://www.ir-facility.org/prototypes/marec","dateOfDeposit":"2014-05-22","citation:productionDate":"2012","subject":"Computer and Information Science","@id":"https://doi.org/10.11588/data/10002","@type":["ore:Aggregation","schema:Dataset"],"schema:version":"3.1","schema:name":"PatTR: Patent Translation Resource","schema:dateModified":"Wed Apr 05 16:25:35 CEST 2017","schema:datePublished":"2014-06-05","schema:creativeWorkStatus":"RELEASED","dvcore:termsOfUse":"PatTR is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. ","dvcore:citationRequirements":"Please cite Wäschle & Riezler (2012b), if you use the corpus in your work.","dvcore:fileTermsOfAccess":{"dvcore:fileRequestAccess":false,"dvcore:originalArchive":"http://www.cl.uni-heidelberg.de/statnlpgroup/pattr/","dvcore:sizeOfCollection":"22M German-English parallel sentences, 18M French-English parallel sentences, > 5M German-French sentence pairs from patent titles, abstracts and claims"},"schema:includedInDataCatalog":"heiDATA","schema:isPartOf":{"schema:name":"Statistical Natural Language Processing Group","@id":"https://heidata.uni-heidelberg.de/dataverse/statnlpgroup","schema:description":"The Statistical Natural Language Processing Group is part of the Department of Computational Linguistics.\r\n
\r\nOur research addresses various aspects of the problem of the confusion of languages, by means of statistical learning techniques.\r\n
\r\nResearch topics include the following:\r\n