Negative Sampling for Learning Knowledge Graph Embeddings

Version 1.1

Kotnis, Bhushan, 2019, "Negative Sampling for Learning Knowledge Graph Embeddings", https://doi.org/10.11588/data/YYULL2, heiDATA, V1

Dataset Metrics

44 Downloads

Description	Reimplementation of four KG factorization methods and six negative sampling methods. Abstract: Knowledge graphs are large, useful, but incomplete knowledge repositories. They encode knowledge through entities and relations which define each other through the connective structure of the graph. This has inspired methods for the joint embedding of entities and relations in continuous low-dimensional vector spaces that can be used to induce new edges in the graph, i.e., link prediction in knowledge graphs. Learning these representations relies on contrasting positive instances with negative ones. Knowledge graphs include only positive relation instances, leaving the door open for a variety of methods for selecting negative examples. In this paper we present an empirical study on the impact of negative sampling on the learned embeddings, assessed through the task of link prediction. We use state-of-the-art knowledge graph embeddings (RESCAL, TransE, DistMult and ComplEx) and evaluate on benchmark datasets (FB15k and WN18). We compare well-known methods for negative sampling and additionally propose embedding-based sampling methods. We note a marked difference in the impact of these sampling methods on the two datasets, with the "traditional" corrupting-positives method leading to the best results on WN18, while embedding-based methods benefit the task on FB15k.
Subject	Arts and Humanities; Computer and Information Science
Keyword	knowledge graphs, negative sampling, embedding models, link prediction
Related Publication	Bhushan Kotnis and Vivi Nastase. 2018. Analysis of the Impact of Negative Sampling on Link Prediction in Knowledge Graphs. In Proceedings of Workshop on Knowledge Base Construction, Reasoning and Mining, Los Angeles, California USA, Feb 2018 (KBCOM'18). arXiv: 1708.06816
License/Data Use Agreement	Custom Dataset Terms
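
The "corrupting positives" strategy named in the description can be illustrated with a minimal sketch. This is not the code from kge-rl-master.zip: the function names, the 50/50 head-or-tail choice, the filtering of known positives, and the margin value are all assumptions made for illustration. It uses a TransE-style L1 score and a margin ranking loss to show how a positive triple is contrasted with a sampled negative.

```python
import random


def corrupt_triple(triple, entities, known_triples, rng=random, max_tries=100):
    """Build a negative by replacing the head or the tail with a random entity.

    Hypothetical helper: candidates that are known positives are rejected,
    which is one common variant, not necessarily the one used in the archive.
    """
    head, relation, tail = triple
    for _ in range(max_tries):
        candidate = rng.choice(entities)
        if rng.random() < 0.5:
            negative = (candidate, relation, tail)   # corrupt the head
        else:
            negative = (head, relation, candidate)   # corrupt the tail
        if negative not in known_triples:
            return negative
    raise ValueError("could not find a negative that is not a known positive")


def transe_score(h, r, t):
    """TransE dissimilarity: L1 distance between h + r and t (lower is better)."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin ranking loss: zero once the negative is worse by at least `margin`."""
    return max(0.0, margin + pos_score - neg_score)


# Toy usage with made-up entities and a fixed seed for repeatability.
entities = ["a", "b", "c", "d"]
positives = {("a", "likes", "b"), ("b", "likes", "c")}
negative = corrupt_triple(("a", "likes", "b"), entities, positives, random.Random(0))
```

The embedding-based samplers proposed in the paper would replace the uniform `rng.choice` step with a choice guided by the current embeddings; that part is not sketched here.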

	1 File
	kge-rl-master.zip	ZIP Archive, 19.4 KB. Published Aug 19, 2019. 44 Downloads. MD5: d2e8ac74e3f20d2cdec2225962c7e2f0. Tag: Code. File Access: Public.
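
The MD5 checksum in the file listing above can be used to check a downloaded copy of the archive. The following is a small sketch using Python's standard `hashlib`; the helper names and the local path passed to them are assumptions, not part of the dataset.

```python
import hashlib

# Checksum copied from the file listing for kge-rl-master.zip.
EXPECTED_MD5 = "d2e8ac74e3f20d2cdec2225962c7e2f0"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("kge-rl-master.zip")` on a local copy (the path is hypothetical) should return `True` for an intact download.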

Citation Metadata

Persistent Identifier	doi:10.11588/data/YYULL2
Publication Date	2019-08-19
Title	Negative Sampling for Learning Knowledge Graph Embeddings
Subtitle	Analysis of the Impact of Negative Sampling on Link Prediction in Knowledge Graphs
Alternative URL	https://github.com/bhushank/kge-rl
Author	Kotnis, Bhushan (Department of Computational Linguistics, Heidelberg University, Germany (2016-2018), NEC Laboratories Europe GmbH (since 2018))
Point of Contact	Kotnis, Bhushan (NEC Laboratories Europe GmbH)
Topic Classification	knowledge discovery in knowledge graphs
Related Publication	Bhushan Kotnis and Vivi Nastase. 2018. Analysis of the Impact of Negative Sampling on Link Prediction in Knowledge Graphs. In Proceedings of Workshop on Knowledge Base Construction, Reasoning and Mining, Los Angeles, California USA, Feb 2018 (KBCOM'18). arXiv: 1708.06816 https://arxiv.org/pdf/1708.06816.pdf
Production Date	2018
Production Location	Heidelberg University
Data Type	program source code
Related Material	FB15k dataset: "This FREEBASE FB15k DATA consists of a collection of triplets (synset, relation_type, triplet) extracted from Freebase (http://www.freebase.com). This data set can be seen as a 3-mode tensor depicting ternary relationships between synsets." (see README in FB15k). Link to the FB15k dataset: https://everest.hds.utc.fr/lib/exe/fetch.php?media=en:fb15k.tgz License: Creative Commons Attribution 2.5 License. When using this data, cite the original paper: Bordes, A., Usunier, N., García-Durán, A., Weston, J., & Yakhnenko, O. (2013). Translating Embeddings for Modeling Multirelational Data. In Proceedings of Neural Information Processing Systems (NIPS 26), Lake Tahoe, NV, USA. Dec. 2013.
Related Dataset	Kotnis, Bhushan, 2019, "KGE Algorithms", https://doi.org/10.11588/data/CSXYSS, heiDATA
Data Source	M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. 2016. A Review of Relational Machine Learning for Knowledge Graphs. In Proc. IEEE 104, 1 (Jan 2016), 11–33. https://doi.org/10.1109/JPROC.2015.2483592
	Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2012. Factorizing YAGO: Scalable Machine Learning for Linked Data. In Proceedings of the 21st International Conference on World Wide Web (WWW'12). ACM, New York, NY, USA, 271–280. https://doi.org/10.1145/2187836.2187874
	Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multirelational Data. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2787–2795. https://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf
	Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. 2016. Holographic Embeddings of Knowledge Graphs. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI'16). AAAI Press, 1955–1961. https://dl.acm.org/citation.cfm?id=3016100.3016172
	PyTorch. https://pytorch.org/
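
The FB15k data listed under Related Material is described as a collection of triplets. As a rough sketch of how such triples might be read and indexed before training, the code below assumes one triple per line in tab-separated head, relation, tail order; that layout, the function names, and the example identifiers are assumptions for illustration and should be checked against the README shipped with the data.

```python
def load_triples(lines):
    """Parse tab-separated (head, relation, tail) lines into a list of tuples.

    Assumed layout: exactly three tab-separated fields per non-empty line.
    """
    triples = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(parts)}")
        triples.append(tuple(parts))
    return triples


def build_vocab(triples):
    """Map every entity and relation to an integer index (sorted for stability)."""
    entities = sorted({e for head, _, tail in triples for e in (head, tail)})
    relations = sorted({relation for _, relation, _ in triples})
    entity_index = {name: i for i, name in enumerate(entities)}
    relation_index = {name: i for i, name in enumerate(relations)}
    return entity_index, relation_index


# Toy usage with made-up identifiers; a real file would be opened and
# passed in as an iterable of lines.
sample = ["/m/entity_a\t/rel/example\t/m/entity_b\n", "\n",
          "/m/entity_b\t/rel/example\t/m/entity_c\n"]
triples = load_triples(sample)
entity_index, relation_index = build_vocab(triples)
```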

Dataset Terms

License/Data Use Agreement

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation shown on the dataset page.

Custom Dataset Terms — the following Custom Dataset Terms have been defined for this dataset.

Licensed under GNU General Public License v3 (GPL v3).
