NEGATIVE-SAMPLING WORD-EMBEDDING METHOD

Madina Erzhankyzy Bokan

Abstract


One of the best-known authors of the method is Tomas Mikolov; his software, and the theory behind it, are the focus of this paper. It is worth noting that the approach is primarily mathematical. Using embedding models to map knowledge graphs (KGs) into vector space has become a well-established field of research, and in recent years a plethora of embedding-learning approaches have been proposed in the literature. Many of these models rely on data already stored in the input KG. Under the closed-world assumption, knowledge not present in the KG cannot be judged untrue; it may only be labeled as unknown. At the same time, embedding models, like most machine learning algorithms, require negative instances to learn embeddings efficiently, and a variety of negative-sample generation strategies have been developed to deal with this. Mikolov's own contribution is first of all a mathematical solution, followed by a practical implementation, and it is this method that we analyze here. Dense vector word representations have lately gained popularity as fixed-length features for machine learning algorithms, and Mikolov's system is now widely used. We investigate one of its main components, negative sampling, and discuss efficient distributed methods that allow it to scale while excluding the possibility of probability loss. Furthermore, this method is narrowly focused on a single operation, in the broad sense, for processing the recognition of the above-mentioned word vectors. It is important to pay attention to the mathematical theory and to understand the importance of the neural network in this field.
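The paper itself does not include code, but the negative-sampling component it discusses can be sketched briefly. The fragment below is an illustrative reconstruction, not the authors' implementation: it builds the unigram noise distribution raised to the 3/4 power used in Mikolov et al. (2013), draws negative words from it, and evaluates the skip-gram negative-sampling loss for one (center, context) pair. All function names are ours.

```python
import math
import random
from collections import Counter

def noise_distribution(corpus, power=0.75):
    """Unigram counts raised to the 3/4 power, normalized to a
    probability distribution, as in Mikolov et al. (2013)."""
    counts = Counter(corpus)
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: p / total for w, p in weights.items()}

def sample_negatives(dist, k, exclude, rng):
    """Draw k negative words from the noise distribution,
    rejecting the true context word."""
    words = list(dist)
    probs = [dist[w] for w in words]
    negatives = []
    while len(negatives) < k:
        w = rng.choices(words, weights=probs, k=1)[0]
        if w != exclude:
            negatives.append(w)
    return negatives

def sgns_loss(v_center, u_context, u_negatives):
    """Negative-sampling objective for one (center, context) pair:
    -log sigma(u_o . v_c) - sum_k log sigma(-u_k . v_c)."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    loss = -math.log(sigmoid(dot(u_context, v_center)))
    for u_k in u_negatives:
        loss -= math.log(sigmoid(-dot(u_k, v_center)))
    return loss
```

For example, on a toy corpus the distribution assigns a higher (but sub-linear) probability to frequent words, and a handful of negatives per positive pair suffices, which is what makes the method cheap compared with a full softmax over the vocabulary.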

Keywords


graphs, negative sampling, word-embedding method, Tomas Mikolov, training, basic mathematical theory, sequence, matrix, vectors.

Full text:

PDF (English)

References


Shenron (April 2009). History of programming development. U.S. history of computers and developing. https://www.historyofthings.com/history-of-computers

word2vec. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 3111-3119, 2013. https://tensorflowkorea.files.wordpress.com/2017/03/cs224n-2017winter-notes-all.pdf

Embedding process and Negative-Sampling Word-Embedding Method by Tomas Mikolov. USA (Lake Tahoe, Nevada, 2013).

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, December 5-8, 2013, Lake Tahoe, Nevada, United States, pp. 3111-3119.

Han, L., Kashyap, A. L., Finin, T., Mayfield, J., & Weese, J. (2013, June). UMBC_EBIQUITY-CORE: Semantic textual similarity systems. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity (pp. 44-52).

Walker, A. J. (1974). New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters, 8(10), 127-128. https://doi.org/10.1049/el:19740097

Mikolov, T. (2008). Language models for automatic speech recognition of Czech lectures. Proc. of Student EEICT. https://spark.apache.org/docs/latest/mllibfeature-extraction.html

Chelba, C., Mikolov, T., Schuster, M., Ge, Q., Brants, T., Koehn, P., & Robinson, T. (2013). One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005. https://doi.org/10.21437/interspeech.2014-564

Kiros, R., Zemel, R., & Salakhutdinov, R. R. (2014). A multiplicative model for learning distributed text-based attribute representations. Advances in Neural Information Processing Systems, 27.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26.

Mikolov, T., Kopecky, J., Burget, L., & Glembek, O. (2009, April). Neural network based language models for highly inflective languages. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4725-4728). IEEE. https://doi.org/10.1109/icassp.2009.4960686

Grbovic, M., Djuric, N., Radosavljevic, V., Silvestri, F., & Bhamidipati, N. (2015, August). Context- and content-aware embeddings for query rewriting in sponsored search. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 383-392).




DOI: http://dx.doi.org/10.37943/ELGD6408



ISSN (P): 2707-9031
ISSN (E): 2707-904X

Articles are open access under the Creative Commons License.


Nur-Sultan
EXPO Business Center, Block C.1
Kazakhstan, 010000

sjaitu@astanait.edu.kz