Word Embeddings Everywhere: Which One to Choose?

Word embeddings are an integral part of many natural language processing architectures and tasks, from building recommendation systems to topic modelling. With so many popular word embedding models available, such as Google’s Word2Vec, Facebook’s FastText, GloVe and VarEmbed, it is important to understand the essential attributes of each. Anyone building a recommendation system, or searching for similar words or documents, needs to know which embedding is the best fit.
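As a rough illustration of what "finding similar words" means with any of these models, here is a minimal sketch of a cosine-similarity lookup. The vectors are toy values invented for this example, not output from Word2Vec, FastText, GloVe or VarEmbed, and the names `TOY_VECTORS`, `cosine_similarity` and `most_similar` are hypothetical helpers rather than any library's API.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings are learned from text and have far more dimensions.
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=1):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    print(most_similar("king", TOY_VECTORS, topn=2))
```

Swapping in vectors from a trained model changes only the contents of the dictionary; the ranking logic stays the same, which is why the choice of embedding matters so much for the quality of the results.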

In this talk, I’ll cover the advantages of each of these word embedding models and which ones to use for text in languages other than English, such as Turkish, German or Sanskrit. I’ll go through some evaluations and highlight the differences between the models by visualising their embeddings with TensorBoard and Gensim.

The results will show how these embeddings specialise in different downstream NLP tasks.

Presented by

Anmol Gulati

Anmol is a software engineer at Google, working on the Machine Intelligence team in Google Docs. He recently graduated from the Indian Institute of Technology Kharagpur, India. His technical background spans machine learning, NLP, and robotics. He’s an active contributor to multiple open-source Python projects and previously added a VarEmbed embeddings wrapper to Gensim.