Word Embedding Models in API

Many word-embedding models take a few minutes to load, so it is desirable to load such a model into memory once and keep it there for repeated use. This API was developed for that purpose.

Model Preloading

To preload a model, use the provided console script WordEmbedAPI. In a command-line shell or terminal, type:

` > WordEmbedAPI /path/to/GoogleNews-vectors-negative300.bin.gz `

After a few minutes, the model will be loaded and ready to serve requests.

For details about using WordEmbedAPI, please refer to Console Scripts.

Class for Preloaded Model

After the model is loaded, it can be used like other word-embedding models using RESTfulKeyedVectors:

`>>> import shorttext`
`>>> wmodel = shorttext.utils.wordembed.RESTfulKeyedVectors('http://localhost', port='5000')`

This model can be used like other gensim KeyedVectors.
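
As a rough sketch of what using the client "like other gensim KeyedVectors" can look like, the helper below calls only methods documented on this page (`similarity`, `distance`, `get_vector`), so it should accept either a `RESTfulKeyedVectors` client or a locally loaded model. The name `describe_pair` is hypothetical and not part of shorttext, and the commented usage lines assume that WordEmbedAPI is already serving a model at `http://localhost:5000`.

```python
def describe_pair(wmodel, word1, word2):
    """Summarize how two words relate in a KeyedVectors-like model.

    `wmodel` is any object exposing similarity(), distance(), and
    get_vector(); this helper is illustrative, not part of shorttext.
    """
    return {
        'similarity': wmodel.similarity(word1, word2),
        'distance': wmodel.distance(word1, word2),
        'dimension': len(wmodel.get_vector(word1)),
    }


# Assumed usage against a running WordEmbedAPI server:
# import shorttext
# wmodel = shorttext.utils.wordembed.RESTfulKeyedVectors('http://localhost', port='5000')
# print(describe_pair(wmodel, 'king', 'queen'))
```

Because the helper depends only on the shared method names, code written this way does not need to know whether the vectors live in the local process or behind the API.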

class shorttext.utils.wordembed.RESTfulKeyedVectors(url, port='5000')

Client class for connecting to the API that serves the word-embedding vectors preloaded by WordEmbedAPI.

This class inherits from gensim.models.keyedvectors.KeyedVectors.

closer_than(entity1, entity2)
Parameters:
  • entity1 (str) – word 1
  • entity2 (str) – word 2
Returns: list of words that are closer to entity1 than entity2 is
Return type: list

distance(entity1, entity2)
Parameters:
  • entity1 (str) – word 1
  • entity2 (str) – word 2
Returns: distance between two words
Return type: float

distances(entity1, other_entities=())
Parameters:
  • entity1 (str) – word
  • other_entities (list) – list of words
Returns: list of distances between entity1 and each word in other_entities
Return type: list

get_vector(entity)
Parameters:
  • entity (str) – word
Returns: word vector of the given word
Return type: numpy.ndarray

most_similar(**kwargs)
Parameters:
  • kwargs
Returns:

most_similar_to_given(entity1, entities_list)
Parameters:
  • entity1 (str) – word
  • entities_list (list) – list of words
Returns: list of similarities between the given word and each word in entities_list
Return type: list

rank(entity1, entity2)
Parameters:
  • entity1 (str) – word 1
  • entity2 (str) – word 2
Returns: rank of entity2 among all words, ordered by their distance from entity1
Return type: int
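
To make the relationship between `closer_than` and `rank` concrete, here is a minimal local sketch on a toy vocabulary. It follows the behavior of gensim's `KeyedVectors`, where the rank is one more than the number of entities closer to the first word; it is assumed here that the server applies the same definition. The vectors and the module-level functions are made up for illustration and do not call the API.

```python
import math

# Toy vocabulary with made-up 2-dimensional vectors (illustration only).
VECTORS = {
    'cat': [1.0, 0.0],
    'kitten': [0.9, 0.1],
    'dog': [0.7, 0.7],
    'car': [0.0, 1.0],
}


def distance(word1, word2):
    """Cosine distance (1 - cosine similarity) between two toy vectors."""
    v1, v2 = VECTORS[word1], VECTORS[word2]
    dot = sum(a * b for a, b in zip(v1, v2))
    norms = math.hypot(*v1) * math.hypot(*v2)
    return 1.0 - dot / norms


def closer_than(word1, word2):
    """All other words strictly closer to word1 than word2 is."""
    threshold = distance(word1, word2)
    return [w for w in VECTORS
            if w != word1 and distance(word1, w) < threshold]


def rank(word1, word2):
    """1-based rank of word2 among the words ordered by distance from word1."""
    return len(closer_than(word1, word2)) + 1


print(closer_than('cat', 'car'))  # ['kitten', 'dog']
print(rank('cat', 'car'))         # 3
```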

save(fname_or_handle, **kwargs)
Parameters:
  • fname_or_handle
  • kwargs
Returns:

similarity(entity1, entity2)
Parameters:
  • entity1 (str) – word 1
  • entity2 (str) – word 2
Returns: similarity between two words
Return type: float
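
For reference, gensim's `KeyedVectors` defines similarity as the cosine similarity of the two word vectors, and distance as one minus that value. The sketch below reproduces the arithmetic in plain Python on made-up vectors; it assumes the preloaded model behind the API follows the same definitions, and the function names are illustrative only.

```python
import math


def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)


def cosine_distance(vec1, vec2):
    """Distance in the gensim sense: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(vec1, vec2)


# Toy 2-dimensional vectors, made up for illustration only.
king = [3.0, 4.0]
queen = [4.0, 3.0]

print(cosine_similarity(king, queen))  # 24 / 25
print(cosine_distance(king, queen))    # approximately 1 / 25
```

Identical directions give a similarity of 1 and a distance of 0, while orthogonal vectors give a similarity of 0 and a distance of 1.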
