src.utils package

Submodules

src.utils.io module

src.utils.io.load_pretrained_embeddings(embeddings_file, embeddings_dim, word_to_ix, skip_header=False)

Load pretrained embedding weights. Words without a pre-trained embedding are assigned a randomly initialized one.

Parameters
  • embeddings_file (str) – Weights file

  • embeddings_dim (int) – Embeddings dim

  • word_to_ix (dict) – Word to index mapper

  • skip_header (bool, optional) – Defaults to False.

Returns

pre-trained embeddings matrix

Return type

np.matrix
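
The behaviour described above (file vectors where available, random vectors otherwise) can be sketched as follows. This is a minimal illustration and not the module's actual code: the function name load_embeddings_sketch is hypothetical, the file is assumed to be whitespace-separated text with one "word v1 v2 ..." entry per line, skip_header is assumed to mean "ignore the first line", the indices in word_to_ix are assumed to run from 0 to len(word_to_ix) - 1, and plain lists stand in for the np.matrix return value to keep the sketch dependency-free.

```python
import os
import random
import tempfile


def load_embeddings_sketch(embeddings_file, embeddings_dim, word_to_ix, skip_header=False):
    """Hypothetical stand-in: returns one row of floats per index in word_to_ix."""
    rng = random.Random(0)
    # Every row starts random, so words absent from the file keep a random vector.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(embeddings_dim)]
               for _ in range(len(word_to_ix))]
    with open(embeddings_file, encoding="utf-8") as f:
        if skip_header:
            next(f, None)  # assumption: the header is exactly one line
        for line in f:
            parts = line.split()
            if len(parts) != embeddings_dim + 1:
                continue  # skip malformed lines
            word, values = parts[0], parts[1:]
            if word in word_to_ix:
                weights[word_to_ix[word]] = [float(v) for v in values]
    return weights


# Usage: write a tiny file with a header line, then load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("2 3\nhello 1.0 2.0 3.0\nworld 4.0 5.0 6.0\n")
    word_to_ix = {"<UNK>": 0, "hello": 1}
    matrix = load_embeddings_sketch(path, 3, word_to_ix, skip_header=True)
```

Here "hello" receives its vector from the file, "<UNK>" keeps its random row, and "world" is ignored because it is not in word_to_ix.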

src.utils.text module

src.utils.text.isolate_punctuation(text)

Isolate punctuation in a sentence.

>>> isolate_punctuation('Hi there!')
'Hi there !'

Parameters

text (str) – Input sentence

Returns

Output sentence with isolated punctuation

Return type

str
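
One way to get the behaviour shown in the example is a regular expression that pads each punctuation character with spaces. This is a sketch under assumptions, not the module's implementation: the name isolate_punctuation_sketch is hypothetical, "punctuation" is taken to mean the ASCII characters in string.punctuation, and runs of whitespace are collapsed to a single space.

```python
import re
import string

# Character class matching any single ASCII punctuation character.
_PUNCT = re.compile("([" + re.escape(string.punctuation) + "])")


def isolate_punctuation_sketch(text):
    """Hypothetical stand-in: surround punctuation with spaces, then tidy whitespace."""
    spaced = _PUNCT.sub(r" \1 ", text)
    # split() + join() removes the leading/trailing/double spaces added above.
    return " ".join(spaced.split())


result = isolate_punctuation_sketch("Hi there!")
```

Note that this approach also splits apostrophes and hyphens away from the words they belong to, which may or may not match what the real function does.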

src.utils.text.replace_urls(text, replace_with='<URL>')

Replace URLs in a sentence with a chosen string.

>>> replace_urls("I love https://github.com")
'I love <URL>'

Parameters
  • text (str) – Input sentence

  • replace_with (str, optional) – String to replace each URL with. Defaults to “<URL>”.

Returns

Output sentence with URLs replaced

Return type

str
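
A minimal sketch of URL replacement, assuming a URL is any run of non-whitespace characters that starts with http:// or https://. The name replace_urls_sketch and the pattern are illustrative assumptions; the real function may recognise a different set of URLs.

```python
import re

# Assumption: a URL starts with http:// or https:// and ends at the next whitespace.
_URL = re.compile(r"https?://\S+")


def replace_urls_sketch(text, replace_with="<URL>"):
    """Hypothetical stand-in: swap every matched URL for replace_with."""
    # A function replacement keeps replace_with literal (no backslash processing).
    return _URL.sub(lambda match: replace_with, text)


result = replace_urls_sketch("I love https://github.com")
```

Because the pattern stops only at whitespace, punctuation directly after a URL (such as a sentence-final period) is replaced along with it.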

src.utils.vocabulary module

src.utils.vocabulary.make_char_to_ix()

Make a character to index dictionary.

Returns

character to index

Return type

dict
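
The documentation does not say which characters the dictionary covers. Purely as an illustration, the sketch below assumes the printable ASCII characters from string.printable and assigns indices in that order; both the character set and the name make_char_to_ix_sketch are assumptions, not the module's behaviour.

```python
import string


def make_char_to_ix_sketch():
    """Hypothetical stand-in: map each printable ASCII character to a unique index."""
    # Assumption: the alphabet is string.printable; the real function may differ.
    return {char: ix for ix, char in enumerate(string.printable)}


char_to_ix = make_char_to_ix_sketch()
```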

src.utils.vocabulary.make_word_to_ix(train_sentences, char_to_split_at=' ', unk_tag='<UNK>')

Make a word to index dictionary.

Parameters
  • train_sentences (list) – List of sentences

  • char_to_split_at (str, optional) – Character used to split each sentence (for tokenization). Defaults to “ ”.

  • unk_tag (str, optional) – Unknown tag. Defaults to “<UNK>”.

Returns

word to index

Return type

dict
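
A vocabulary of this kind can be built by splitting each sentence and giving every new word the next free index. The sketch below is an assumption-laden illustration rather than the module's code: the name make_word_to_ix_sketch is hypothetical, and placing the unknown tag at index 0 and skipping empty tokens are choices made here, not documented behaviour.

```python
def make_word_to_ix_sketch(train_sentences, char_to_split_at=" ", unk_tag="<UNK>"):
    """Hypothetical stand-in: assign indices to words in order of first appearance."""
    word_to_ix = {unk_tag: 0}  # assumption: the unknown tag takes index 0
    for sentence in train_sentences:
        for word in sentence.split(char_to_split_at):
            # Skip empty tokens produced by consecutive separators.
            if word and word not in word_to_ix:
                word_to_ix[word] = len(word_to_ix)
    return word_to_ix


word_to_ix = make_word_to_ix_sketch(["the cat sat", "the dog sat"])
# Words not seen during training fall back to the unknown tag's index.
unseen_ix = word_to_ix.get("bird", word_to_ix["<UNK>"])
```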