src.utils package

Submodules

src.utils.io module

src.utils.io.load_pretrained_embeddings(embeddings_file, embeddings_dim, word_to_ix, skip_header=False)

Load pre-trained embedding weights. Words that have no pre-trained embedding are assigned a randomly initialized one.

Parameters

embeddings_file (str) – Weights file
embeddings_dim (int) – Embeddings dim
word_to_ix (dict) – Word to index mapper
skip_header (bool, optional) – Defaults to False.

Returns: pre-trained embeddings matrix
Return type: np.matrix
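
The behaviour described above can be sketched as follows. This is a minimal illustration, not the package's implementation. It assumes a GloVe-style text file with one `word v1 v2 ...` entry per line and indices in word_to_ix running from 0 to len(word_to_ix) - 1. The name load_embeddings_sketch is hypothetical, and reading skip_header as "skip one leading metadata line" is an assumption. Plain Python lists stand in for the documented np.matrix so the sketch needs no third-party packages.

```python
import random


def load_embeddings_sketch(embeddings_file, embeddings_dim, word_to_ix, skip_header=False):
    """Illustrative stand-in, not the package's actual implementation."""
    rng = random.Random(0)
    # Every row starts random, so words absent from the file keep a random vector.
    matrix = [[rng.uniform(-0.1, 0.1) for _ in range(embeddings_dim)]
              for _ in range(len(word_to_ix))]
    with open(embeddings_file, encoding="utf-8") as handle:
        if skip_header:
            next(handle, None)  # assumed meaning: drop a leading metadata line
        for line in handle:
            parts = line.split()
            if len(parts) != embeddings_dim + 1:
                continue  # skip blank or malformed lines
            word, values = parts[0], parts[1:]
            if word in word_to_ix:
                # Overwrite the random row with the pre-trained vector.
                matrix[word_to_ix[word]] = [float(v) for v in values]
    return matrix
```

Words in the file that are missing from word_to_ix are ignored, so the result always has one row per vocabulary entry.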

src.utils.text module

src.utils.text.isolate_punctuation(text)

Isolate punctuation in a sentence.

>>> isolate_punctuation('Hi there!')
'Hi there !'

Parameters: text (str) – Input sentence
Returns: Output sentence with isolated punctuation
Return type: str
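
One way to get the behaviour shown in the doctest is a regular expression that pads each punctuation mark with spaces. The sketch below is an assumption about the approach, not the module's code. The name isolate_punctuation_sketch is hypothetical, and treating every character in string.punctuation as punctuation is a choice made for illustration.

```python
import re
import string

# Character class matching any single ASCII punctuation mark.
_PUNCT = re.compile("([" + re.escape(string.punctuation) + "])")


def isolate_punctuation_sketch(text):
    """Surround each punctuation mark with spaces, then normalise whitespace."""
    spaced = _PUNCT.sub(r" \1 ", text)
    # split() with no argument drops the extra spaces introduced by the padding.
    return " ".join(spaced.split())
```

Note that this version also collapses any repeated whitespace already present in the input.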

src.utils.text.replace_urls(text, replace_with='<URL>')

Replace URLs in a sentence with a chosen string.

>>> replace_urls("I love https://github.com")
"I love <URL>"

Parameters

text (str) – Input sentence
replace_with (str, optional) – String that replaces each URL. Defaults to “<URL>”.

Returns

Output sentence with URLs replaced

Return type

str
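
A minimal sketch of URL replacement, assuming that a URL is any whitespace-delimited token starting with `http://`, `https://` or `www.`. The pattern and the name replace_urls_sketch are hypothetical, and the actual function may match URLs differently.

```python
import re

# Assumed pattern: scheme or "www." prefix followed by non-whitespace characters.
_URL = re.compile(r"(?:https?://|www\.)\S+")


def replace_urls_sketch(text, replace_with="<URL>"):
    """Replace every matched URL with the given placeholder string."""
    # A function is used as the replacement so backslashes in replace_with
    # are inserted literally instead of being read as regex escapes.
    return _URL.sub(lambda _match: replace_with, text)
```

Because the pattern consumes everything up to the next whitespace, punctuation that directly follows a URL is replaced along with it.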

src.utils.vocabulary module

src.utils.vocabulary.make_char_to_ix()

Make a character to index dictionary.

Returns: character to index
Return type: dict

src.utils.vocabulary.make_word_to_ix(train_sentences, char_to_split_at=' ', unk_tag='<UNK>')

Make a word to index dictionary.

Parameters

train_sentences (list) – list of sentences
char_to_split_at (str, optional) – Character used to split each sentence into tokens. Defaults to “ ”.
unk_tag (str, optional) – Unknown tag. Defaults to “<UNK>”.

Returns

word to index

Return type

dict
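
The sketch below shows one way a vocabulary with this signature could be built. It is an illustration under stated assumptions, not the module's code: the name make_word_to_ix_sketch is hypothetical, giving the unknown tag index 0 is an assumption, and words are numbered in order of first appearance.

```python
def make_word_to_ix_sketch(train_sentences, char_to_split_at=" ", unk_tag="<UNK>"):
    """Map each distinct word in the training sentences to an integer index."""
    word_to_ix = {unk_tag: 0}  # assumed: the unknown tag takes index 0
    for sentence in train_sentences:
        for word in sentence.split(char_to_split_at):
            # Skip empty tokens produced by consecutive split characters.
            if word and word not in word_to_ix:
                word_to_ix[word] = len(word_to_ix)
    return word_to_ix
```

At lookup time, a word that was not seen during training can fall back to the unknown tag with `word_to_ix.get(word, word_to_ix["<UNK>"])`.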