src.utils package¶
Submodules¶
src.utils.io module¶
-
src.utils.io.load_pretrained_embeddings(embeddings_file, embeddings_dim, word_to_ix, skip_header=False)[source]¶
Load pretrained embedding weights. Words that don't have a pre-trained embedding are assigned a randomly initialized one.
- Parameters
embeddings_file (str) – Weights file
embeddings_dim (int) – Embeddings dimension
word_to_ix (dict) – Word to index mapper
skip_header (bool, optional) – Skip the file's header line. Defaults to False.
- Returns
pre-trained embeddings matrix
- Return type
np.matrix
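The entry above leaves the loading procedure implicit. A minimal sketch of how such a loader could work; the file format (one token followed by its vector values per space-separated line) and the random-initialization scale are assumptions, not the package's actual code:

```python
import numpy as np

def load_pretrained_embeddings(embeddings_file, embeddings_dim, word_to_ix,
                               skip_header=False):
    # Randomly initialize every row, then overwrite the rows of words
    # that appear in the pretrained embeddings file.
    weights = np.random.normal(scale=0.1, size=(len(word_to_ix), embeddings_dim))
    with open(embeddings_file, encoding='utf-8') as f:
        if skip_header:
            next(f)
        for line in f:
            parts = line.rstrip().split(' ')
            word, vector = parts[0], parts[1:]
            if word in word_to_ix and len(vector) == embeddings_dim:
                weights[word_to_ix[word]] = np.asarray(vector, dtype=float)
    return np.matrix(weights)
```

Rows for out-of-file words keep their random initialization, matching the docstring's description.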
src.utils.text module¶
-
src.utils.text.isolate_punctuation(text)[source]¶
Isolate punctuation in a sentence.
>>> isolate_punctuation('Hi there!')
'Hi there !'
- Parameters
text (str) – Input sentence
- Returns
Output sentence with isolated punctuation
- Return type
str
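A minimal sketch of what this function might do, consistent with the doctest above; the exact punctuation set and regex approach are assumptions, not the package's actual implementation:

```python
import re

def isolate_punctuation(text):
    # Pad common punctuation marks with spaces, then collapse
    # any resulting runs of whitespace back to single spaces.
    text = re.sub(r'([!?.,;:])', r' \1 ', text)
    return re.sub(r'\s{2,}', ' ', text).strip()

print(isolate_punctuation('Hi there!'))  # 'Hi there !'
```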
-
src.utils.text.replace_urls(text, replace_with='<URL>')[source]¶
Replace URLs in a sentence with a chosen string.
>>> replace_urls("I love https://github.com")
"I love <URL>"
- Parameters
text (str) – Input sentence
replace_with (str, optional) – String to replace the URL with. Defaults to "<URL>".
- Returns
Output sentence with replaced url
- Return type
str
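A minimal sketch consistent with the doctest above; the URL-matching pattern is an assumption, not the package's actual implementation:

```python
import re

# Match http(s):// links and bare www. links up to the next whitespace.
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def replace_urls(text, replace_with='<URL>'):
    return URL_PATTERN.sub(replace_with, text)

print(replace_urls("I love https://github.com"))  # "I love <URL>"
```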
src.utils.vocabulary module¶
-
src.utils.vocabulary.make_char_to_ix()[source]¶
Make a character to index dictionary.
- Returns
character to index
- Return type
dict
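Since the function takes no arguments, the character inventory must be fixed internally. One plausible sketch; the choice of printable ASCII plus an unknown slot is purely an assumption, not the package's actual vocabulary:

```python
import string

def make_char_to_ix():
    # Index 0 is reserved for unknown characters; the rest of the
    # indices cover letters, digits, punctuation, and the space.
    chars = string.ascii_letters + string.digits + string.punctuation + ' '
    char_to_ix = {'<UNK>': 0}
    for ch in chars:
        char_to_ix[ch] = len(char_to_ix)
    return char_to_ix
```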
-
src.utils.vocabulary.make_word_to_ix(train_sentences, char_to_split_at=' ', unk_tag='<UNK>')[source]¶
Make a word to index dictionary.
- Parameters
train_sentences (list) – list of sentences
char_to_split_at (str, optional) – Character used to split the sentence (for tokenization). Defaults to " ".
unk_tag (str, optional) – Unknown tag. Defaults to “<UNK>”.
- Returns
word to index
- Return type
dict
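A minimal sketch of how such a vocabulary builder could work; reserving index 0 for the unknown tag is an assumption, not necessarily the package's actual implementation:

```python
def make_word_to_ix(train_sentences, char_to_split_at=' ', unk_tag='<UNK>'):
    # Reserve index 0 for the unknown tag, then assign the next free
    # index to each new word in order of first appearance.
    word_to_ix = {unk_tag: 0}
    for sentence in train_sentences:
        for word in sentence.split(char_to_split_at):
            if word not in word_to_ix:
                word_to_ix[word] = len(word_to_ix)
    return word_to_ix
```

The resulting dictionary is the `word_to_ix` mapper expected by `load_pretrained_embeddings` above.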