Data Stack
1. Presentation
A sample data stack running on Docker, containing the following components:
- MariaDB, with phpMyAdmin
- Postgres, with phpPgAdmin
- Doccano data labelling interface
- Nginx as a reverse proxy
- Sphinx auto-generated documentation
- A template Python module, usable in Airflow DAGs
- A template machine learning package, using PyTorch
- An ml_helper package that provides functions to store machine learning model results and parameters in a database
- A utils package with utility functions
- Unit testing with the pytest library
2. Installation
You will need to have the following software installed:
Once you're set up, create a virtual environment and install the prerequisite Python libraries:
virtualenv venv;
source venv/bin/activate;
pip install -r requirements.txt;
3. Usage
3.1 Launch the Docker stack
Run it with:
docker-compose up -d
Then visit:
localhost:3000 for Metabase
localhost:8080 for Airflow
localhost:8000 for Doccano
Add your Airflow DAGs in the dags folder.
3.2 Generating the Sphinx docs
Generate the Sphinx documentation with:
sphinx-apidoc ./src -o docs/source -M;
cd docs && make html && open build/html/index.html;
4. References
src
src package
Subpackages
src.ml package
From: https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html
src.ml_helper package
class src.ml_helper.model.Base(**kwargs)
    Bases: object

    The most base type

    metadata = MetaData(bind=None)

Submodules
API Reference
This page contains auto-generated API reference documentation [1].
src
Subpackages
src.ml
src.ml.pytorch_example
From: https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html
class src.ml.pytorch_example.TextSentiment(vocab_size, embed_dim, num_class)
    Bases: torch.nn.Module

    Base class for all neural network modules.

    Your models should also subclass this class.

    Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

        import torch.nn as nn
        import torch.nn.functional as F

        class Model(nn.Module):
            def __init__(self):
                super(Model, self).__init__()
                self.conv1 = nn.Conv2d(1, 20, 5)
                self.conv2 = nn.Conv2d(20, 20, 5)

            def forward(self, x):
                x = F.relu(self.conv1(x))
                return F.relu(self.conv2(x))

    Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.
src.ml_helper
src.ml_helper.model
src.ml_helper.training
src.utils
src.utils.io
src.utils.io.load_pretrained_embeddings(embeddings_file, embeddings_dim, word_to_ix, skip_header=False)
    Load pretrained embeddings weights. Words that don't have a pre-trained embedding are assigned a randomly initialized one.

    Parameters:
        embeddings_file (str) – Weights file
        embeddings_dim (int) – Embeddings dim
        word_to_ix (dict) – Word to index mapper
    Returns:
        Pre-trained embeddings matrix
    Return type:
        np.matrix
src.utils.text
src.utils.text.isolate_punctuation(text)
    Isolate punctuation in a sentence.

    >>> isolate_punctuation('Hi there!')
    'Hi there !'

    Parameters:
        text (str) – Input sentence
    Returns:
        Output sentence with isolated punctuation
    Return type:
        str
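The implementation isn't included here; one way to reproduce the documented behaviour, assuming a regex-based approach, is:

```python
import re

def isolate_punctuation(text):
    # Pad every non-word, non-space character with spaces,
    # then collapse runs of whitespace back to single spaces.
    text = re.sub(r"([^\w\s])", r" \1 ", text)
    return re.sub(r"\s+", " ", text).strip()
```

This matches the doctest above (`'Hi there!'` becomes `'Hi there !'`), but the actual tokenization rules of the project's function may differ.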
src.utils.text.replace_urls(text, replace_with='<URL>')
    Replace urls in a sentence with a chosen string.

    >>> replace_urls("I love https://github.com")
    'I love <URL>'

    Parameters:
        text (str) – Input sentence
        replace_with (str, optional) – String to replace the url with. Defaults to "<URL>".
    Returns:
        Output sentence with replaced url
    Return type:
        str
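Again the body isn't shown; a minimal sketch, assuming a simple `http(s)://` regex rather than whatever pattern the project actually uses, could be:

```python
import re

def replace_urls(text, replace_with='<URL>'):
    # Replace any http:// or https:// prefix followed by a run of
    # non-whitespace characters with the chosen placeholder string.
    return re.sub(r"https?://\S+", replace_with, text)
```

A production pattern would likely also handle bare domains and trailing punctuation; this only covers the documented example.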
src.utils.vocabulary
src.utils.vocabulary.make_char_to_ix()
    Make a character to index dictionary.

    Returns:
        Character to index
    Return type:
        dict
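Since the function takes no arguments, the character set must be fixed inside it; the reference doesn't say which one. A sketch, assuming the printable ASCII characters, might be:

```python
import string

def make_char_to_ix():
    # Assumption: the vocabulary is string.printable (digits, letters,
    # punctuation, whitespace), indexed in that fixed order.
    return {ch: ix for ix, ch in enumerate(string.printable)}
```

The real function may use a different alphabet or reserve an index for unknown characters.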
src.utils.vocabulary.make_word_to_ix(train_sentences, char_to_split_at=' ', unk_tag='<UNK>')
    Make a word to index dictionary.

    Parameters:
        train_sentences (list) – List of sentences
        char_to_split_at (str, optional) – Character to use to split the sentence (for tokenization). Defaults to " ".
        unk_tag (str, optional) – Unknown tag. Defaults to "<UNK>".
    Returns:
        Word to index dictionary
    Return type:
        dict
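A sketch consistent with the signature above, assuming the unknown tag is reserved at index 0 and words are indexed in order of first appearance:

```python
def make_word_to_ix(train_sentences, char_to_split_at=' ', unk_tag='<UNK>'):
    # Reserve index 0 for the unknown-word tag, then assign each new
    # word the next free index as it is first seen in the corpus.
    word_to_ix = {unk_tag: 0}
    for sentence in train_sentences:
        for word in sentence.split(char_to_split_at):
            if word not in word_to_ix:
                word_to_ix[word] = len(word_to_ix)
    return word_to_ix
```

The index ordering and the placement of the unknown tag are assumptions; the reference only documents the parameters.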
Submodules
src.example_module
A python example module.

[1] Created with sphinx-autoapi