Data Stack

1. Presentation

A sample data stack running on Docker, containing the following components:

  • Airflow

  • Metabase

  • MariaDB, with phpMyAdmin

  • Postgres, with phpPgAdmin

  • Doccano data labelling interface

  • Nginx as reverse proxy

  • Sphinx auto-generated documentation

  • A template Python module, usable in Airflow DAGs

  • A template machine learning package, using PyTorch

  • An ml_helper package, providing functions to store machine learning model results and parameters in a database

  • A utils package with utility functions

  • Unit testing with the pytest library

2. Installation

You will need to have the following software installed:

  • Docker and Docker Compose

  • Python 3, with virtualenv and pip

Once you’re set up, create a virtual environment and install the prerequisite Python libraries:

virtualenv venv;
source venv/bin/activate;
pip install -r requirements.txt;

3. Usage

3.1 Launch the Docker stack

Run it with:

docker-compose up -d

Then visit the services’ web interfaces; the exposed ports are defined in docker-compose.yml.

Add your Airflow DAGs in the dags folder; a minimal example is sketched below.
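
This sketch is illustrative only: it assumes the Airflow 1.x import paths, and the DAG id, schedule, and task are made up.

# dags/hello_dag.py -- hypothetical example, not part of the stack
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def say_hello():
    print("Hello from the data stack!")


with DAG(
    dag_id="example_dag",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="say_hello", python_callable=say_hello)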

3.2 Unit testing

Run the unit tests with:

pytest tests

3.3 Generating the Sphinx docs

Generate the Sphinx documentation with:

sphinx-apidoc ./src -o docs/source -M;
cd docs && make html && open build/html/index.html;

4. References

This section contains the auto-generated API reference documentation, created with sphinx-autoapi.

src

Subpackages
src.ml
Submodules
src.ml.pytorch_example

From: https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html

Module Contents
Classes

TextSentiment

Base class for all neural network modules.

Functions

generate_batch(batch)

train_func(model, optimizer, criterion, scheduler, sub_train_)

test(model, criterion, data_)

main()

src.ml.pytorch_example.BATCH_SIZE = 16
src.ml.pytorch_example.device
class src.ml.pytorch_example.TextSentiment(vocab_size, embed_dim, num_class)

Bases: torch.nn.Module

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

init_weights(self)
forward(self, text, offsets)
src.ml.pytorch_example.generate_batch(batch)
src.ml.pytorch_example.train_func(model, optimizer, criterion, scheduler, sub_train_)
src.ml.pytorch_example.test(model, criterion, data_)
src.ml.pytorch_example.main()
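
A hedged usage sketch: the vocabulary size, class count, and token ids below are invented (the real values come from the dataset loaded in main()):

import torch

from src.ml.pytorch_example import TextSentiment

model = TextSentiment(vocab_size=10000, embed_dim=32, num_class=4)
# The model takes a flat tensor of token ids plus the offset at which
# each document starts, matching the output of generate_batch().
text = torch.tensor([1, 2, 3, 4, 5])
offsets = torch.tensor([0, 3])  # two documents: [1, 2, 3] and [4, 5]
logits = model(text, offsets)   # shape: (2, num_class)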
src.ml_helper
Submodules
src.ml_helper.model
Module Contents
Classes

Model

Epoch

Functions

init_db()

src.ml_helper.model.ENGINE_URL = 'postgres://user:password@localhost:5432/ml_helper'
src.ml_helper.model.metadata
src.ml_helper.model.Base
class src.ml_helper.model.Model

Bases: Base

__tablename__ = 'model'
uuid
id
last_updated
params
epoch
class src.ml_helper.model.Epoch

Bases: Base

__tablename__ = 'epoch'
__table_args__
uuid
model_id
created_at
number
training_loss
eval_loss
training_F1
eval_F1
training_acc
eval_acc
src.ml_helper.model.engine
src.ml_helper.model.Session
src.ml_helper.model.init_db()
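
A minimal sketch of working with this module, assuming the stack’s Postgres service is running and reachable at ENGINE_URL:

from src.ml_helper.model import Model, Session, init_db

init_db()  # create the model and epoch tables declared above

session = Session()
for model in session.query(Model).all():  # list every registered model
    print(model.id, model.last_updated)
session.close()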
src.ml_helper.training
Module Contents
Functions

_commit_object(obj)

register_model_in_db(model_id, params)

register_epoch_in_db(model_id, epoch_number, **kwargs)

retrieve_best_model_params()

hash_parameters(params)

delete_model(model_id)

src.ml_helper.training._commit_object(obj)
src.ml_helper.training.register_model_in_db(model_id, params)
src.ml_helper.training.register_epoch_in_db(model_id, epoch_number, **kwargs)
src.ml_helper.training.retrieve_best_model_params()
src.ml_helper.training.hash_parameters(params)
src.ml_helper.training.delete_model(model_id)
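
A sketch of how these functions might fit into a training loop; the hyperparameters and metric values are invented, and passing metrics as keyword arguments assumes the kwargs of register_epoch_in_db map onto the Epoch columns listed above:

from src.ml_helper.training import (
    hash_parameters,
    register_epoch_in_db,
    register_model_in_db,
)

params = {"lr": 0.01, "batch_size": 16}
model_id = hash_parameters(params)  # deterministic id for this configuration
register_model_in_db(model_id, params)

for epoch in range(5):
    # ... train and evaluate the model here ...
    register_epoch_in_db(model_id, epoch,
                         training_loss=0.1, eval_loss=0.2)  # illustrative values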
src.utils
Submodules
src.utils.io
Module Contents
Functions

load_pretrained_embeddings(embeddings_file, embeddings_dim, word_to_ix, skip_header=False)

Load pretrained embeddings weights.

src.utils.io.load_pretrained_embeddings(embeddings_file, embeddings_dim, word_to_ix, skip_header=False)

Load pretrained embedding weights. Words that don’t have a pre-trained embedding are assigned a randomly initialized one.

Parameters
  • embeddings_file (str) – Weights file

  • embeddings_dim (int) – Embeddings dimension

  • word_to_ix (dict) – Word to index mapper

  • skip_header (bool, optional) – Whether to skip the first line of the file. Defaults to False.

Returns

pre-trained embeddings matrix

Return type

np.matrix
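
A hedged usage sketch; the embeddings file name, dimension, and vocabulary below are placeholders:

from src.utils.io import load_pretrained_embeddings

word_to_ix = {"the": 0, "cat": 1, "<UNK>": 2}
weights = load_pretrained_embeddings(
    "glove.6B.100d.txt",  # one word followed by 100 floats per line
    embeddings_dim=100,
    word_to_ix=word_to_ix,
    skip_header=False,
)
# weights[i] is the vector for the word with index i; words missing
# from the file get a randomly initialized vector.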

src.utils.text
Module Contents
Functions

isolate_punctuation(text)

Isolate punctuation in a sentence.

replace_urls(text, replace_with='<URL>')

Replace urls in a sentence with a chosen string.

src.utils.text.isolate_punctuation(text)

Isolate punctuation in a sentence.

>>> isolate_punctuation('Hi there!')
'Hi there !'
Parameters

text (str) – Input sentence

Returns

Output sentence with isolated punctuation

Return type

str

src.utils.text.replace_urls(text, replace_with='<URL>')

Replace urls in a sentence with a chosen string.

>>> replace_urls("I love https://github.com")
"I love <URL>"
Parameters
  • text (str) – Input sentence

  • replace_with (str, optional) – String to replace the url with. Defaults to "<URL>".

Returns

Output sentence with replaced url

Return type

str

src.utils.vocabulary
Module Contents
Functions

make_char_to_ix()

Make a character to index dictionary.

make_word_to_ix(train_sentences, char_to_split_at=' ', unk_tag='<UNK>')

Make a word to index dictionary

src.utils.vocabulary.make_char_to_ix()

Make a character to index dictionary.

Returns

character to index

Return type

dict

src.utils.vocabulary.make_word_to_ix(train_sentences, char_to_split_at=' ', unk_tag='<UNK>')

Make a word to index dictionary

Parameters
  • train_sentences (list) – list of sentences

  • char_to_split_at (str, optional) – Character to use to split the sentence (for tokenization). Defaults to " ".

  • unk_tag (str, optional) – Unknown tag. Defaults to “<UNK>”.

Returns

word to index mapping

Return type

dict
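
For instance (the exact index assignment is an implementation detail, so the mapping below is only indicative):

from src.utils.vocabulary import make_word_to_ix

word_to_ix = make_word_to_ix(["the cat sat", "the dog ran"])
# e.g. {"<UNK>": 0, "the": 1, "cat": 2, "sat": 3, "dog": 4, "ran": 5}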

Submodules
src.example_module

A python example module.

Module Contents
Functions

example_function()

Basic function that returns a string

src.example_module.example_function()

Basic function that returns a string
