Data Stack
1. Presentation
A sample data stack running on Docker that contains the following components:
MariaDB, with phpMyAdmin
Postgres, with phpPgAdmin
Doccano data labelling interface
Nginx as reverse proxy
Sphinx auto-generated documentation
A template Python module, usable in Airflow DAGs
A template machine learning package, using PyTorch
An ml_helper package that provides functions to store machine learning model results and parameters in a database
A utils package with utility functions
Unit testing with the pytest library
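The API of the ml_helper package is not shown here, so the following is only a hypothetical sketch of what storing model parameters and results in a database could look like. It uses Python's built-in sqlite3 module as a stand-in for the MariaDB or Postgres services of the stack; the table name runs and the function names init_db, save_run and load_runs are assumptions made for illustration.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) runs table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS runs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            model_name TEXT NOT NULL,
            params TEXT NOT NULL,
            metrics TEXT NOT NULL
        )
        """
    )
    conn.commit()


def save_run(conn: sqlite3.Connection, model_name: str,
             params: dict, metrics: dict) -> int:
    """Store one training run; both dicts are serialized as JSON text."""
    cur = conn.execute(
        "INSERT INTO runs (model_name, params, metrics) VALUES (?, ?, ?)",
        (model_name, json.dumps(params), json.dumps(metrics)),
    )
    conn.commit()
    return cur.lastrowid


def load_runs(conn: sqlite3.Connection, model_name: str) -> list:
    """Return all stored runs for a model, oldest first."""
    rows = conn.execute(
        "SELECT params, metrics FROM runs WHERE model_name = ? ORDER BY id",
        (model_name,),
    ).fetchall()
    return [{"params": json.loads(p), "metrics": json.loads(m)} for p, m in rows]


if __name__ == "__main__":
    # In-memory database: nothing is written to disk in this sketch.
    connection = sqlite3.connect(":memory:")
    init_db(connection)
    save_run(connection, "baseline", {"lr": 0.01, "epochs": 5}, {"accuracy": 0.9})
    print(load_runs(connection, "baseline"))
```

Serializing parameters and metrics as JSON keeps the table schema fixed while letting each model log different keys; a real implementation against MariaDB or Postgres would need its own driver and connection settings.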
2. Installation
You will need to have the following software installed:
Once those are in place, create a virtual environment and install the prerequisite Python libraries:
virtualenv venv;
source venv/bin/activate;
pip install -r requirements.txt;
3. Usage
3.1 Launch the Docker stack
Run it with:
docker-compose up -d
Then visit:
localhost:3000 for Metabase
localhost:8080 for Airflow
localhost:8000 for Doccano
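Containers can take a little while to start after docker-compose returns. As a quick check that the services listed above are accepting connections, a small script along these lines may help. It is a minimal sketch using only the standard library; the helper name is_port_open and the SERVICES mapping are assumptions for illustration, with the ports copied from the list above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Service names and ports as listed in this section.
SERVICES = {"Metabase": 3000, "Airflow": 8080, "Doccano": 8000}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {status}")
```

An open port only shows that something is listening; it does not guarantee that the web application behind it has finished initializing.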
Add your Airflow DAGs to the dags folder.
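Keeping task logic in plain Python functions makes it importable from a DAG file and testable with pytest without a running Airflow instance. The following is a minimal sketch under that assumption; count_labels is a hypothetical function, not part of the template module. In a DAG file it could be passed to a PythonOperator through its python_callable argument.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_labels(records: Iterable[Mapping[str, str]]) -> dict:
    """Count how many records carry each label (hypothetical task logic)."""
    return dict(Counter(record["label"] for record in records))


if __name__ == "__main__":
    # Made-up sample records, only to show the call.
    sample = [{"label": "pos"}, {"label": "neg"}, {"label": "pos"}]
    print(count_labels(sample))
```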
3.3 Generating the Sphinx docs
Generate the Sphinx documentation with:
sphinx-apidoc ./src -o docs/source -M;
cd docs && make html && open build/html/index.html;
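sphinx-apidoc only generates .rst stubs containing automodule directives; the text on the rendered pages comes from the docstrings in ./src, assuming the sphinx.ext.autodoc extension is enabled in the Sphinx conf.py (not shown in this document). As an illustration, a function documented with reStructuredText field lists might look like the sketch below. The name scale_to_unit is hypothetical and not part of the project.

```python
from typing import List, Sequence


def scale_to_unit(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the interval [0, 1].

    :param values: Numbers to rescale.
    :returns: A list of the same length with the minimum mapped to 0.0
        and the maximum mapped to 1.0. An empty input gives an empty
        list, and a constant input gives all zeros.
    """
    if not values:
        return []
    low, high = min(values), max(values)
    if high == low:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


if __name__ == "__main__":
    print(scale_to_unit([0, 5, 10]))
```

The :param and :returns: fields are rendered by Sphinx as a structured parameter list on the generated page.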
4. References