26 Commits

Author SHA1 Message Date
Tomasz Dłuski
ee071564ef feature: don't use wait-for-it, which is newline-character dependent and causes problems on Windows; closes #17 2022-11-02 17:34:21 +01:00
Tomasz Dłuski
83e0b0ef12 bugfix: create bucket using variable name instead of hardcoded one closes #19 2022-11-02 17:14:55 +01:00
Tomasz Dłuski
e0e8157b60 Merge pull request #18 from konstantin-frolov/master
Fix compose for connect s3 artifacts & mlflow UI
2022-10-27 20:34:28 +02:00
Konstantin Frolov
d6207174f0 Fix compose for connect s3 artifacts & mlflow UI 2022-10-26 17:06:06 +03:00
Tomasz Dłuski
9300e34d1a Update README.md 2022-05-05 14:48:58 +02:00
Tomasz Dłuski
e3b7595685 Merge pull request #14 from andife/patch-2
Update create_bucket.py
2021-12-28 18:55:46 +01:00
andife
f37ae1a227 Update create_bucket.py 2021-12-28 09:25:09 +01:00
Tomasz Dłuski
b129e9a7cc Update README.md 2021-12-03 22:22:39 +01:00
Tomasz Dłuski
57e981da9a Update README.md 2021-12-03 22:21:40 +01:00
Tomasz Dłuski
b605078792 update readme 2021-12-03 22:14:24 +01:00
Tomasz Dłuski
6e798644da s3: automatically create a bucket on startup 2021-12-03 22:11:10 +01:00
Tomasz Dłuski
4c4449110e Update README.md 2021-12-03 21:42:52 +01:00
Tomasz Dłuski
5255f67780 Update README.md 2021-12-03 21:40:14 +01:00
Tomasz Dłuski
8bba55703c Update LICENSE 2021-12-03 21:40:14 +01:00
Tomasz Dłuski
01e8abe89a add named volumes instead of mapped local directory 2021-12-03 21:34:32 +01:00
Tomasz Dłuski
d0a5dfbde0 add restart: unless-stopped 2021-12-03 21:32:42 +01:00
Tomasz Dłuski
f92d4ec230 update minio to the newest version 2021-12-03 21:29:36 +01:00
Tomasz Dłuski
2ad2c983db dont expose database to public network closes #12 2021-12-03 21:21:51 +01:00
Tomasz Dłuski
b6ecfe7d0c Merge pull request #7 from Toumash/#6
use fixed minio stable version (minio team separated console and api …
2021-07-19 12:25:36 +02:00
Tomasz Dłuski
fcd3393fa5 use fixed minio stable version (minio team separated console and api in the july feature release)
https://github.com/minio/minio/releases/tag/RELEASE.2021-07-08T01-15-01Z
2021-07-19 12:04:05 +02:00
Tomasz Dłuski
14df7c707e Merge pull request #3 from kingkastle/kingkastle-minio-methodname
minio method name is now: InvalidResponseError
2021-03-05 18:37:27 +01:00
Rafael Castillo
7506d3e43d minio method name is now: InvalidResponseError 2020-12-21 16:33:53 +01:00
Tomasz Dłuski
8f3d6ba7e2 Update README.md 2020-09-06 22:08:23 +02:00
Tomasz Dłuski
81e373d6fb Update run_create_bucket.sh 2020-09-05 16:14:53 +02:00
Tomasz Dłuski
eebc9d0c46 bugfix: adds bucket name configuration into the .env file 2020-09-05 13:58:45 +02:00
Tomasz Dłuski
bddbec77f1 adds minio autoconfigure script 2020-09-05 13:54:40 +02:00
8 changed files with 149 additions and 298 deletions

.env (4 changes)

@@ -1,5 +1,5 @@
-AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
-AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
+AWS_ACCESS_KEY_ID=admin
+AWS_SECRET_ACCESS_KEY=sample_key
 AWS_REGION=us-east-1
 AWS_BUCKET_NAME=mlflow
 MYSQL_DATABASE=mlflow

LICENSE

@@ -1,6 +1,6 @@
 MIT License
-Copyright (c) 2020 Tomasz Dłuski
+Copyright (c) 2021 Tomasz Dłuski
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

@@ -1,77 +1,32 @@
 # MLFlow Docker Setup [![Actions Status](https://github.com/Toumash/mlflow-docker/workflows/VerifyDockerCompose/badge.svg)](https://github.com/Toumash/mlflow-docker/actions)
-If you want to boot up mlflow project with one-liner - this repo is for you.
+> If you want to boot up mlflow project with one-liner - this repo is for you.
+> The only requirement is docker installed on your system and we are going to use Bash on linux/windows.
-The only requirement is docker installed on your system and we are going to use Bash on linux/windows.
+# 🚀 1-2-3! Setup guide
+1. Configure `.env` file for your choice. You can put there anything you like, it will be used to configure you services
+2. Run `docker compose up`
+3. Open up http://localhost:5000 for MlFlow, and http://localhost:9001/ to browse your files in S3 artifact store
+**👇Video tutorial how to set it up + BONUS with Microsoft Azure 👇**
+[![Youtube tutorial](https://user-images.githubusercontent.com/9840635/144674240-f1ede224-410a-4b77-a7b8-450f45cc79ba.png)](https://www.youtube.com/watch?v=ma5lA19IJRA)
 # Features
-- Setup by one file (.env)
-- Production-ready docker volumes
-- Separate artifacts and data containers
-- [Artifacts GUI](https://min.io/)
-- Ready bash scripts to copy and paste for colleagues to use your server!
+- One file setup (.env)
+- Minio S3 artifact store with GUI
+- MySql mlflow storage
+- Ready to use bash scripts for python development!
+- Automatically-created s3 buckets
-## Simple setup guide
-1. Configure `.env` file for your choice. You can put there anything you like, it will be used to configure you services
+## How to use in ML development in python
-2. Run the Infrastructure by this one line:
-```shell
-$ docker-compose up -d
-Creating network "mlflow-basis_A" with driver "bridge"
-Creating mlflow_db ... done
-Creating tracker_mlflow ... done
-Creating aws-s3 ... done
-```
+<details>
+<summary>Click to show</summary>
-3. Create mlflow bucket. You can do it **either using AWS CLI or Python Api**. **You dont need an AWS subscription**
-<details><summary>AWS CLI</summary>
-1. [Install AWS cli](https://aws.amazon.com/cli/) **Yes, i know that you dont have an Amazon Web Services Subscription - dont worry! It wont be needed!**
-2. Configure AWS CLI - enter the same credentials from the `.env` file
-```shell
-aws configure
-```
-> AWS Access Key ID [****************123]: AKIAIOSFODNN7EXAMPLE
-> AWS Secret Access Key [****************123]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
-> Default region name [us-west-2]: us-east-1
-> Default output format [json]: <ENTER>
-3. Run
-```shell
-aws --endpoint-url=http://localhost:9000 s3 mb s3://mlflow
-```
-</details>
-<details><summary>Python API</summary>
-1. Install Minio
-```shell
-pip install Minio
-```
-2. Run this to create a bucket
-```python
-from minio import Minio
-from minio.error import ResponseError
-s3Client = Minio(
-    'localhost:9000',
-    access_key='<YOUR_AWS_ACCESSS_ID>', # copy from .env file
-    secret_key='<YOUR_AWS_SECRET_ACCESS_KEY>', # copy from .env file
-    secure=False
-)
-s3Client.make_bucket('mlflow')
-```
-</details>
----
-4. Open up http://localhost:5000 for MlFlow, and http://localhost:9000/minio/mlflow/ for S3 bucket (you artifacts) with credentials from `.env` file
-5. Configure your client-side
+1. Configure your client-side
 For running mlflow files you need various environment variables set on the client side. To generate them user the convienience script `./bashrc_install.sh`, which installs it on your system or `./bashrc_generate.sh`, which just displays the config to copy & paste.
@@ -80,7 +35,7 @@ For running mlflow files you need various environment variables set on the clien
 The script installs this variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, MLFLOW_S3_ENDPOINT_URL, MLFLOW_TRACKING_URI. All of them are needed to use mlflow from the client-side.
-6. Test the pipeline with below command with conda. If you dont have conda installed run with `--no-conda`
+2. Test the pipeline with below command with conda. If you dont have conda installed run with `--no-conda`
 ```shell
 mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
@@ -88,8 +43,16 @@ mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
 python ./quickstart/mlflow_tracking.py
 ```
-7. *(Optional)* If you are constantly switching your environment you can use this environment variable syntax
+3. *(Optional)* If you are constantly switching your environment you can use this environment variable syntax
 ```shell
 MLFLOW_S3_ENDPOINT_URL=http://localhost:9000 MLFLOW_TRACKING_URI=http://localhost:5000 mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
 ```
+</details>
 ## Licensing
 Copyright (c) 2021 Tomasz Dłuski
 Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file [LICENSE](./LICENSE) in the repository.
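
For reference, the client-side workflow described in the README above (MLflow tracking server on port 5000, MinIO as the S3 endpoint on port 9000) can also be exercised directly from Python. A minimal sketch, assuming the stack is running locally with the default `.env` credentials shown earlier; the experiment, parameter, and metric names are illustrative only:

```python
import os
import mlflow

# Point the client at the dockerized stack. These values mirror the .env
# defaults from this repository and are assumptions for local use only.
os.environ["AWS_ACCESS_KEY_ID"] = "admin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "sample_key"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://localhost:9000"

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("demo")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)   # illustrative parameter
    mlflow.log_metric("rmse", 0.42)  # illustrative metric
```

Artifacts logged inside the run are sent to the `mlflow` bucket in MinIO via the `MLFLOW_S3_ENDPOINT_URL` override, which is exactly what the `bashrc_install.sh` variables configure.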

create_bucket.py (new file, 38 lines)

@@ -0,0 +1,38 @@
import os
from minio import Minio
from minio.error import InvalidResponseError

accessID = os.environ.get('AWS_ACCESS_KEY_ID')
accessSecret = os.environ.get('AWS_SECRET_ACCESS_KEY')
minioUrl = os.environ.get('MLFLOW_S3_ENDPOINT_URL')
bucketName = os.environ.get('AWS_BUCKET_NAME')

if accessID == None:
    print('[!] AWS_ACCESS_KEY_ID environment variable is empty! run \'source .env\' to load it from the .env file')
    exit(1)

if accessSecret == None:
    print('[!] AWS_SECRET_ACCESS_KEY environment variable is empty! run \'source .env\' to load it from the .env file')
    exit(1)

if minioUrl == None:
    print('[!] MLFLOW_S3_ENDPOINT_URL environment variable is empty! run \'source .env\' to load it from the .env file')
    exit(1)

if bucketName == None:
    print('[!] AWS_BUCKET_NAME environment variable is empty! run \'source .env\' to load it from the .env file')
    exit(1)

minioUrlHostWithPort = minioUrl.split('//')[1]
print('[*] minio url: ', minioUrlHostWithPort)

s3Client = Minio(
    minioUrlHostWithPort,
    access_key=accessID,
    secret_key=accessSecret,
    secure=False
)
s3Client.make_bucket(bucketName)
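
Note that `make_bucket` is expected to raise an error when the bucket already exists, so re-running this script against a populated MinIO instance will fail. A minimal idempotent variant, assuming the same environment variables as above; this guarded version is a sketch, not part of the repository:

```python
# Sketch of an idempotent variant: only create the bucket when it is missing.
# Uses the same environment variables as create_bucket.py above.
import os
from minio import Minio

client = Minio(
    os.environ["MLFLOW_S3_ENDPOINT_URL"].split("//")[1],
    access_key=os.environ["AWS_ACCESS_KEY_ID"],
    secret_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    secure=False,
)

bucket = os.environ["AWS_BUCKET_NAME"]
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)
    print(f"[*] created bucket {bucket}")
else:
    print(f"[*] bucket {bucket} already exists, nothing to do")
```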

docker-compose.yml

@@ -1,50 +1,79 @@
-version: '3.2'
+version: "3.9"
 services:
   s3:
-    image: minio/minio:latest
     container_name: aws-s3
+    image: minio/minio:RELEASE.2021-11-24T23-19-33Z
+    restart: unless-stopped
     ports:
-      - 9000:9000
+      - "9000:9000"
+      - "9001:9001"
     environment:
-      - MINIO_ACCESS_KEY=${AWS_ACCESS_KEY_ID}
-      - MINIO_SECRET_KEY=${AWS_SECRET_ACCESS_KEY}
-    command:
-      server /date
+      - MINIO_ROOT_USER=${AWS_ACCESS_KEY_ID}
+      - MINIO_ROOT_PASSWORD=${AWS_SECRET_ACCESS_KEY}
+    command: server /data --console-address ":9001"
     networks:
-      - A
+      - internal
+      - public
     volumes:
-      - ./s3:/date
+      - minio_volume:/data
   db:
-    restart: always
-    image: mysql/mysql-server:5.7.28
-    container_name: mlflow_db
-    expose:
-      - "3306"
-    environment:
-      - MYSQL_DATABASE=${MYSQL_DATABASE}
-      - MYSQL_USER=${MYSQL_USER}
-      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
-      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
-    volumes:
-      - ./dbdata:/var/lib/mysql
-    networks:
-      - A
+    image: mysql/mysql-server:5.7.28
+    restart: unless-stopped
+    container_name: mlflow_db
+    expose:
+      - "3306"
+    environment:
+      - MYSQL_DATABASE=${MYSQL_DATABASE}
+      - MYSQL_USER=${MYSQL_USER}
+      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
+      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
+    volumes:
+      - db_volume:/var/lib/mysql
+    networks:
+      - internal
   mlflow:
-    container_name: tracker_mlflow
-    image: tracker_ml
-    build:
-      context: ./mlflow
-      dockerfile: Dockerfile
-    ports:
-      - "5000:5000"
-    environment:
-      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
-      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
-      - AWS_DEFAULT_REGION=${AWS_REGION}
-      - MLFLOW_S3_ENDPOINT_URL=http://s3:9000
-    networks:
-      - A
-    entrypoint: ./wait-for-it.sh db:3306 -t 90 -- mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root s3://${AWS_BUCKET_NAME}/ -h 0.0.0.0
+    container_name: tracker_mlflow
+    image: tracker_ml
+    restart: unless-stopped
+    build:
+      context: ./mlflow
+      dockerfile: Dockerfile
+    ports:
+      - "5000:5000"
+    environment:
+      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
+      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
+      - AWS_DEFAULT_REGION=${AWS_REGION}
+      - MLFLOW_S3_ENDPOINT_URL=http://s3:9000
+    networks:
+      - public
+      - internal
+    entrypoint: mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root s3://${AWS_BUCKET_NAME}/ --artifacts-destination s3://${AWS_BUCKET_NAME}/ -h 0.0.0.0
+    depends_on:
+      wait-for-db:
+        condition: service_completed_successfully
+  create_s3_buckets:
+    image: minio/mc
+    depends_on:
+      - "s3"
+    entrypoint: >
+      /bin/sh -c "
+      until (/usr/bin/mc alias set minio http://s3:9000 '${AWS_ACCESS_KEY_ID}' '${AWS_SECRET_ACCESS_KEY}') do echo '...waiting...' && sleep 1; done;
+      /usr/bin/mc mb minio/${AWS_BUCKET_NAME};
+      exit 0;
+      "
+    networks:
+      - internal
+  wait-for-db:
+    image: atkrad/wait4x
+    depends_on:
+      - db
+    command: tcp db:3306 -t 90s -i 250ms
+    networks:
+      - internal
 networks:
-  A:
-    driver: bridge
+  internal:
+  public:
+    driver: bridge
+volumes:
+  db_volume:
+  minio_volume:
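
The new `wait-for-db` service uses wait4x to block until MySQL accepts TCP connections, which is what the deleted `wait-for-it.sh` entrypoint wrapper used to do. For illustration only, the same readiness check can be expressed in a few lines of Python; `wait_for_tcp` is a hypothetical helper, not something the stack ships with:

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout: float = 90.0, interval: float = 0.25) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Attempt a connection; success means the service is reachable.
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(interval)  # not up yet, retry after a short pause
    return False


# Roughly equivalent to the compose command: wait4x tcp db:3306 -t 90s -i 250ms
if __name__ == "__main__":
    print(wait_for_tcp("localhost", 3306))
```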

mlflow/Dockerfile

@@ -1,10 +1,7 @@
 FROM continuumio/miniconda3:latest
+RUN pip install mlflow boto3 pymysql
 ADD . /app
 WORKDIR /app
-COPY wait-for-it.sh wait-for-it.sh
-RUN chmod +x wait-for-it.sh
-RUN pip install mlflow boto3 pymysql

wait-for-it.sh (deleted)

@@ -1,182 +0,0 @@
#!/usr/bin/env bash
# Use this script to test if a given TCP host/port are available
WAITFORIT_cmdname=${0##*/}
echoerr() { if [[ $WAITFORIT_QUIET -ne 1 ]]; then echo "$@" 1>&2; fi }
usage()
{
cat << USAGE >&2
Usage:
$WAITFORIT_cmdname host:port [-s] [-t timeout] [-- command args]
-h HOST | --host=HOST Host or IP under test
-p PORT | --port=PORT TCP port under test
Alternatively, you specify the host and port as host:port
-s | --strict Only execute subcommand if the test succeeds
-q | --quiet Don't output any status messages
-t TIMEOUT | --timeout=TIMEOUT
Timeout in seconds, zero for no timeout
-- COMMAND ARGS Execute command with args after the test finishes
USAGE
exit 1
}
wait_for()
{
if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then
echoerr "$WAITFORIT_cmdname: waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT"
else
echoerr "$WAITFORIT_cmdname: waiting for $WAITFORIT_HOST:$WAITFORIT_PORT without a timeout"
fi
WAITFORIT_start_ts=$(date +%s)
while :
do
if [[ $WAITFORIT_ISBUSY -eq 1 ]]; then
nc -z $WAITFORIT_HOST $WAITFORIT_PORT
WAITFORIT_result=$?
else
(echo -n > /dev/tcp/$WAITFORIT_HOST/$WAITFORIT_PORT) >/dev/null 2>&1
WAITFORIT_result=$?
fi
if [[ $WAITFORIT_result -eq 0 ]]; then
WAITFORIT_end_ts=$(date +%s)
echoerr "$WAITFORIT_cmdname: $WAITFORIT_HOST:$WAITFORIT_PORT is available after $((WAITFORIT_end_ts - WAITFORIT_start_ts)) seconds"
break
fi
sleep 1
done
return $WAITFORIT_result
}
wait_for_wrapper()
{
# In order to support SIGINT during timeout: http://unix.stackexchange.com/a/57692
if [[ $WAITFORIT_QUIET -eq 1 ]]; then
timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --quiet --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT &
else
timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT &
fi
WAITFORIT_PID=$!
trap "kill -INT -$WAITFORIT_PID" INT
wait $WAITFORIT_PID
WAITFORIT_RESULT=$?
if [[ $WAITFORIT_RESULT -ne 0 ]]; then
echoerr "$WAITFORIT_cmdname: timeout occurred after waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT"
fi
return $WAITFORIT_RESULT
}
# process arguments
while [[ $# -gt 0 ]]
do
case "$1" in
*:* )
WAITFORIT_hostport=(${1//:/ })
WAITFORIT_HOST=${WAITFORIT_hostport[0]}
WAITFORIT_PORT=${WAITFORIT_hostport[1]}
shift 1
;;
--child)
WAITFORIT_CHILD=1
shift 1
;;
-q | --quiet)
WAITFORIT_QUIET=1
shift 1
;;
-s | --strict)
WAITFORIT_STRICT=1
shift 1
;;
-h)
WAITFORIT_HOST="$2"
if [[ $WAITFORIT_HOST == "" ]]; then break; fi
shift 2
;;
--host=*)
WAITFORIT_HOST="${1#*=}"
shift 1
;;
-p)
WAITFORIT_PORT="$2"
if [[ $WAITFORIT_PORT == "" ]]; then break; fi
shift 2
;;
--port=*)
WAITFORIT_PORT="${1#*=}"
shift 1
;;
-t)
WAITFORIT_TIMEOUT="$2"
if [[ $WAITFORIT_TIMEOUT == "" ]]; then break; fi
shift 2
;;
--timeout=*)
WAITFORIT_TIMEOUT="${1#*=}"
shift 1
;;
--)
shift
WAITFORIT_CLI=("$@")
break
;;
--help)
usage
;;
*)
echoerr "Unknown argument: $1"
usage
;;
esac
done
if [[ "$WAITFORIT_HOST" == "" || "$WAITFORIT_PORT" == "" ]]; then
echoerr "Error: you need to provide a host and port to test."
usage
fi
WAITFORIT_TIMEOUT=${WAITFORIT_TIMEOUT:-15}
WAITFORIT_STRICT=${WAITFORIT_STRICT:-0}
WAITFORIT_CHILD=${WAITFORIT_CHILD:-0}
WAITFORIT_QUIET=${WAITFORIT_QUIET:-0}
# Check to see if timeout is from busybox?
WAITFORIT_TIMEOUT_PATH=$(type -p timeout)
WAITFORIT_TIMEOUT_PATH=$(realpath $WAITFORIT_TIMEOUT_PATH 2>/dev/null || readlink -f $WAITFORIT_TIMEOUT_PATH)
WAITFORIT_BUSYTIMEFLAG=""
if [[ $WAITFORIT_TIMEOUT_PATH =~ "busybox" ]]; then
WAITFORIT_ISBUSY=1
# Check if busybox timeout uses -t flag
# (recent Alpine versions don't support -t anymore)
if timeout &>/dev/stdout | grep -q -e '-t '; then
WAITFORIT_BUSYTIMEFLAG="-t"
fi
else
WAITFORIT_ISBUSY=0
fi
if [[ $WAITFORIT_CHILD -gt 0 ]]; then
wait_for
WAITFORIT_RESULT=$?
exit $WAITFORIT_RESULT
else
if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then
wait_for_wrapper
WAITFORIT_RESULT=$?
else
wait_for
WAITFORIT_RESULT=$?
fi
fi
if [[ $WAITFORIT_CLI != "" ]]; then
if [[ $WAITFORIT_RESULT -ne 0 && $WAITFORIT_STRICT -eq 1 ]]; then
echoerr "$WAITFORIT_cmdname: strict mode, refusing to execute subprocess"
exit $WAITFORIT_RESULT
fi
exec "${WAITFORIT_CLI[@]}"
else
exit $WAITFORIT_RESULT
fi

run_create_bucket.sh (new file, 6 lines)

@@ -0,0 +1,6 @@
#!/bin/bash
set -o allexport; source .env; set +o allexport
pip3 install Minio
python3 ./create_bucket.py