1 commit

Author: Tomasz Dłuski
SHA1: 6a93c977e4
Message: provide caddyfile basic auth for the mlflow
Date: 2021-11-19 22:16:07 +01:00

11 changed files with 364 additions and 158 deletions

.env (4 changes)

```diff
@@ -1,5 +1,5 @@
-AWS_ACCESS_KEY_ID=admin
-AWS_SECRET_ACCESS_KEY=sample_key
+AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
+AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
 AWS_REGION=us-east-1
 AWS_BUCKET_NAME=mlflow
 MYSQL_DATABASE=mlflow
```
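Since every service reads its credentials from this `.env` file, it is worth checking the interpolated result before booting the stack. A minimal sketch, assuming Docker Compose is invoked from the repository root:

```shell
# Render the compose file with the values from .env substituted in.
# Nothing is started, so this is a safe dry run.
docker-compose config
```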

Caddyfile (new file, 22 lines)

```
# Minio Console
s3.localhost:9001 {
    handle_path /* {
        reverse_proxy s3:9001
    }
}

# Minio API
s3.localhost:9000 {
    handle_path /* {
        reverse_proxy s3:9000
    }
}

mlflow.localhost {
    basicauth /* {
        root JDJhJDEwJEVCNmdaNEg2Ti5iejRMYkF3MFZhZ3VtV3E1SzBWZEZ5Q3VWc0tzOEJwZE9TaFlZdEVkZDhX # root hiccup
    }
    handle_path /* {
        reverse_proxy mlflow:5000
    }
}
```
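The token after `root` is a bcrypt password hash (base64-encoded, the form Caddy v2's `basicauth` directive expects); per the trailing comment it encodes the password `hiccup` for the user `root`. A hash like this can be generated with Caddy's own CLI; a sketch, assuming the same `caddy:2-alpine` image the stack already uses:

```shell
# Print a hash suitable for the basicauth directive.
# 'hiccup' is the example password from the comment above.
docker run --rm caddy:2-alpine caddy hash-password --plaintext 'hiccup'
```

With this in place, Caddy answers requests to mlflow.localhost with 401 Unauthorized unless valid credentials are supplied, and only then proxies through to `mlflow:5000`.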

LICENSE (2 changes)

```diff
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2021 Tomasz Dłuski
+Copyright (c) 2020 Tomasz Dłuski
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
```

README.md (106 changes)

````diff
@@ -1,41 +1,95 @@
 # MLFlow Docker Setup [![Actions Status](https://github.com/Toumash/mlflow-docker/workflows/VerifyDockerCompose/badge.svg)](https://github.com/Toumash/mlflow-docker/actions)
 
-> If you want to boot up mlflow project with one-liner - this repo is for you.
-> The only requirement is docker installed on your system and we are going to use Bash on linux/windows.
-
-# 🚀 1-2-3! Setup guide
-1. Configure `.env` file for your choice. You can put there anything you like, it will be used to configure you services
-2. Run `docker compose up`
-3. Open up http://localhost:5000 for MlFlow, and http://localhost:9001/ to browse your files in S3 artifact store
-
-[![Youtube tutorial](https://img.youtube.com/vi/ma5lA19IJRA/0.jpg)](https://www.youtube.com/watch?v=ma5lA19IJRA)
+If you want to boot up mlflow project with one-liner - this repo is for you.
+
+The only requirement is docker installed on your system and we are going to use Bash on linux/windows.
+
+**👇Video tutorial how to set it up + BONUS with Microsoft Azure 👇**
+
+[![Youtube tutorial](https://user-images.githubusercontent.com/9840635/144674240-f1ede224-410a-4b77-a7b8-450f45cc79ba.png)](https://www.youtube.com/watch?v=ma5lA19IJRA)
 
 # Features
-- One file setup (.env)
-- Minio S3 artifact store with GUI
-- MySql mlflow storage
-- Ready to use bash scripts for python development!
-- Automatically-created s3 buckets
+- Setup by one file (.env)
+- Production-ready docker volumes
+- Separate artifacts and data containers
+- [Artifacts GUI](https://min.io/)
+- Ready bash scripts to copy and paste for colleagues to use your server!
 
-## How to use in ML development in python
-
-<details>
-<summary>Click to show</summary>
-
-1. Configure your client-side
-For running mlflow files you need various environment variables set on the client side. To generate them use the convienience script `./bashrc_install.sh`, which installs it on your system or `./bashrc_generate.sh`, which just displays the config to copy & paste.
+## Simple setup guide
+
+1. Configure `.env` file for your choice. You can put there anything you like, it will be used to configure you services
+2. Run the Infrastructure by this one line:
+```shell
+$ docker-compose up -d
+Creating network "mlflow-basis_A" with driver "bridge"
+Creating mlflow_db ... done
+Creating tracker_mlflow ... done
+Creating aws-s3 ... done
+```
+3. Create mlflow bucket. You can use my bundled script.
+Just run
+```shell
+bash ./run_create_bucket.sh
+```
+You can also do it **either using AWS CLI or Python Api**.
+
+<details><summary>AWS CLI</summary>
+
+1. [Install AWS cli](https://aws.amazon.com/cli/) **Yes, i know that you dont have an Amazon Web Services Subscription - dont worry! It wont be needed!**
+2. Configure AWS CLI - enter the same credentials from the `.env` file
+```shell
+aws configure
+```
+> AWS Access Key ID [****************123]: AKIAIOSFODNN7EXAMPLE
+> AWS Secret Access Key [****************123]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
+> Default region name [us-west-2]: us-east-1
+> Default output format [json]: <ENTER>
+3. Run
+```shell
+aws --endpoint-url=http://localhost:9000 s3 mb s3://mlflow
+```
+</details>
+
+<details><summary>Python API</summary>
+
+1. Install Minio
+```shell
+pip install Minio
+```
+2. Run this to create a bucket
+```python
+from minio import Minio
+from minio.error import ResponseError
+
+s3Client = Minio(
+    'localhost:9000',
+    access_key='<YOUR_AWS_ACCESSS_ID>', # copy from .env file
+    secret_key='<YOUR_AWS_SECRET_ACCESS_KEY>', # copy from .env file
+    secure=False
+)
+s3Client.make_bucket('mlflow')
+```
+</details>
+
+---
+
+4. Open up http://localhost:5000 for MlFlow, and http://localhost:9000/minio/mlflow/ for S3 bucket (you artifacts) with credentials from `.env` file
+5. Configure your client-side
+For running mlflow files you need various environment variables set on the client side. To generate them user the convienience script `./bashrc_install.sh`, which installs it on your system or `./bashrc_generate.sh`, which just displays the config to copy & paste.
 
 > $ ./bashrc_install.sh
 > [ OK ] Successfully installed environment variables into your .bashrc!
 
 The script installs this variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, MLFLOW_S3_ENDPOINT_URL, MLFLOW_TRACKING_URI. All of them are needed to use mlflow from the client-side.
 
-2. Test the pipeline with below command with conda. If you dont have conda installed run with `--no-conda`
+6. Test the pipeline with below command with conda. If you dont have conda installed run with `--no-conda`
 ```shell
 mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
@@ -43,16 +97,8 @@ mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
 python ./quickstart/mlflow_tracking.py
 ```
-3. *(Optional)* If you are constantly switching your environment you can use this environment variable syntax
+7. *(Optional)* If you are constantly switching your environment you can use this environment variable syntax
 ```shell
 MLFLOW_S3_ENDPOINT_URL=http://localhost:9000 MLFLOW_TRACKING_URI=http://localhost:5000 mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
 ```
-
-</details>
-
-## Licensing
-
-Copyright (c) 2021 Tomasz Dłuski
-
-Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file [LICENSE](./LICENSE) in the repository.
````
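One note on the Python snippet above: it imports `ResponseError`, which only exists in older releases of the `minio` package (7.x replaced it with `S3Error`), so either pin an old version or drop the unused import if it fails. A third route to the same bucket is MinIO's own `mc` client, mirroring the sample credentials from `.env`; a sketch, assuming the API is reachable on localhost:9000 (host networking, so adjust on Docker Desktop):

```shell
# 'local' is an arbitrary alias name; the keys are the samples from .env.
docker run --rm --network host minio/mc /bin/sh -c "
  mc alias set local http://localhost:9000 AKIAIOSFODNN7EXAMPLE wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY &&
  mc mb local/mlflow"
```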

```diff
@@ -9,20 +9,20 @@ minioUrl = os.environ.get('MLFLOW_S3_ENDPOINT_URL')
 bucketName = os.environ.get('AWS_BUCKET_NAME')
 
 if accessID == None:
-    print('[!] AWS_ACCESS_KEY_ID environment variable is empty! run \'source .env\' to load it from the .env file')
+    print('[!] AWS_ACCESS_KEY_ID environemnt variable is empty! run \'source .env\' to load it from the .env file')
     exit(1)
 
 if accessSecret == None:
-    print('[!] AWS_SECRET_ACCESS_KEY environment variable is empty! run \'source .env\' to load it from the .env file')
+    print('[!] AWS_SECRET_ACCESS_KEY environemnt variable is empty! run \'source .env\' to load it from the .env file')
     exit(1)
 
 if minioUrl == None:
-    print('[!] MLFLOW_S3_ENDPOINT_URL environment variable is empty! run \'source .env\' to load it from the .env file')
+    print('[!] MLFLOW_S3_ENDPOINT_URL environemnt variable is empty! run \'source .env\' to load it from the .env file')
     exit(1)
 
 if bucketName == None:
-    print('[!] AWS_BUCKET_NAME environment variable is empty! run \'source .env\' to load it from the .env file')
+    print('[!] AWS_BUCKET_NAME environemnt variable is empty! run \'source .env\' to load it from the .env file')
     exit(1)
 
 minioUrlHostWithPort = minioUrl.split('//')[1]
```
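The `source .env` hint in these messages works because the file is plain `KEY=value` lines, but a bare `source` only creates shell-local variables. Exporting them so child processes (like this Python script) can actually see them takes one extra step; a sketch for a POSIX-ish shell:

```shell
set -a            # auto-export every variable defined from here on
. ./.env
set +a
bash ./run_create_bucket.sh   # now sees AWS_ACCESS_KEY_ID & friends
```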

docker-compose.yml

```diff
@@ -1,23 +1,38 @@
-version: "3.9"
+version: '3.2'
 services:
+  caddy:
+    image: caddy:2-alpine
+    container_name: caddy
+    volumes:
+      - ./Caddyfile:/etc/caddy/Caddyfile
+      - /caddy/data:/data
+      - /caddy/config:/config
+    ports:
+      - 80:80
+      - 443:443
+      - 9000:9000
+      - 9001:9001
+    restart: unless-stopped
   s3:
-    image: minio/minio:RELEASE.2023-11-01T18-37-25Z
-    restart: unless-stopped
+    restart: always
+    image: minio/minio:latest
+    container_name: aws-s3
     ports:
-      - "9000:9000"
-      - "9001:9001"
+      - 9000
+      - 9001
     environment:
       - MINIO_ROOT_USER=${AWS_ACCESS_KEY_ID}
       - MINIO_ROOT_PASSWORD=${AWS_SECRET_ACCESS_KEY}
-    command: server /data --console-address ":9001"
-    networks:
-      - internal
-      - public
+    command:
+      server /date --console-address ":9001"
     volumes:
-      - minio_new_volume:/data
+      - ./s3:/date
+    networks:
+      - default
+      - proxy-net
   db:
-    image: mysql:8-oracle # -oracle tag supports arm64 architecture!
-    restart: unless-stopped
+    restart: always
+    image: mysql/mysql-server:5.7.28
     container_name: mlflow_db
     expose:
       - "3306"
@@ -27,13 +42,16 @@ services:
       - MYSQL_PASSWORD=${MYSQL_PASSWORD}
       - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
     volumes:
-      - db_new_volume:/var/lib/mysql
+      - ./dbdata:/var/lib/mysql
     networks:
-      - internal
+      - default
   mlflow:
-    image: ubuntu/mlflow:2.1.1_1.0-22.04
+    restart: always
     container_name: tracker_mlflow
-    restart: unless-stopped
+    image: tracker_ml
+    build:
+      context: ./mlflow
+      dockerfile: Dockerfile
     ports:
       - "5000:5000"
     environment:
@@ -41,55 +59,11 @@
       - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
       - AWS_DEFAULT_REGION=${AWS_REGION}
       - MLFLOW_S3_ENDPOINT_URL=http://s3:9000
-    entrypoint: mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root s3://${AWS_BUCKET_NAME}/ -h 0.0.0.0
     networks:
-      - public
-      - internal
-    depends_on:
-      wait-for-db:
-        condition: service_completed_successfully
-  create_s3_buckets:
-    image: minio/mc
-    depends_on:
-      - "s3"
-    entrypoint: >
-      /bin/sh -c "
-      until (/usr/bin/mc alias set minio http://s3:9000 '${AWS_ACCESS_KEY_ID}' '${AWS_SECRET_ACCESS_KEY}') do echo '...waiting...' && sleep 1; done;
-      /usr/bin/mc mb minio/${AWS_BUCKET_NAME};
-      exit 0;
-      "
-    networks:
-      - internal
-  wait-for-db:
-    image: atkrad/wait4x
-    depends_on:
-      - db
-    command: tcp db:3306 -t 90s -i 250ms
-    networks:
-      - internal
-  run_test_experiment:
-    build:
-      context: ./test_experiment
-      dockerfile: Dockerfile
-    depends_on:
-      - "mlflow"
-    environment:
-      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
-      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
-      - AWS_DEFAULT_REGION=${AWS_REGION}
-      - MLFLOW_S3_ENDPOINT_URL=http://s3:9000
-      - MLFLOW_TRACKING_URI=http://mlflow:5000
-    entrypoint: >
-      /bin/sh -c "
-      python3 mlflow_tracking.py;
-      exit 0;
-      "
-    networks:
-      - internal
+      - proxy-net
+      - default
+    entrypoint: mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root s3://${AWS_BUCKET_NAME}/ --artifacts-destination s3://${AWS_BUCKET_NAME}/ -h 0.0.0.0
 
 networks:
-  internal:
-  public:
+  default:
+  proxy-net:
     driver: bridge
-
-volumes:
-  db_new_volume:
-  minio_new_volume:
```
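Once this stack is up, the Caddy virtual hosts can be smoke-tested from the host machine. A sketch, assuming the default `root`/`hiccup` credentials and Caddy's locally-issued TLS certificate (hence `-k`):

```shell
# Without credentials, basicauth should reject the request (401).
curl -k -I https://mlflow.localhost

# With credentials, Caddy proxies through to the tracker (200).
curl -k -u root:hiccup -I https://mlflow.localhost
```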

mlflow/Dockerfile (new file, 10 lines)

```dockerfile
FROM continuumio/miniconda3:latest

ADD . /app
WORKDIR /app

COPY wait-for-it.sh wait-for-it.sh
RUN chmod +x wait-for-it.sh

RUN pip install mlflow boto3 pymysql
```
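Note the image declares no `CMD` or `ENTRYPOINT`: the actual server command is injected by docker-compose. The copied `wait-for-it.sh` could gate that command on MySQL being reachable first; a sketch of such a wrapper (hypothetical, since the compose entrypoint above starts `mlflow server` directly):

```shell
# Wait up to 90s for db:3306 to accept TCP, then start the tracker
# with the same flags the compose entrypoint uses.
./wait-for-it.sh db:3306 --timeout=90 -- \
  mlflow server \
    --backend-store-uri "mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE}" \
    --default-artifact-root "s3://${AWS_BUCKET_NAME}/" \
    -h 0.0.0.0
```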

mlflow/wait-for-it.sh (new file, 182 lines)

```shell
#!/usr/bin/env bash
# Use this script to test if a given TCP host/port are available

WAITFORIT_cmdname=${0##*/}

echoerr() { if [[ $WAITFORIT_QUIET -ne 1 ]]; then echo "$@" 1>&2; fi }

usage()
{
    cat << USAGE >&2
Usage:
    $WAITFORIT_cmdname host:port [-s] [-t timeout] [-- command args]
    -h HOST | --host=HOST       Host or IP under test
    -p PORT | --port=PORT       TCP port under test
                                Alternatively, you specify the host and port as host:port
    -s | --strict               Only execute subcommand if the test succeeds
    -q | --quiet                Don't output any status messages
    -t TIMEOUT | --timeout=TIMEOUT
                                Timeout in seconds, zero for no timeout
    -- COMMAND ARGS             Execute command with args after the test finishes
USAGE
    exit 1
}

wait_for()
{
    if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then
        echoerr "$WAITFORIT_cmdname: waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT"
    else
        echoerr "$WAITFORIT_cmdname: waiting for $WAITFORIT_HOST:$WAITFORIT_PORT without a timeout"
    fi
    WAITFORIT_start_ts=$(date +%s)
    while :
    do
        if [[ $WAITFORIT_ISBUSY -eq 1 ]]; then
            nc -z $WAITFORIT_HOST $WAITFORIT_PORT
            WAITFORIT_result=$?
        else
            (echo -n > /dev/tcp/$WAITFORIT_HOST/$WAITFORIT_PORT) >/dev/null 2>&1
            WAITFORIT_result=$?
        fi
        if [[ $WAITFORIT_result -eq 0 ]]; then
            WAITFORIT_end_ts=$(date +%s)
            echoerr "$WAITFORIT_cmdname: $WAITFORIT_HOST:$WAITFORIT_PORT is available after $((WAITFORIT_end_ts - WAITFORIT_start_ts)) seconds"
            break
        fi
        sleep 1
    done
    return $WAITFORIT_result
}

wait_for_wrapper()
{
    # In order to support SIGINT during timeout: http://unix.stackexchange.com/a/57692
    if [[ $WAITFORIT_QUIET -eq 1 ]]; then
        timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --quiet --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT &
    else
        timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT &
    fi
    WAITFORIT_PID=$!
    trap "kill -INT -$WAITFORIT_PID" INT
    wait $WAITFORIT_PID
    WAITFORIT_RESULT=$?
    if [[ $WAITFORIT_RESULT -ne 0 ]]; then
        echoerr "$WAITFORIT_cmdname: timeout occurred after waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT"
    fi
    return $WAITFORIT_RESULT
}

# process arguments
while [[ $# -gt 0 ]]
do
    case "$1" in
        *:* )
        WAITFORIT_hostport=(${1//:/ })
        WAITFORIT_HOST=${WAITFORIT_hostport[0]}
        WAITFORIT_PORT=${WAITFORIT_hostport[1]}
        shift 1
        ;;
        --child)
        WAITFORIT_CHILD=1
        shift 1
        ;;
        -q | --quiet)
        WAITFORIT_QUIET=1
        shift 1
        ;;
        -s | --strict)
        WAITFORIT_STRICT=1
        shift 1
        ;;
        -h)
        WAITFORIT_HOST="$2"
        if [[ $WAITFORIT_HOST == "" ]]; then break; fi
        shift 2
        ;;
        --host=*)
        WAITFORIT_HOST="${1#*=}"
        shift 1
        ;;
        -p)
        WAITFORIT_PORT="$2"
        if [[ $WAITFORIT_PORT == "" ]]; then break; fi
        shift 2
        ;;
        --port=*)
        WAITFORIT_PORT="${1#*=}"
        shift 1
        ;;
        -t)
        WAITFORIT_TIMEOUT="$2"
        if [[ $WAITFORIT_TIMEOUT == "" ]]; then break; fi
        shift 2
        ;;
        --timeout=*)
        WAITFORIT_TIMEOUT="${1#*=}"
        shift 1
        ;;
        --)
        shift
        WAITFORIT_CLI=("$@")
        break
        ;;
        --help)
        usage
        ;;
        *)
        echoerr "Unknown argument: $1"
        usage
        ;;
    esac
done

if [[ "$WAITFORIT_HOST" == "" || "$WAITFORIT_PORT" == "" ]]; then
    echoerr "Error: you need to provide a host and port to test."
    usage
fi

WAITFORIT_TIMEOUT=${WAITFORIT_TIMEOUT:-15}
WAITFORIT_STRICT=${WAITFORIT_STRICT:-0}
WAITFORIT_CHILD=${WAITFORIT_CHILD:-0}
WAITFORIT_QUIET=${WAITFORIT_QUIET:-0}

# Check to see if timeout is from busybox?
WAITFORIT_TIMEOUT_PATH=$(type -p timeout)
WAITFORIT_TIMEOUT_PATH=$(realpath $WAITFORIT_TIMEOUT_PATH 2>/dev/null || readlink -f $WAITFORIT_TIMEOUT_PATH)

WAITFORIT_BUSYTIMEFLAG=""
if [[ $WAITFORIT_TIMEOUT_PATH =~ "busybox" ]]; then
    WAITFORIT_ISBUSY=1
    # Check if busybox timeout uses -t flag
    # (recent Alpine versions don't support -t anymore)
    if timeout &>/dev/stdout | grep -q -e '-t '; then
        WAITFORIT_BUSYTIMEFLAG="-t"
    fi
else
    WAITFORIT_ISBUSY=0
fi

if [[ $WAITFORIT_CHILD -gt 0 ]]; then
    wait_for
    WAITFORIT_RESULT=$?
    exit $WAITFORIT_RESULT
else
    if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then
        wait_for_wrapper
        WAITFORIT_RESULT=$?
    else
        wait_for
        WAITFORIT_RESULT=$?
    fi
fi

if [[ $WAITFORIT_CLI != "" ]]; then
    if [[ $WAITFORIT_RESULT -ne 0 && $WAITFORIT_STRICT -eq 1 ]]; then
        echoerr "$WAITFORIT_cmdname: strict mode, refusing to execute subprocess"
        exit $WAITFORIT_RESULT
    fi
    exec "${WAITFORIT_CLI[@]}"
else
    exit $WAITFORIT_RESULT
fi
```
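This is the widely used `wait-for-it` helper; typical invocations, using the service names from this stack, look like the following sketch:

```shell
# Block (default 15s timeout) until MySQL accepts TCP connections.
./wait-for-it.sh db:3306 -- echo "db is up"

# Strict mode: run the command only if the port check succeeded.
./wait-for-it.sh --host=mlflow --port=5000 --timeout=30 --strict -- \
  curl -s http://mlflow:5000
```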

quickstart/mlflow_tracking.py

```diff
@@ -1,22 +1,22 @@
 import os
 from random import random, randint
-import mlflow
+from mlflow import mlflow,log_metric, log_param, log_artifacts
 
 if __name__ == "__main__":
     with mlflow.start_run() as run:
-        mlflow.set_tracking_uri('http://localhost:5000')
+        mlflow.set_tracking_uri('https://mlflow.localhost')
         print("Running mlflow_tracking.py")
 
-        mlflow.log_param("param1", randint(0, 100))
+        log_param("param1", randint(0, 100))
 
-        mlflow.log_metric("foo", random())
-        mlflow.log_metric("foo", random() + 1)
-        mlflow.log_metric("foo", random() + 2)
+        log_metric("foo", random())
+        log_metric("foo", random() + 1)
+        log_metric("foo", random() + 2)
 
         if not os.path.exists("outputs"):
             os.makedirs("outputs")
 
         with open("outputs/test.txt", "w") as f:
             f.write("hello world!")
 
-        mlflow.log_artifacts("outputs")
+        log_artifacts("outputs")
```
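One caveat that applies to both versions: `mlflow.set_tracking_uri()` is called inside the `mlflow.start_run()` block, so the run has already been opened against whatever URI was active beforehand. Supplying the URI through the environment sidesteps the ordering issue; a sketch, assuming the Caddy-fronted endpoint and MLflow's standard basic-auth environment variables:

```shell
export MLFLOW_TRACKING_URI=https://mlflow.localhost
export MLFLOW_TRACKING_USERNAME=root    # credentials from the Caddyfile
export MLFLOW_TRACKING_PASSWORD=hiccup
python ./quickstart/mlflow_tracking.py
```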

test_experiment/Dockerfile (deleted, 6 lines)

```dockerfile
FROM continuumio/miniconda3:latest
RUN pip install mlflow boto3

WORKDIR /app
COPY . .
```

test_experiment/mlflow_tracking.py (deleted, 22 lines)

```python
import os
from random import random, randint
import mlflow

if __name__ == "__main__":
    with mlflow.start_run() as run:
        mlflow.set_tracking_uri('http://mlflow:5000')
        print("Running mlflow_tracking.py")

        mlflow.log_param("param1", randint(0, 100))

        mlflow.log_metric("foo", random())
        mlflow.log_metric("foo", random() + 1)
        mlflow.log_metric("foo", random() + 2)

        if not os.path.exists("outputs"):
            os.makedirs("outputs")

        with open("outputs/test.txt", "w") as f:
            f.write("hello world!")

        mlflow.log_artifacts("outputs")
```