1 commit

Author: Tomasz Dłuski
SHA1: 6a93c977e4
Message: provide caddyfile basic auth for the mlflow
Date: 2021-11-19 22:16:07 +01:00

11 changed files with 364 additions and 158 deletions

.env (4 changes)

```diff
@@ -1,5 +1,5 @@
-AWS_ACCESS_KEY_ID=admin
-AWS_SECRET_ACCESS_KEY=sample_key
+AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
+AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
 AWS_REGION=us-east-1
 AWS_BUCKET_NAME=mlflow
 MYSQL_DATABASE=mlflow
```
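Since every service reads its credentials from this `.env` file, it is worth checking the interpolated result before booting the stack. A minimal sketch, assuming Docker Compose is invoked from the repository root:

```shell
# Render the compose file with the values from .env substituted in.
# Nothing is started, so this is a safe dry run.
docker-compose config
```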

Caddyfile (new file, 22 lines)

```
# Minio Console
s3.localhost:9001 {
    handle_path /* {
        reverse_proxy s3:9001
    }
}

# Minio API
s3.localhost:9000 {
    handle_path /* {
        reverse_proxy s3:9000
    }
}

mlflow.localhost {
    basicauth /* {
        root JDJhJDEwJEVCNmdaNEg2Ti5iejRMYkF3MFZhZ3VtV3E1SzBWZEZ5Q3VWc0tzOEJwZE9TaFlZdEVkZDhX # root hiccup
    }
    handle_path /* {
        reverse_proxy mlflow:5000
    }
}
```
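The token after `root` is a bcrypt password hash (base64-encoded, the form Caddy v2's `basicauth` directive expects); per the trailing comment it encodes the password `hiccup` for the user `root`. A hash like this can be generated with Caddy's own CLI; a sketch, assuming the same `caddy:2-alpine` image the stack already uses:

```shell
# Print a hash suitable for the basicauth directive.
# 'hiccup' is the example password from the comment above.
docker run --rm caddy:2-alpine caddy hash-password --plaintext 'hiccup'
```

With this in place, Caddy answers requests to mlflow.localhost with 401 Unauthorized unless valid credentials are supplied, and only then proxies through to `mlflow:5000`.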

LICENSE (2 changes)

```diff
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2021 Tomasz Dłuski
+Copyright (c) 2020 Tomasz Dłuski
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
```

README.md (106 changes)

````diff
@@ -1,41 +1,95 @@
 # MLFlow Docker Setup [![Actions Status](https://github.com/Toumash/mlflow-docker/workflows/VerifyDockerCompose/badge.svg)](https://github.com/Toumash/mlflow-docker/actions)
 
-> If you want to boot up mlflow project with one-liner - this repo is for you.
-> The only requirement is docker installed on your system and we are going to use Bash on linux/windows.
-
-# 🚀 1-2-3! Setup guide
-1. Configure `.env` file for your choice. You can put there anything you like, it will be used to configure you services
-2. Run `docker compose up`
-3. Open up http://localhost:5000 for MlFlow, and http://localhost:9001/ to browse your files in S3 artifact store
-
-[![Youtube tutorial](https://img.youtube.com/vi/ma5lA19IJRA/0.jpg)](https://www.youtube.com/watch?v=ma5lA19IJRA)
+If you want to boot up mlflow project with one-liner - this repo is for you.
+
+The only requirement is docker installed on your system and we are going to use Bash on linux/windows.
+
+**👇Video tutorial how to set it up + BONUS with Microsoft Azure 👇**
+
+[![Youtube tutorial](https://user-images.githubusercontent.com/9840635/144674240-f1ede224-410a-4b77-a7b8-450f45cc79ba.png)](https://www.youtube.com/watch?v=ma5lA19IJRA)
 
 # Features
-- One file setup (.env)
-- Minio S3 artifact store with GUI
-- MySql mlflow storage
-- Ready to use bash scripts for python development!
-- Automatically-created s3 buckets
+- Setup by one file (.env)
+- Production-ready docker volumes
+- Separate artifacts and data containers
+- [Artifacts GUI](https://min.io/)
+- Ready bash scripts to copy and paste for colleagues to use your server!
 
-## How to use in ML development in python
-
-<details>
-<summary>Click to show</summary>
-
-1. Configure your client-side
-For running mlflow files you need various environment variables set on the client side. To generate them use the convienience script `./bashrc_install.sh`, which installs it on your system or `./bashrc_generate.sh`, which just displays the config to copy & paste.
+## Simple setup guide
+
+1. Configure `.env` file for your choice. You can put there anything you like, it will be used to configure you services
+2. Run the Infrastructure by this one line:
+```shell
+$ docker-compose up -d
+Creating network "mlflow-basis_A" with driver "bridge"
+Creating mlflow_db ... done
+Creating tracker_mlflow ... done
+Creating aws-s3 ... done
+```
+3. Create mlflow bucket. You can use my bundled script.
+Just run
+```shell
+bash ./run_create_bucket.sh
+```
+You can also do it **either using AWS CLI or Python Api**.
+
+<details><summary>AWS CLI</summary>
+
+1. [Install AWS cli](https://aws.amazon.com/cli/) **Yes, i know that you dont have an Amazon Web Services Subscription - dont worry! It wont be needed!**
+2. Configure AWS CLI - enter the same credentials from the `.env` file
+```shell
+aws configure
+```
+> AWS Access Key ID [****************123]: AKIAIOSFODNN7EXAMPLE
+> AWS Secret Access Key [****************123]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
+> Default region name [us-west-2]: us-east-1
+> Default output format [json]: <ENTER>
+3. Run
+```shell
+aws --endpoint-url=http://localhost:9000 s3 mb s3://mlflow
+```
+</details>
+
+<details><summary>Python API</summary>
+
+1. Install Minio
+```shell
+pip install Minio
+```
+2. Run this to create a bucket
+```python
+from minio import Minio
+from minio.error import ResponseError
+
+s3Client = Minio(
+    'localhost:9000',
+    access_key='<YOUR_AWS_ACCESSS_ID>', # copy from .env file
+    secret_key='<YOUR_AWS_SECRET_ACCESS_KEY>', # copy from .env file
+    secure=False
+)
+s3Client.make_bucket('mlflow')
+```
+</details>
+
+---
+
+4. Open up http://localhost:5000 for MlFlow, and http://localhost:9000/minio/mlflow/ for S3 bucket (you artifacts) with credentials from `.env` file
+5. Configure your client-side
+For running mlflow files you need various environment variables set on the client side. To generate them user the convienience script `./bashrc_install.sh`, which installs it on your system or `./bashrc_generate.sh`, which just displays the config to copy & paste.
 
 > $ ./bashrc_install.sh
 > [ OK ] Successfully installed environment variables into your .bashrc!
 
 The script installs this variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, MLFLOW_S3_ENDPOINT_URL, MLFLOW_TRACKING_URI. All of them are needed to use mlflow from the client-side.
 
-2. Test the pipeline with below command with conda. If you dont have conda installed run with `--no-conda`
+6. Test the pipeline with below command with conda. If you dont have conda installed run with `--no-conda`
 ```shell
 mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
@@ -43,16 +97,8 @@ mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
 python ./quickstart/mlflow_tracking.py
 ```
-3. *(Optional)* If you are constantly switching your environment you can use this environment variable syntax
+7. *(Optional)* If you are constantly switching your environment you can use this environment variable syntax
 ```shell
 MLFLOW_S3_ENDPOINT_URL=http://localhost:9000 MLFLOW_TRACKING_URI=http://localhost:5000 mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
 ```
-
-</details>
-
-## Licensing
-
-Copyright (c) 2021 Tomasz Dłuski
-
-Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file [LICENSE](./LICENSE) in the repository.
````
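One note on the Python snippet above: it imports `ResponseError`, which only exists in older releases of the `minio` package (7.x replaced it with `S3Error`), so either pin an old version or drop the unused import if it fails. A third route to the same bucket is MinIO's own `mc` client, mirroring the sample credentials from `.env`; a sketch, assuming the API is reachable on localhost:9000 (host networking, so adjust on Docker Desktop):

```shell
# 'local' is an arbitrary alias name; the keys are the samples from .env.
docker run --rm --network host minio/mc /bin/sh -c "
  mc alias set local http://localhost:9000 AKIAIOSFODNN7EXAMPLE wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY &&
  mc mb local/mlflow"
```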

```diff
@@ -9,20 +9,20 @@ minioUrl = os.environ.get('MLFLOW_S3_ENDPOINT_URL')
 bucketName = os.environ.get('AWS_BUCKET_NAME')
 
 if accessID == None:
-    print('[!] AWS_ACCESS_KEY_ID environment variable is empty! run \'source .env\' to load it from the .env file')
+    print('[!] AWS_ACCESS_KEY_ID environemnt variable is empty! run \'source .env\' to load it from the .env file')
     exit(1)
 
 if accessSecret == None:
-    print('[!] AWS_SECRET_ACCESS_KEY environment variable is empty! run \'source .env\' to load it from the .env file')
+    print('[!] AWS_SECRET_ACCESS_KEY environemnt variable is empty! run \'source .env\' to load it from the .env file')
     exit(1)
 
 if minioUrl == None:
-    print('[!] MLFLOW_S3_ENDPOINT_URL environment variable is empty! run \'source .env\' to load it from the .env file')
+    print('[!] MLFLOW_S3_ENDPOINT_URL environemnt variable is empty! run \'source .env\' to load it from the .env file')
     exit(1)
 
 if bucketName == None:
-    print('[!] AWS_BUCKET_NAME environment variable is empty! run \'source .env\' to load it from the .env file')
+    print('[!] AWS_BUCKET_NAME environemnt variable is empty! run \'source .env\' to load it from the .env file')
     exit(1)
 
 minioUrlHostWithPort = minioUrl.split('//')[1]
```
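The `source .env` hint in these messages works because the file is plain `KEY=value` lines, but a bare `source` only creates shell-local variables. Exporting them so child processes (like this Python script) can actually see them takes one extra step; a sketch for a POSIX-ish shell:

```shell
set -a            # auto-export every variable defined from here on
. ./.env
set +a
bash ./run_create_bucket.sh   # now sees AWS_ACCESS_KEY_ID & friends
```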

docker-compose.yml

```diff
@@ -1,23 +1,38 @@
-version: "3.9"
+version: '3.2'
 services:
+  caddy:
+    image: caddy:2-alpine
+    container_name: caddy
+    volumes:
+      - ./Caddyfile:/etc/caddy/Caddyfile
+      - /caddy/data:/data
+      - /caddy/config:/config
+    ports:
+      - 80:80
+      - 443:443
+      - 9000:9000
+      - 9001:9001
+    restart: unless-stopped
   s3:
-    image: minio/minio:RELEASE.2023-11-01T18-37-25Z
-    restart: unless-stopped
+    restart: always
+    image: minio/minio:latest
+    container_name: aws-s3
     ports:
-      - "9000:9000"
-      - "9001:9001"
+      - 9000
+      - 9001
     environment:
       - MINIO_ROOT_USER=${AWS_ACCESS_KEY_ID}
       - MINIO_ROOT_PASSWORD=${AWS_SECRET_ACCESS_KEY}
-    command: server /data --console-address ":9001"
-    networks:
-      - internal
-      - public
+    command:
+      server /date --console-address ":9001"
     volumes:
-      - minio_new_volume:/data
+      - ./s3:/date
+    networks:
+      - default
+      - proxy-net
   db:
-    image: mysql:8-oracle # -oracle tag supports arm64 architecture!
-    restart: unless-stopped
+    restart: always
+    image: mysql/mysql-server:5.7.28
     container_name: mlflow_db
     expose:
       - "3306"
@@ -27,13 +42,16 @@ services:
       - MYSQL_PASSWORD=${MYSQL_PASSWORD}
       - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
     volumes:
-      - db_new_volume:/var/lib/mysql
+      - ./dbdata:/var/lib/mysql
     networks:
-      - internal
+      - default
   mlflow:
-    image: ubuntu/mlflow:2.1.1_1.0-22.04
+    restart: always
     container_name: tracker_mlflow
-    restart: unless-stopped
+    image: tracker_ml
+    build:
+      context: ./mlflow
+      dockerfile: Dockerfile
     ports:
       - "5000:5000"
     environment:
@@ -41,55 +59,11 @@
       - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
       - AWS_DEFAULT_REGION=${AWS_REGION}
       - MLFLOW_S3_ENDPOINT_URL=http://s3:9000
-    entrypoint: mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root s3://${AWS_BUCKET_NAME}/ -h 0.0.0.0
     networks:
-      - public
-      - internal
-    depends_on:
-      wait-for-db:
-        condition: service_completed_successfully
-  create_s3_buckets:
-    image: minio/mc
-    depends_on:
-      - "s3"
-    entrypoint: >
-      /bin/sh -c "
-      until (/usr/bin/mc alias set minio http://s3:9000 '${AWS_ACCESS_KEY_ID}' '${AWS_SECRET_ACCESS_KEY}') do echo '...waiting...' && sleep 1; done;
-      /usr/bin/mc mb minio/${AWS_BUCKET_NAME};
-      exit 0;
-      "
-    networks:
-      - internal
-  wait-for-db:
-    image: atkrad/wait4x
-    depends_on:
-      - db
-    command: tcp db:3306 -t 90s -i 250ms
-    networks:
-      - internal
-  run_test_experiment:
-    build:
-      context: ./test_experiment
-      dockerfile: Dockerfile
-    depends_on:
-      - "mlflow"
-    environment:
-      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
-      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
-      - AWS_DEFAULT_REGION=${AWS_REGION}
-      - MLFLOW_S3_ENDPOINT_URL=http://s3:9000
-      - MLFLOW_TRACKING_URI=http://mlflow:5000
-    entrypoint: >
-      /bin/sh -c "
-      python3 mlflow_tracking.py;
-      exit 0;
-      "
-    networks:
-      - internal
+      - proxy-net
+      - default
+    entrypoint: mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root s3://${AWS_BUCKET_NAME}/ --artifacts-destination s3://${AWS_BUCKET_NAME}/ -h 0.0.0.0
 
 networks:
-  internal:
-  public:
+  default:
+  proxy-net:
     driver: bridge
-
-volumes:
-  db_new_volume:
-  minio_new_volume:
```
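Once this stack is up, the Caddy virtual hosts can be smoke-tested from the host machine. A sketch, assuming the default `root`/`hiccup` credentials and Caddy's locally-issued TLS certificate (hence `-k`):

```shell
# Without credentials, basicauth should reject the request (401).
curl -k -I https://mlflow.localhost

# With credentials, Caddy proxies through to the tracker (200).
curl -k -u root:hiccup -I https://mlflow.localhost
```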

mlflow/Dockerfile (new file, 10 lines)

```dockerfile
FROM continuumio/miniconda3:latest

ADD . /app
WORKDIR /app

COPY wait-for-it.sh wait-for-it.sh
RUN chmod +x wait-for-it.sh

RUN pip install mlflow boto3 pymysql
```
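Note the image declares no `CMD` or `ENTRYPOINT`: the actual server command is injected by docker-compose. The copied `wait-for-it.sh` could gate that command on MySQL being reachable first; a sketch of such a wrapper (hypothetical, since the compose entrypoint above starts `mlflow server` directly):

```shell
# Wait up to 90s for db:3306 to accept TCP, then start the tracker
# with the same flags the compose entrypoint uses.
./wait-for-it.sh db:3306 --timeout=90 -- \
  mlflow server \
    --backend-store-uri "mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE}" \
    --default-artifact-root "s3://${AWS_BUCKET_NAME}/" \
    -h 0.0.0.0
```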

mlflow/wait-for-it.sh (new file, 182 lines)

```shell
#!/usr/bin/env bash
# Use this script to test if a given TCP host/port are available

WAITFORIT_cmdname=${0##*/}

echoerr() { if [[ $WAITFORIT_QUIET -ne 1 ]]; then echo "$@" 1>&2; fi }

usage()
{
    cat << USAGE >&2
Usage:
    $WAITFORIT_cmdname host:port [-s] [-t timeout] [-- command args]
    -h HOST | --host=HOST       Host or IP under test
    -p PORT | --port=PORT       TCP port under test
                                Alternatively, you specify the host and port as host:port
    -s | --strict               Only execute subcommand if the test succeeds
    -q | --quiet                Don't output any status messages
    -t TIMEOUT | --timeout=TIMEOUT
                                Timeout in seconds, zero for no timeout
    -- COMMAND ARGS             Execute command with args after the test finishes
USAGE
    exit 1
}

wait_for()
{
    if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then
        echoerr "$WAITFORIT_cmdname: waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT"
    else
        echoerr "$WAITFORIT_cmdname: waiting for $WAITFORIT_HOST:$WAITFORIT_PORT without a timeout"
    fi
    WAITFORIT_start_ts=$(date +%s)
    while :
    do
        if [[ $WAITFORIT_ISBUSY -eq 1 ]]; then
            nc -z $WAITFORIT_HOST $WAITFORIT_PORT
            WAITFORIT_result=$?
        else
            (echo -n > /dev/tcp/$WAITFORIT_HOST/$WAITFORIT_PORT) >/dev/null 2>&1
            WAITFORIT_result=$?
        fi
        if [[ $WAITFORIT_result -eq 0 ]]; then
            WAITFORIT_end_ts=$(date +%s)
            echoerr "$WAITFORIT_cmdname: $WAITFORIT_HOST:$WAITFORIT_PORT is available after $((WAITFORIT_end_ts - WAITFORIT_start_ts)) seconds"
            break
        fi
        sleep 1
    done
    return $WAITFORIT_result
}

wait_for_wrapper()
{
    # In order to support SIGINT during timeout: http://unix.stackexchange.com/a/57692
    if [[ $WAITFORIT_QUIET -eq 1 ]]; then
        timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --quiet --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT &
    else
        timeout $WAITFORIT_BUSYTIMEFLAG $WAITFORIT_TIMEOUT $0 --child --host=$WAITFORIT_HOST --port=$WAITFORIT_PORT --timeout=$WAITFORIT_TIMEOUT &
    fi
    WAITFORIT_PID=$!
    trap "kill -INT -$WAITFORIT_PID" INT
    wait $WAITFORIT_PID
    WAITFORIT_RESULT=$?
    if [[ $WAITFORIT_RESULT -ne 0 ]]; then
        echoerr "$WAITFORIT_cmdname: timeout occurred after waiting $WAITFORIT_TIMEOUT seconds for $WAITFORIT_HOST:$WAITFORIT_PORT"
    fi
    return $WAITFORIT_RESULT
}

# process arguments
while [[ $# -gt 0 ]]
do
    case "$1" in
        *:* )
        WAITFORIT_hostport=(${1//:/ })
        WAITFORIT_HOST=${WAITFORIT_hostport[0]}
        WAITFORIT_PORT=${WAITFORIT_hostport[1]}
        shift 1
        ;;
        --child)
        WAITFORIT_CHILD=1
        shift 1
        ;;
        -q | --quiet)
        WAITFORIT_QUIET=1
        shift 1
        ;;
        -s | --strict)
        WAITFORIT_STRICT=1
        shift 1
        ;;
        -h)
        WAITFORIT_HOST="$2"
        if [[ $WAITFORIT_HOST == "" ]]; then break; fi
        shift 2
        ;;
        --host=*)
        WAITFORIT_HOST="${1#*=}"
        shift 1
        ;;
        -p)
        WAITFORIT_PORT="$2"
        if [[ $WAITFORIT_PORT == "" ]]; then break; fi
        shift 2
        ;;
        --port=*)
        WAITFORIT_PORT="${1#*=}"
        shift 1
        ;;
        -t)
        WAITFORIT_TIMEOUT="$2"
        if [[ $WAITFORIT_TIMEOUT == "" ]]; then break; fi
        shift 2
        ;;
        --timeout=*)
        WAITFORIT_TIMEOUT="${1#*=}"
        shift 1
        ;;
        --)
        shift
        WAITFORIT_CLI=("$@")
        break
        ;;
        --help)
        usage
        ;;
        *)
        echoerr "Unknown argument: $1"
        usage
        ;;
    esac
done

if [[ "$WAITFORIT_HOST" == "" || "$WAITFORIT_PORT" == "" ]]; then
    echoerr "Error: you need to provide a host and port to test."
    usage
fi

WAITFORIT_TIMEOUT=${WAITFORIT_TIMEOUT:-15}
WAITFORIT_STRICT=${WAITFORIT_STRICT:-0}
WAITFORIT_CHILD=${WAITFORIT_CHILD:-0}
WAITFORIT_QUIET=${WAITFORIT_QUIET:-0}

# Check to see if timeout is from busybox?
WAITFORIT_TIMEOUT_PATH=$(type -p timeout)
WAITFORIT_TIMEOUT_PATH=$(realpath $WAITFORIT_TIMEOUT_PATH 2>/dev/null || readlink -f $WAITFORIT_TIMEOUT_PATH)

WAITFORIT_BUSYTIMEFLAG=""
if [[ $WAITFORIT_TIMEOUT_PATH =~ "busybox" ]]; then
    WAITFORIT_ISBUSY=1
    # Check if busybox timeout uses -t flag
    # (recent Alpine versions don't support -t anymore)
    if timeout &>/dev/stdout | grep -q -e '-t '; then
        WAITFORIT_BUSYTIMEFLAG="-t"
    fi
else
    WAITFORIT_ISBUSY=0
fi

if [[ $WAITFORIT_CHILD -gt 0 ]]; then
    wait_for
    WAITFORIT_RESULT=$?
    exit $WAITFORIT_RESULT
else
    if [[ $WAITFORIT_TIMEOUT -gt 0 ]]; then
        wait_for_wrapper
        WAITFORIT_RESULT=$?
    else
        wait_for
        WAITFORIT_RESULT=$?
    fi
fi

if [[ $WAITFORIT_CLI != "" ]]; then
    if [[ $WAITFORIT_RESULT -ne 0 && $WAITFORIT_STRICT -eq 1 ]]; then
        echoerr "$WAITFORIT_cmdname: strict mode, refusing to execute subprocess"
        exit $WAITFORIT_RESULT
    fi
    exec "${WAITFORIT_CLI[@]}"
else
    exit $WAITFORIT_RESULT
fi
```
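This is the widely used `wait-for-it` helper; typical invocations, using the service names from this stack, look like the following sketch:

```shell
# Block (default 15s timeout) until MySQL accepts TCP connections.
./wait-for-it.sh db:3306 -- echo "db is up"

# Strict mode: run the command only if the port check succeeded.
./wait-for-it.sh --host=mlflow --port=5000 --timeout=30 --strict -- \
  curl -s http://mlflow:5000
```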

quickstart/mlflow_tracking.py

```diff
@@ -1,22 +1,22 @@
 import os
 from random import random, randint
-import mlflow
+from mlflow import mlflow,log_metric, log_param, log_artifacts
 
 if __name__ == "__main__":
     with mlflow.start_run() as run:
-        mlflow.set_tracking_uri('http://localhost:5000')
+        mlflow.set_tracking_uri('https://mlflow.localhost')
         print("Running mlflow_tracking.py")
 
-        mlflow.log_param("param1", randint(0, 100))
+        log_param("param1", randint(0, 100))
 
-        mlflow.log_metric("foo", random())
-        mlflow.log_metric("foo", random() + 1)
-        mlflow.log_metric("foo", random() + 2)
+        log_metric("foo", random())
+        log_metric("foo", random() + 1)
+        log_metric("foo", random() + 2)
 
         if not os.path.exists("outputs"):
             os.makedirs("outputs")
 
         with open("outputs/test.txt", "w") as f:
             f.write("hello world!")
 
-        mlflow.log_artifacts("outputs")
+        log_artifacts("outputs")
```
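One caveat that applies to both versions: `mlflow.set_tracking_uri()` is called inside the `mlflow.start_run()` block, so the run has already been opened against whatever URI was active beforehand. Supplying the URI through the environment sidesteps the ordering issue; a sketch, assuming the Caddy-fronted endpoint and MLflow's standard basic-auth environment variables:

```shell
export MLFLOW_TRACKING_URI=https://mlflow.localhost
export MLFLOW_TRACKING_USERNAME=root    # credentials from the Caddyfile
export MLFLOW_TRACKING_PASSWORD=hiccup
python ./quickstart/mlflow_tracking.py
```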

test_experiment/Dockerfile (deleted, 6 lines)

```dockerfile
FROM continuumio/miniconda3:latest
RUN pip install mlflow boto3

WORKDIR /app
COPY . .
```

test_experiment/mlflow_tracking.py (deleted, 22 lines)

```python
import os
from random import random, randint
import mlflow

if __name__ == "__main__":
    with mlflow.start_run() as run:
        mlflow.set_tracking_uri('http://mlflow:5000')
        print("Running mlflow_tracking.py")

        mlflow.log_param("param1", randint(0, 100))

        mlflow.log_metric("foo", random())
        mlflow.log_metric("foo", random() + 1)
        mlflow.log_metric("foo", random() + 2)

        if not os.path.exists("outputs"):
            os.makedirs("outputs")

        with open("outputs/test.txt", "w") as f:
            f.write("hello world!")

        mlflow.log_artifacts("outputs")
```