# MLflow Docker Setup [](https://github.com/Toumash/mlflow-docker/actions)

If you want to boot up an MLflow project with a one-liner, this repo is for you.

The only requirement is Docker installed on your system; we will use Bash on Linux or Windows.

## Step-by-step guide

1. Configure the `.env` file to your liking. You can put any values you like there; they will be used to configure the services.

2. Run the infrastructure with this one-liner:
```shell
$ docker-compose up -d
Creating network "mlflow-basis_A" with driver "bridge"
Creating mlflow_db ... done
Creating tracker_mlflow ... done
Creating aws-s3 ... done
```
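
To check that all three containers came up (and to inspect logs if something fails), standard Docker commands work:

```shell
docker-compose ps              # list service state
docker logs -f tracker_mlflow  # follow the tracker container's logs
```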
3. Create the `mlflow` bucket. You can do it **either with the AWS CLI or the Python API**. **You don't need an AWS subscription.**

<details><summary>AWS CLI</summary>

1. [Install the AWS CLI](https://aws.amazon.com/cli/). **Yes, I know you don't have an Amazon Web Services subscription. Don't worry, it won't be needed!**
2. Configure the AWS CLI, entering the same credentials as in the `.env` file:
```shell
aws configure
```
> AWS Access Key ID [****************123]: AKIAIOSFODNN7EXAMPLE
> AWS Secret Access Key [****************123]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
> Default region name [us-west-2]: us-east-1
> Default output format [json]: <ENTER>

3. Run:

```shell
aws --endpoint-url=http://localhost:9000 s3 mb s3://mlflow
```
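
You can check that the bucket was created by listing buckets against the same endpoint:

```shell
aws --endpoint-url=http://localhost:9000 s3 ls
```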

</details>

<details><summary>Python API</summary>

1. Install the Minio Python client:
```shell
pip install minio
```
2. Run this to create a bucket:
```python
from minio import Minio

s3_client = Minio(
    'localhost:9000',
    access_key='<YOUR_AWS_ACCESS_KEY_ID>',      # copy from the .env file
    secret_key='<YOUR_AWS_SECRET_ACCESS_KEY>',  # copy from the .env file
    secure=False
)
s3_client.make_bucket('mlflow')
```
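
Optionally, verify it worked; `bucket_exists` is part of the same minio client API:

```python
print(s3_client.bucket_exists('mlflow'))  # should print True
```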
</details>

---

4. Open http://localhost:5000/#/ for MLflow and http://localhost:9000/minio/mlflow/ for the S3 bucket (your artifacts), using the credentials from the `.env` file.

5. Configure your client side.

To run MLflow projects, you need the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables present on the client side.

You will also need to point the client at your S3 server (Minio) and at the MLflow tracking server:

```shell
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
export MLFLOW_TRACKING_URI=http://localhost:5000
```

You can load them from the `.env` file, but I recommend putting them in your `.bashrc`, as shown below.

```shell
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
AWS_REGION=us-east-1
AWS_BUCKET_NAME=mlflow
MYSQL_DATABASE=mlflow
MYSQL_USER=mlflow_user
MYSQL_PASSWORD=mlflow_password
MYSQL_ROOT_PASSWORD=toor
MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
MLFLOW_TRACKING_URI=http://localhost:5000
```
Then run:

```shell
source .env
```
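
Note that a plain `source .env` sets the variables in your current shell but does not export them to child processes such as `mlflow`. A common shell idiom for this (a suggestion on my part, not a script this repo ships) is to auto-export while sourcing:

```shell
set -a       # export every variable assigned from here on
source .env
set +a       # stop auto-exporting
```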

Or add them as `export X=Y` lines to your `.bashrc` file and then run:

```shell
source ~/.bashrc
```

6. Test the pipeline with the command below (it uses conda). If you don't have conda installed, add `--no-conda`:

```shell
mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
```

Optionally, you can run:

```shell
python ./quickstart/mlflow_tracking.py
```
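
For reference, a minimal tracking script looks roughly like the sketch below. It uses the standard MLflow tracking API and is not necessarily identical to the repo's `quickstart/mlflow_tracking.py`; it assumes the environment variables from step 5 are already exported:

```python
import mlflow

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)   # a hyperparameter
    mlflow.log_metric("rmse", 0.79)  # a result metric

    # Log a small artifact; it ends up in the Minio "mlflow" bucket.
    with open("output.txt", "w") as f:
        f.write("hello from mlflow")
    mlflow.log_artifact("output.txt")
```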

7. (Optional) If you are constantly switching environments, you can set the variables inline for a single command:

```shell
MLFLOW_S3_ENDPOINT_URL=http://localhost:9000 MLFLOW_TRACKING_URI=http://localhost:5000 mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
```

# Improvements needed

- [ ] The database is very slow to boot up, and tracker_mlflow crashes because the database is still loading. We need a wait script so the MLflow tracker waits for the database to come up; one possible approach is sketched below.
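
As a sketch of such a fix (an assumption on my part, not a script the repo ships), the tracker's entrypoint could poll MySQL before starting the server. The `db` hostname is a placeholder for the MySQL service name in docker-compose:

```shell
#!/bin/sh
# wait-for-db.sh: block until MySQL answers, then start the tracking server.
until mysqladmin ping -h db --silent; do
  echo "Waiting for MySQL..."
  sleep 2
done

exec mlflow server \
  --backend-store-uri "mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE}" \
  --default-artifact-root "s3://${AWS_BUCKET_NAME}/" \
  --host 0.0.0.0
```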