Part 2: Develop Machine Learning Pipeline
In this part, we walk through how to connect the ML code repo to Azure DevOps and how to build, release, and deploy a toy ML model. The focus here is on system setup; we will gloss over the ML and Python details for now.
Connect to the Azure DevOps Git repo you created earlier. Any Git GUI will work; here I am using Visual Studio.
In Visual Studio, open Team Explorer, click Manage Connections, and add the repo.
To access the local files, right-click the folder and choose Open Folder in File Explorer. At this point you will only see an empty structure under the adf folder (assuming you configured the ADF service with this Git repo in Part 1). Don't worry: in the next module we will configure the ADF part through the web UI, so you do not need to change any files under the adf folder manually.
The next step is to upload the aml folder (the model Python template) to the Git repo. Download the folder from this link to your local folder, and commit the changes back to the Azure DevOps repo.

The MLADS repo contains two main folders: adf and aml. As the names suggest, ADF pipeline scripts are tracked in the adf folder, while ML-related scripts and AML config files are tracked in the aml folder. ADF development happens through the interactive web UI, so you don't need to edit the JSON files under the adf folder. There is also a yml file outside the aml and adf folders; it was generated by Azure DevOps and, again, you don't need to edit it. This tutorial focuses on the file structure under the aml folder.
- environment_setup: this folder holds the configs for the AML service and Azure DevOps.
  - Config: the two files specify the profiles used to connect to the AML service. In this tutorial we do not differentiate between dev and prod, so you can ignore the prod file for now.
    - config_dev: remember to add your subscription information to the config files.
      - tenant_id = {you can find this from Azure Active Directory page}
      - service_principal_id = {Application (client) ID of mlads_aad}
      - subscription_id = {you can find this from Azure Portal Subscription tab}
      - resource_group = MLADS_rg
      - WorkSpace = MLADS
      - BlobName = mlads
      - Data_factory_name = dfcompute
      - location = westus
      - keyVaultUrl = {https://mlads*******.vault.azure.net}
  - install_requirments.sh: this file passes the required Azure Python packages to the Azure DevOps deployment.
  - mlads_util.py: this script sets up the AML environment for Python.
  - requirements.txt: this file passes the required Python packages to the Azure DevOps deployment.
  - RunAll.py: this script triggers the ML pipeline built in the Azure DevOps deployment.
- mlads: this folder holds the model-specific configs and scripts.
  - aml_service: this folder holds the model-specific configs and ML pipeline scripts.
    - Run.py: the script that generates the ML pipeline.
    - setup.ini: this file passes the compute VM config that the model requires.
  - mlads.py: this is the model script; it reads the data, runs the model, and makes predictions. Here we are going to build a simple CF recommendation model that recommends the next Azure service for each subscription.
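As a quick sanity check on the config_dev values listed above, here is a minimal sketch that parses such a profile and reports any missing keys. It assumes the file uses INI syntax with a section named `[dev]`; both are assumptions, as are the helper name `load_profile` and the angle-bracket placeholders. The repo's own scripts may read the file differently.

```python
import configparser

# Hypothetical file contents: the [dev] section name and the INI layout are
# assumptions. Replace the <...> placeholders with your own values.
SAMPLE_CONFIG = """
[dev]
tenant_id = <your-tenant-id>
service_principal_id = <your-client-id>
subscription_id = <your-subscription-id>
resource_group = MLADS_rg
WorkSpace = MLADS
BlobName = mlads
Data_factory_name = dfcompute
location = westus
keyVaultUrl = <your-key-vault-url>
"""

# Keys taken from the config_dev list in this tutorial.
REQUIRED_KEYS = [
    "tenant_id", "service_principal_id", "subscription_id",
    "resource_group", "WorkSpace", "BlobName",
    "Data_factory_name", "location", "keyVaultUrl",
]


def load_profile(text, section="dev"):
    """Parse INI text and return the section as a dict, failing on missing keys."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; the default lowercases keys
    parser.read_string(text)
    profile = dict(parser[section])
    missing = [key for key in REQUIRED_KEYS if not profile.get(key)]
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    return profile


if __name__ == "__main__":
    profile = load_profile(SAMPLE_CONFIG)
    print(profile["resource_group"], profile["WorkSpace"])
```

Failing early on a missing key tends to be easier to debug than an authentication error raised later by the AML service.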
In this tutorial, I will skip the explanation of the Python files; please refer to the inline comments. You can find many AML Python SDK tutorials here: [Azure ML Python SDK](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py)
Use your favorite Python editor (I am using Eclipse + PyDev) and execute Run.py line by line. The inline comments explain what each part of the script does.
You may encounter a few errors caused by missing Python packages or a wrong working directory.
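Both kinds of error can be caught before running anything. The following is a minimal pre-flight sketch using only the standard library; the helper names `missing_packages` and `in_expected_directory` are hypothetical, and the package list is an assumption (`azureml.core` is the import name of the Azure ML SDK core package), so adjust it to match requirements.txt.

```python
import importlib.util
from pathlib import Path


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


def in_expected_directory(marker="Run.py", cwd=None):
    """Check that the marker file exists in the (current) working directory."""
    base = Path(cwd) if cwd else Path.cwd()
    return (base / marker).is_file()


if __name__ == "__main__":
    # Assumed package list; edit to match your requirements.txt.
    print("Missing packages:", missing_packages(["azureml.core"]))
    print("Run.py found in working directory:", in_expected_directory())
```

If Run.py is not found, change the working directory in your editor's run configuration to the folder that contains it.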
Once you execute Run.py with isTest = True, you will find a running "Experiment" in the Azure ML workspace.
Next, execute Run.py with isTest = False; the script will trigger an ML pipeline build in the Azure ML workspace.
Try to run the pipeline build at least once from your local Python environment. It generates a JSON file that stores the latest AML pipeline ID and uploads it to the blob storage account. This file later tells the ADF pipeline which AML pipeline is the latest one to trigger.
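To make the hand-off concrete, here is a rough sketch of what writing and reading such a JSON record could look like. The field names `pipeline_id` and `published_at`, the file name, and both helper functions are hypothetical; the real layout is defined in Run.py, and the upload to blob storage is left out here.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_pipeline_record(pipeline_id, path):
    """Save the latest pipeline ID (plus a UTC timestamp) as JSON."""
    record = {
        "pipeline_id": pipeline_id,  # hypothetical field name
        "published_at": datetime.now(timezone.utc).isoformat(),
    }
    Path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record


def read_pipeline_id(path):
    """Read back the pipeline ID, as a downstream consumer would."""
    record = json.loads(Path(path).read_text(encoding="utf-8"))
    return record["pipeline_id"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "latest_pipeline.json"  # hypothetical file name
        write_pipeline_record("example-pipeline-id", target)
        print(read_pipeline_id(target))
```

Keeping only the latest ID in one small file means the ADF side never needs to be edited when a new AML pipeline is published.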
Now we can move on to the ADF configuration.