Debug GitHub Action to create new GCP Composer Environment #50
andrewphilipsmith wants to merge 12 commits into
This PR will make issue https://mapaction.atlassian.net/browse/DATAPIPE-81 irrelevant. It will also have implications for these issues:
Hi @andrewphilipsmith, it might be worth discussing this PR a bit when you're back. Essentially I'm wondering if doing this through GitHub Actions is the right approach, considering the Airflow environments will presumably be long-lived, which is a different context from the routine tasks that I think GH Actions is more suited to. I completely support defining the Composer configuration as code though, so changes can be tracked in Git and resources are reproducible. However, I'd recommend we use something like Terraform (which I have a fair amount of experience with) to do so. There's documentation here on how to configure a Google Composer instance for example, which I'm happy to implement. I also have some leave coming up so will have time to get this set up. I would definitely want to set this up in a separate Google Cloud Project initially so as not to interfere with anything (once it is defined as code, it would be easy to target a different project afterwards).
This PR builds on #49 and debugs the GitHub Action which creates a new Composer Environment (aka managed Airflow instance) in Google Cloud Platform.
I am adding some of the relevant notes here until I find a better location for them in the relevant files in the repo.
Process
GCP Composer names each instance of Airflow and its associated K8s nodes an "Environment". A new environment can be created with the command:
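A minimal sketch of such an invocation, assuming a hypothetical environment name and location (neither value comes from this PR). The `create_environment` helper and the `DRY_RUN` switch are illustrative additions so the command can be inspected without touching a real project; any further flags the Action passes are not shown:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: builds the `gcloud composer environments create`
# command for a given environment name and location.
create_environment() {
  local name="$1" location="$2"
  local cmd=(gcloud composer environments create "$name" --location "$location")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default here): print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Calling `create_environment example-env europe-west2` prints the command; setting `DRY_RUN=0` would actually run it against the currently configured project.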
The Action can only be manually triggered. When triggered it asks the user to input the name to be used for the new Composer Environment. There is no default value.
TODO: The value of `secrets.GCLOUD_COMPOSER_ENVIRONMENT` should be updated with the new name.
When you create a new Environment, you cannot specify which bucket it will read the DAGs from. It will create a new bucket (with an incomprehensible name). The URL of the new bucket is queried with this command:
This gives the URL including the `/dags` suffix. The `/dags` suffix needs to be stripped off before the next step.
The `Configs` action (which deploys the DAGs and plugin code to the Airflow instance) uses the `GCLOUD_BUCKET_PATH` secret as the destination to deploy to. Therefore this action needs to update the value of the `GCLOUD_BUCKET_PATH` secret.
TODO: Update the value of `secrets.GCLOUD_BUCKET_PATH`
Open questions:
- [...] `requirements.txt` for `apache-airflow` and uses the version number specified there. Is this a sensible/sane/foolhardy idea?
- [...] `Container` and `Configs` actions on completion of this action?
Removing old Composer Environment and DAG bucket
There is currently no Action that removes an old Composer Environment and DAG bucket. Here are some notes on the process. The command `gcloud composer delete...` does not remove the associated DAG bucket. Therefore this DAG bucket must be identified before removing the relevant Composer Environment.
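The ordering described above can be sketched as follows. This is a hedged illustration rather than an existing Action: the helper names, the `DRY_RUN` switch, and any environment name or location passed in are assumptions. It relies on the `config.dagGcsPrefix` field that Composer reports for the DAG folder, and looks the bucket up first because that information is no longer available once the environment is gone:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Strip a trailing "/dags" from a DAG folder URL to get the bucket URL.
bucket_from_dag_prefix() {
  local prefix="${1%/}"   # tolerate a single trailing slash
  printf '%s\n' "${prefix%/dags}"
}

# Hypothetical helper: run a command, or only print it when DRY_RUN=1 (default).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

remove_environment() {
  local name="$1" location="$2" dag_prefix bucket
  # 1. Identify the DAG bucket while the environment still exists.
  #    This read-only query runs even in dry-run mode.
  dag_prefix="$(gcloud composer environments describe "$name" \
    --location "$location" --format='get(config.dagGcsPrefix)')"
  bucket="$(bucket_from_dag_prefix "$dag_prefix")"
  # 2. Delete the environment (this leaves the bucket behind).
  run gcloud composer environments delete "$name" --location "$location" --quiet
  # 3. Remove the orphaned bucket and everything in it.
  run gsutil rm -r "$bucket"
}
```

With the default `DRY_RUN=1`, `remove_environment` only prints the two destructive commands, which makes it easy to check that the right bucket was identified before anything is deleted.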