feat: add workflow_dispatch for automated dataset uploads to Azure Blob #14

Description

@devin-ai-integration

Summary

Add a GitHub Actions workflow_dispatch workflow that lets team members upload dataset files from this repository to Azure Blob Storage from the GitHub Actions UI, with no local az CLI setup required.

Proposed Implementation

  1. Add open_data to github-oidc.auto.tfvars in infra-azure:

    open_data = {
      repo         = "bcit-tlu/open-data"
      branches     = ["main"]
      pull_request = false
    }

  2. Grant the identity the Storage Blob Data Contributor role on the open_data_account scope (in role-assignments.auto.tfvars, or dynamically via github_sp_assignments)

  3. Create .github/workflows/upload-datasets.yaml:

    name: Upload Datasets
    on:
      workflow_dispatch:
        inputs:
          environment:
            description: Target container (datasets-latest or datasets-stable)
            required: true
            default: datasets-latest
            type: choice
            options:
              - datasets-latest
              - datasets-stable
          source_path:
            description: Path to upload (relative to repo root, e.g. data/)
            required: true
            type: string
    
    jobs:
      upload:
        runs-on: ubuntu-latest
        permissions:
          id-token: write
          contents: read
        steps:
          - uses: actions/checkout@v4
          - uses: azure/login@v2
            with:
              client-id: ${{ secrets.AZURE_CLIENT_ID }}
              tenant-id: ${{ secrets.AZURE_TENANT_ID }}
              subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          - run: |
              az storage blob upload-batch \
                --account-name opendata6ivjr59a \
                --destination "${{ inputs.environment }}" \
                --source "${{ inputs.source_path }}" \
                --auth-mode login --overwrite
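
The run step above interpolates the workflow inputs directly into the shell script. One possible variant, not part of the original proposal, passes them through environment variables so that a free-text source_path cannot alter the script, and fails early if the path is missing from the checkout. The step name, the CONTAINER and SOURCE_PATH variable names, and the directory check are assumptions made for illustration. The az command and its flags are the ones from the step above.

```yaml
# Sketch only: drop-in alternative for the final step of the upload job.
- name: Upload datasets
  env:
    # Inputs are handed to the shell as data, not pasted into the script text.
    CONTAINER: ${{ inputs.environment }}
    SOURCE_PATH: ${{ inputs.source_path }}
  run: |
    set -euo pipefail
    # upload-batch expects a directory, so stop if the path is not one.
    if [ ! -d "$SOURCE_PATH" ]; then
      echo "::error::source_path '$SOURCE_PATH' is not a directory in the repo"
      exit 1
    fi
    az storage blob upload-batch \
      --account-name opendata6ivjr59a \
      --destination "$CONTAINER" \
      --source "$SOURCE_PATH" \
      --auth-mode login --overwrite
```

Because environment is a choice input, its value is already limited to the two listed containers. The env indirection matters mostly for source_path, which accepts arbitrary text.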

Prerequisites

  • Add open-data to infra-azure OIDC config (github-oidc.auto.tfvars)
  • Grant Storage Blob Data Contributor role to the new managed identity
  • Add GitHub repo secrets: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID

Context

Currently, datasets are uploaded manually via az storage blob upload-batch --auth-mode login. This workflow would allow any team member with repo write access to trigger an upload from the GitHub Actions tab without needing local Azure CLI authentication.
