Add DAP id field and wire up new dag api endpoints #342

@thecaffiend

Description

Each of our DAP profiles needs an id field that uniquely identifies it. Currently the pipeline name and version are used, but the name is more of a human-friendly string with spaces and caps. This field will be mapped to dag_ids so that when a user selects a workflow to run, we can get the correct input params for the user to provide.
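
As a rough illustration of what a machine-friendly id could look like, here is a minimal sketch that derives one from the existing name and version. The function name `make_dap_id`, the slug rules, and the `slug@version` format are all assumptions for discussion; the issue does not decide whether the id is derived or assigned by hand.

```python
import re


def make_dap_id(name: str, version: str) -> str:
    """Hypothetical helper: build a URL-safe id from a display name and version."""
    # Lowercase the name and collapse runs of non-alphanumerics into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    if not slug:
        raise ValueError("name must contain at least one letter or digit")
    # Assumed format, mirroring the DAP@version notation used in this issue.
    return f"{slug}@{version}"
```

An id like this has no spaces or caps, so it can be used in URLs and as a lookup key when mapping DAPs to dag_ids.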

We also need to add new endpoints to support Apache Airflow usage. The proposed list is below. The ones needed for building the front-end UI should come first: getting the list of available DAGs, then getting the profiles for the DAPs used in those DAGs.

Proposed new REST API endpoints

Get info about available workflows (for UI and for other things)

  • GET
    • workflows/{dag_id}
      • List all workflows (no dag_id) or one workflow (if dag_id is specified
        and valid).
      • This can be proxied straight to the Airflow REST API
        /api/v2/dags/{dag_id} endpoint unless we really need to prune the data
        returned.
    • workflows/{dag_id}/tasks/{task_id}
      • List details about one or all tasks in a DAG.
      • This will probably need some data from the Airflow API at
        /api/v2/dags/{dag_id}/tasks, but it also needs the parameter schema for
        each DAP used in the workflow. So it would need to pull the mappings from
        wherever we put our manifests, then read the profile for each
        DAP@version and send those back as a list.
      • If we need to pull an inherited base schema, do that here?
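
The merge step for the tasks endpoint could be sketched roughly as follows. It assumes the Airflow task list has already been fetched and that each entry has a `task_id` key; the `task_to_dap` mapping, the `profiles` dict, and the `params` key inside a profile are hypothetical stand-ins for whatever the manifests end up looking like.

```python
from typing import Any, Optional


def build_task_details(
    airflow_tasks: list[dict[str, Any]],
    task_to_dap: dict[str, str],
    profiles: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Attach a DAP parameter schema to each Airflow task entry.

    airflow_tasks: task dicts as returned by the Airflow tasks endpoint.
    task_to_dap:   hypothetical manifest mapping task_id -> "DAP@version".
    profiles:      hypothetical profile store keyed by "DAP@version".
    """
    details = []
    for task in airflow_tasks:
        task_id = task["task_id"]
        dap_key: Optional[str] = task_to_dap.get(task_id)
        profile = profiles.get(dap_key) if dap_key else None
        details.append(
            {
                "task_id": task_id,
                "dap": dap_key,
                # None means the task has no DAP or its profile is unknown.
                "param_schema": profile["params"] if profile else None,
            }
        )
    return details
```

Tasks that do not map to a DAP are kept in the result with a null schema, so the front end still sees every task in the DAG.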

Start/stop workflows

  • POST
    • workflows/{dag_id}/runs
      • Kick off a specific DAG.
      • This will need to take some representation of the param schema for each
        DAP in the workflow. Since we're initially going with workflows specifying
        the DAP@versions, these will be the params for one of those entries.
      • This will need to return a 400 on any bad or missing params.
  • PATCH
    • workflows/{dag_id}/runs/{dag_run_id}
      • This is how we can handle stopping a run that's in progress.
      • There is no state for stopped (or paused), so the main thing folks do
        is set the run to failed or success. You can also change things at
        that point and reschedule if the run was stopped because a param was
        wrong or something similar.
      • We can proxy this to the Airflow REST API endpoint
        /api/v2/dags/{dag_id}/dagRuns/{dag_run_id}
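
A minimal sketch of the two behaviours above: rejecting bad or missing params with a 400 before a run is started, and building the body for the stop request. The schema layout (`type`, `required`), the function names, and the response shapes are assumptions made for illustration. The `state` field in the PATCH body reflects my understanding of the Airflow dagRuns endpoint and should be checked against the deployed Airflow version.

```python
from typing import Any

# Assumed type names for the hypothetical param schema.
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_run_params(schema: dict[str, dict], params: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the params are usable."""
    errors = []
    for name, spec in schema.items():
        if name not in params:
            if spec.get("required", False):
                errors.append(f"missing required param: {name}")
            continue
        value = params[name]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        bool_mismatch = isinstance(value, bool) and spec["type"] != "boolean"
        if bool_mismatch or not isinstance(value, _TYPES[spec["type"]]):
            errors.append(f"bad type for {name}: expected {spec['type']}")
    for name in params:
        if name not in schema:
            errors.append(f"unknown param: {name}")
    return errors


def start_run_response(schema: dict[str, dict], params: dict[str, Any]) -> tuple[int, dict]:
    """Return (status, body): 400 on invalid params, else the conf to forward."""
    errors = validate_run_params(schema, params)
    if errors:
        return 400, {"errors": errors}
    return 200, {"conf": params}


def stop_run_payload(mark_as: str = "failed") -> dict[str, str]:
    """Build the PATCH body used to end an in-progress run."""
    # There is no stopped or paused state, so only these two are accepted here.
    if mark_as not in {"failed", "success"}:
        raise ValueError("a run can only be marked as 'failed' or 'success'")
    return {"state": mark_as}
```

Validation happens before anything is sent to Airflow, so a bad request never creates a DAG run.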

Check workflow status

  • GET
    • workflows/{dag_id}/runs/{dag_run_id}
      • Return data about one or all running workflows.
      • This could probably just be proxied to the Airflow REST API endpoint at
        /api/v2/dags/{dag_id}/dagRuns/{dag_run_id}
      • dag_run_id is maintained on the front end after the workflow is kicked
        off.
      • These take a query param for a user name pattern, assuming we have
        that all worked out.
    • workflows/{dag_id}/runs/{dag_run_id}/tasks/{task_id}
      • Get info on one or more task instances in a DAG run.
      • Use this to get all tasks in a workflow run and figure out the state of
        each. Because tasks can run in parallel, there may be more than one in a
        running state. Tasks that have completed will be reported as well.
      • We can proxy this to the Airflow REST API endpoint
        /api/v2/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}
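
To figure out the state of each task in a run, the front end or the proxy could group task instances by state, roughly as sketched below. This assumes each task instance entry carries `task_id` and `state` keys and that `state` may be null for tasks that have not been scheduled yet. The function name and the `"none"` label are hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_tasks_by_state(task_instances: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group task ids by their current state for a single DAG run."""
    by_state: dict[str, list[str]] = defaultdict(list)
    for ti in task_instances:
        # Treat a missing or null state as "none" (assumed label).
        state = ti.get("state") or "none"
        by_state[state].append(ti["task_id"])
    return dict(by_state)
```

Because tasks can run in parallel, the list under the running state may hold several task ids at once.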
