Each of our DAP profiles needs an id field that uniquely identifies it. Currently the pipeline name and version serve this purpose, but the name is more of a human-friendly string with spaces and capitals. This field will be mapped to dag_ids so that when a user selects a workflow to run, we can look up the correct input params for the user to provide.
We also need to add new endpoints to support Apache Airflow usage. The proposed list is below. The endpoints needed for building the front-end UI should come first: getting the list of available DAGs, then getting the profiles for the DAPs used in those DAGs.
Proposed new REST API endpoints
Get info about available workflows (for UI and for other things)
GET
workflows/{dag_id}
- List all workflows (no dag_id) or one workflow (if dag_id is specified and
valid).
- This can be a straight proxy to the Airflow REST API
/api/v2/dags/{dag_id} endpoint unless we really need to prune the data
returned.
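As a rough sketch of the proxy mapping for this endpoint, the helper below turns an optional dag_id into the Airflow path to forward to. The function name and the assumption that dropping dag_id yields the list-all path /api/v2/dags are illustrative, not settled design.

```python
from typing import Optional
from urllib.parse import quote

# Prefix taken from the Airflow endpoints named in this proposal.
AIRFLOW_API_PREFIX = "/api/v2"


def airflow_dag_path(dag_id: Optional[str] = None) -> str:
    """Map GET workflows/{dag_id} onto the Airflow path it would proxy to.

    Hypothetical helper: assumes the list-all case is the same path
    without the trailing dag_id segment.
    """
    if dag_id is None:
        # No dag_id given: list all DAGs.
        return f"{AIRFLOW_API_PREFIX}/dags"
    # Percent-encode the id so it cannot change the path structure.
    return f"{AIRFLOW_API_PREFIX}/dags/{quote(dag_id, safe='')}"
```

Encoding the id before building the path keeps a stray slash in a dag_id from reaching a different Airflow endpoint.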
workflows/{dag_id}/tasks/{task_id}
- List details about one or all tasks in a DAG.
- This will probably need some data from the Airflow API at
/api/v2/dags/{dag_id}/tasks, but it also needs the parameter schema for
each DAP used in the workflow. It would need to pull the mappings from
wherever we put our manifests, read the profile for each
DAP@version, and send those back as a list.
- If we need to pull in an inherited base schema, should that happen here?
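The merge described for the tasks endpoint could look roughly like the sketch below. Everything about the data shapes is an assumption: that each Airflow task entry carries a "task_id" key, that the manifest maps task ids to "DAP@version" strings, and that each profile stores its parameter schema under "params".

```python
from typing import Any, Dict, List


def tasks_with_schemas(
    airflow_tasks: List[Dict[str, Any]],
    task_to_dap: Dict[str, str],
    profiles: Dict[str, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Attach the DAP parameter schema to each task of a workflow.

    airflow_tasks: task entries as returned by Airflow (assumed shape).
    task_to_dap:   hypothetical manifest mapping task_id -> "DAP@version".
    profiles:      hypothetical profile store keyed by "DAP@version".
    """
    merged = []
    for task in airflow_tasks:
        task_id = task["task_id"]
        dap_ref = task_to_dap.get(task_id)  # None if the task uses no DAP
        profile = profiles.get(dap_ref, {}) if dap_ref else {}
        merged.append(
            {
                "task_id": task_id,
                "dap": dap_ref,
                "params": profile.get("params", {}),
            }
        )
    return merged
```

Tasks without a manifest entry come back with an empty schema rather than being dropped, so the UI still sees every task in the DAG.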
Start/stop workflows
POST
workflows/{dag_id}/runs
- Kick off a specific DAG.
- This will need to take some representation of the param schema for each
DAP in the workflow. Since we're initially going with workflows specifying
the DAP@versions, these will be the params for one of those entries.
- This will need to return a 400 on any bad or missing params.
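One way the bad/missing-param check might work is sketched below. The schema layout (a "type" name plus an optional "required" flag per param) is invented for illustration, since the profile format is not defined here. A non-empty error list is what would translate into the 400 response.

```python
from typing import Any, Dict, List

# Hypothetical mapping from schema type names to Python types.
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_params(schema: Dict[str, Dict[str, Any]], params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the params are acceptable."""
    errors = []
    for name, spec in schema.items():
        if name not in params:
            if spec.get("required", False):
                errors.append(f"missing required param: {name}")
            continue
        type_name = spec.get("type", "")
        expected = _TYPES.get(type_name)
        value = params[name]
        if expected is None:
            continue  # unknown type name in the schema: skip the type check
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if is_stray_bool or not isinstance(value, expected):
            errors.append(f"bad type for {name}: expected {type_name}")
    # Params that the schema does not know about are also reported.
    for name in sorted(set(params) - set(schema)):
        errors.append(f"unknown param: {name}")
    return errors
```

Returning every problem at once, rather than stopping at the first, lets the UI mark all bad fields in a single round trip.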
PATCH
workflows/{dag_id}/runs/{dag_run_id}
- This is how we can handle stopping a run that's in progress.
- There is no state for stopped (or paused), so the main thing folks do
is set it to failed or success. You can also change things at
that point and reschedule if it was stopped because a param was wrong or
something.
- We can proxy this to the Airflow REST API endpoint
/api/v2/dags/{dag_id}/dagRuns/{dag_run_id}
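A minimal sketch of the stop request follows, assuming the Airflow PATCH body is a JSON object with a single "state" key. The body shape should be checked against the Airflow API reference for the version we deploy, and the helper name is hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import quote

# The two states this proposal says a run can be forced into.
STOP_STATES = {"failed", "success"}


def stop_run_request(dag_id: str, dag_run_id: str, state: str = "failed") -> Tuple[str, Dict[str, str]]:
    """Build the Airflow path and body for marking a run as finished.

    Assumption: Airflow accepts {"state": "..."} as the PATCH body.
    """
    if state not in STOP_STATES:
        raise ValueError(f"state must be one of {sorted(STOP_STATES)}, got {state!r}")
    path = (
        f"/api/v2/dags/{quote(dag_id, safe='')}"
        f"/dagRuns/{quote(dag_run_id, safe='')}"
    )
    return path, {"state": state}
```

Rejecting other states up front keeps our endpoint limited to stopping a run, even if the underlying Airflow endpoint allows more.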
Check workflow status
GET
workflows/{dag_id}/runs/{dag_run_id}
- Return data about one or all running workflows.
- This could probably just be a proxy to the Airflow REST API endpoint at
/api/v2/dags/{dag_id}/dagRuns/{dag_run_id}
- dag_run_id is maintained on the front end after the workflow is kicked
off.
- These take a query param for a user name pattern, assuming we have
that all worked out.
workflows/{dag_id}/runs/{dag_run_id}/tasks/{task_id}
- Get info on one or more task instances in a DAG run.
- Use this to get all tasks in a workflow run and figure out the state of each.
Because tasks can run in parallel, there may be more than one in a running
state. Tasks that have completed will be reported as well.
- We can proxy this to the Airflow REST API endpoint
/api/v2/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}
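To illustrate how the front end might figure out states across a run, the sketch below groups task instances by state. It assumes each task instance entry has "task_id" and "state" keys; entries with no state yet are grouped under "none", which is a choice made here for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_tasks_by_state(task_instances: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group task ids by their current state.

    Several tasks can share the "running" state because tasks may
    execute in parallel.
    """
    grouped = defaultdict(list)
    for ti in task_instances:
        # Treat a missing or null state as "none" (assumed convention).
        state = ti.get("state") or "none"
        grouped[state].append(ti["task_id"])
    return dict(grouped)
```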