What happened
The CircleCI collectWorkflows subtask fails when DevLake calls
GET /v2/pipeline/{pipeline_id}/workflow for a pipeline whose workflow
endpoint returns HTTP 500 from CircleCI, even though GET /v2/pipeline/{id}
returns valid metadata (200 OK with a populated body).
The subtask retries three times, then aborts. Because collectWorkflows runs
before collectJobs, no workflows or jobs are collected for the entire
project on that run — CI/CD and DORA metrics go stale.
Environment
| Field | Value |
|---|---|
| DevLake version | v1.0.3-beta12 |
| Plugin | circleci |
| Database | MySQL 8.x |
| CircleCI deployment type | CircleCI Server (self-hosted), not CircleCI Cloud |
| CircleCI Server version | 4.9.4 |
| Trigger | Project blueprint data collection (full or incremental) |
Note on CircleCI Server vs Cloud: This bug is reproducible on a
self-hosted CircleCI Server instance. It has not been verified on
CircleCI Cloud (circleci.com), but the DevLake code path is identical
for both. CircleCI Server exposes the same /v2/ API surface; the broken
workflow endpoint behaviour described here may be specific to self-hosted
versions where individual pipeline records can become corrupt or stuck.
Error / logs
subtask collectWorkflows ended unexpectedly
caused by: Retry exceeded 3 times calling /v2/pipeline/<pipeline-id>/workflow.
The last error was: Http DoAsync error calling [method:GET path:/v2/pipeline/<pipeline-id>/workflow query:map[]].
Response: {"message":"Internal Server Error"} (500)
Reproduced against CircleCI Server API directly
Pipeline metadata succeeds:
curl -s -H "Circle-Token: $TOKEN" \
"https://<your-circleci-server>/api/v2/pipeline/<pipeline-id>"
# Returns: {"id":"<pipeline-id>","errors":[],"project_slug":"gh/<org>/<repo>","state":"created", ...}
Workflow list fails with 500:
curl -s -H "Circle-Token: $TOKEN" \
"https://<your-circleci-server>/api/v2/pipeline/<pipeline-id>/workflow"
# Returns: {"message":"An internal server error occurred."} (HTTP 500)
The same pipeline ID returns 200 on the /pipeline/{id} endpoint but 500 on
/pipeline/{id}/workflow. This is a CircleCI Server-side condition (corrupt
or stuck pipeline record) that DevLake cannot prevent, but must handle
gracefully.
Affected pipeline (example shape)
| Field | Value |
|---|---|
| Pipeline ID | <uuid> (valid, returned by /project/{slug}/pipeline pagination) |
| Project | gh/<org>/<repo> |
| State | created (stuck; workflows never materialized) |
| Trigger | Webhook / pull request event |
Pipelines in created state with no workflows are a known occurrence on
CircleCI Server when a webhook fires but the server fails to create workflow
records internally.
Root cause (DevLake side)
collectWorkflows iterates every row in _tool_circleci_pipelines for the
project on full sync (no SyncPolicy.TimeAfter filter on the DB query). For each
row it calls the workflow API. The plugin only skips 404 responses via
ignoreDeletedBuilds in shared.go — 500 is not skipped, so one bad
pipeline record kills the subtask.
This is not the same bug as #8907
(empty workflow ID → /v2/workflow//job 500 in collectJobs).
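To illustrate the gap described above, here is a minimal Go sketch of a status check that treats both 404 and 500 as skippable for a single pipeline. The function name skippableWorkflowStatus is hypothetical and is not taken from the DevLake code base; how such a check would be wired into the existing ignoreDeletedBuilds hook is left open. Leaving 503 as non-skippable is an assumption made here so that transient outages still go through the retry path.

```go
package main

import (
	"fmt"
	"net/http"
)

// skippableWorkflowStatus reports whether a response from
// GET /v2/pipeline/{id}/workflow should be logged and skipped for that one
// pipeline instead of failing the whole subtask.
// Hypothetical helper for illustration; not part of DevLake.
func skippableWorkflowStatus(status int) bool {
	switch status {
	case http.StatusNotFound, http.StatusInternalServerError:
		// 404: pipeline removed or past retention.
		// 500: corrupt or stuck pipeline record on the server.
		return true
	default:
		return false
	}
}

func main() {
	// Print the decision for a few representative status codes.
	for _, status := range []int{200, 404, 500, 503} {
		fmt.Printf("status %d skip=%v\n", status, skippableWorkflowStatus(status))
	}
}
```

Whether a 500 should be skipped immediately or only after the existing retries are exhausted is a design decision for the maintainers; the sketch only shows the classification.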
Root cause (CircleCI Server side)
For at least one pipeline, CircleCI Server returns 500 on the workflow endpoint
while pipeline metadata is available. The likely cause is a corrupt or stuck
pipeline record on the server: the pipeline has stayed in the created state
since it was created, and its workflows never materialized.
This has been observed on a self-hosted CircleCI Server instance. It is unclear
whether CircleCI Cloud can produce this condition.
What do you expect to happen
- When CircleCI returns 404 or 500 for a single pipeline's workflow
endpoint, DevLake should log and skip that pipeline and continue collecting
workflows for the rest of the project.
- collectWorkflows should respect the blueprint Data Time Range
(SyncPolicy.TimeAfter) when choosing which _tool_circleci_pipelines rows
to iterate, so full sync does not call the workflow API for every historical
pipeline row ever stored in the tool table.
- A single bad pipeline on CircleCI Server should not block CI/CD collection
for an entire project.
How to reproduce
- Configure a CircleCI connection pointing at a CircleCI Server instance.
- Ensure _tool_circleci_pipelines contains at least one pipeline ID where
  GET /v2/pipeline/{id} returns 200 but
  GET /v2/pipeline/{id}/workflow returns 500.
  - These are typically pipelines in created state with no associated workflows,
    caused by a failed or corrupt webhook trigger on the server.
- Run CircleCI data collection for that project (full sync is the most reliable
  trigger because collectWorkflows iterates all DB pipeline rows without a
  time filter).
- Observe collectWorkflows fail with retry-exceeded 500; collectJobs and
  downstream converters do not run for the entire project.
To find candidate pipelines on your CircleCI Server instance:
# List project pipelines and look for state=created with no items in /workflow
curl -s -H "Circle-Token: $TOKEN" \
"https://<your-circleci-server>/api/v2/project/gh/<org>/<repo>/pipeline" \
| jq '.items[] | select(.state=="created") | .id'
# Then test each candidate:
curl -s -H "Circle-Token: $TOKEN" \
"https://<your-circleci-server>/api/v2/pipeline/<candidate-id>/workflow"
Anything else
Operator workaround (per pipeline)
Delete the bad pipeline row from the tool table, then re-sync:
DELETE FROM _tool_circleci_pipelines
WHERE id = '<pipeline-id-returning-500>';
This is not durable — new bad records or full-sync iteration over remaining
historical rows can trigger the same failure again.
Related issues (not duplicates)
| Issue | Relationship |
|---|---|
| #7797 | collectWorkflows 404 after retention; time-range fix on collectPipelines only (closed) |
| #8907 | /v2/workflow//job 500 in collectJobs (empty workflow ID); closed, #8912 |
| #8309 | Malformed workflow JSON in convert phase (closed) |
No open issue covers 500 on /v2/pipeline/{valid-id}/workflow in
collectWorkflows.
Frequency
Occurs whenever collection reaches a pipeline with a broken workflow endpoint.
On full sync over a project with an extended pipeline history in
_tool_circleci_pipelines, the probability of hitting such a record increases
significantly. Projects that have been active for over a year are most at risk.
Version
v1.0.3-beta12