[Bug][CircleCI] `collectWorkflows` aborts entire sync when pipeline workflow API returns HTTP 500 #8948

### Search before asking

- [x] I have searched the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.


### What happened


The CircleCI `collectWorkflows` subtask fails when DevLake calls
`GET /v2/pipeline/{pipeline_id}/workflow` for a pipeline whose workflow
endpoint returns HTTP 500 from CircleCI, even though `GET /v2/pipeline/{id}`
returns valid metadata (200 OK with a populated body).

The subtask retries three times, then aborts. Because `collectWorkflows` runs
before `collectJobs`, **no workflows or jobs are collected for the entire
project** on that run — CI/CD and DORA metrics go stale.

#### Environment

| | |
| --- | --- |
| DevLake version | `v1.0.3-beta12` |
| Plugin | `circleci` |
| Database | MySQL 8.x |
| CircleCI deployment type | **CircleCI Server** (self-hosted) — **not** CircleCI Cloud |
| CircleCI Server version | 4.9.4 |
| Trigger | Project blueprint data collection (full or incremental) |

> **Note on CircleCI Server vs Cloud:** This bug is reproducible on a
> self-hosted **CircleCI Server** instance. It has not been verified on
> CircleCI Cloud (`circleci.com`), but the DevLake code path is identical
> for both. CircleCI Server exposes the same `/v2/` API surface; the broken
> workflow endpoint behaviour described here may be specific to self-hosted
> versions where individual pipeline records can become corrupt or stuck.

#### Error / logs

```
subtask collectWorkflows ended unexpectedly
caused by: Retry exceeded 3 times calling /v2/pipeline/<pipeline-id>/workflow.
The last error was: Http DoAsync error calling [method:GET path:/v2/pipeline/<pipeline-id>/workflow query:map[]].
Response: {"message":"Internal Server Error"} (500)
```


#### Reproduced against CircleCI Server API directly

Pipeline metadata succeeds:

```bash
curl -s -H "Circle-Token: $TOKEN" \
  "https://<your-circleci-server>/api/v2/pipeline/<pipeline-id>"
# Returns: {"id":"<pipeline-id>","errors":[],"project_slug":"gh/<org>/<repo>","state":"created", ...}
```

Workflow list fails with 500:

```bash
curl -s -w '\nHTTP %{http_code}\n' -H "Circle-Token: $TOKEN" \
  "https://<your-circleci-server>/api/v2/pipeline/<pipeline-id>/workflow"
# Returns: {"message":"An internal server error occurred."} (HTTP 500)
```

The same pipeline ID returns 200 on the `/pipeline/{id}` endpoint but 500 on
`/pipeline/{id}/workflow`. This is a CircleCI Server-side condition (corrupt
or stuck pipeline record) that DevLake cannot prevent, but must handle
gracefully.

#### Affected pipeline (example shape)

| Field | Value |
| --- | --- |
| Pipeline ID | `<uuid>` (valid, returned by `/project/{slug}/pipeline` pagination) |
| Project | `gh/<org>/<repo>` |
| State | `created` (stuck — workflows never materialized) |
| Trigger | Webhook / pull request event |

Pipelines stuck in `created` state with no workflows have been observed on
CircleCI Server when a webhook fires but the server fails to create the
workflow records internally.

#### Root cause (DevLake side)

`collectWorkflows` iterates **every row** in `_tool_circleci_pipelines` for the
project on full sync (no `SyncPolicy.TimeAfter` filter on the DB query). For each
row it calls the workflow API. The plugin only skips **404** responses via
`ignoreDeletedBuilds` in `shared.go` — **500 is not skipped**, so one bad
pipeline record kills the subtask.
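
The gap between the current behaviour and a more tolerant one can be sketched in a few lines of Go. This is not the plugin's code: `skipOnlyNotFound` and `skipNotFoundAndServerErrors` are hypothetical names used only for illustration, and the real `ignoreDeletedBuilds` hook in `shared.go` may have a different shape.

```go
package main

import (
	"fmt"
	"net/http"
)

// skipOnlyNotFound models the behaviour described above: only a 404 means
// "skip this pipeline". Hypothetical helper, not DevLake code.
func skipOnlyNotFound(status int) bool {
	return status == http.StatusNotFound
}

// skipNotFoundAndServerErrors is one possible relaxation: a 404 or any 5xx
// response skips the pipeline instead of failing the whole subtask.
func skipNotFoundAndServerErrors(status int) bool {
	return status == http.StatusNotFound || (status >= 500 && status <= 599)
}

func main() {
	for _, status := range []int{200, 404, 500} {
		fmt.Printf("status=%d skipOnly404=%t skip404And5xx=%t\n",
			status, skipOnlyNotFound(status), skipNotFoundAndServerErrors(status))
	}
}
```

Skipping every 5xx could hide a genuine CircleCI outage, so a real fix would probably apply the relaxed rule only after the existing retries are exhausted.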

This is **not** the same bug as [#8907](https://github.com/apache/devlake/issues/8907)
(empty workflow ID → `/v2/workflow//job` 500 in `collectJobs`).

#### Root cause (CircleCI Server side)

For at least one pipeline, CircleCI Server returns 500 on the workflow endpoint
while pipeline metadata is available — likely a corrupt or stuck pipeline record
on the server (state `created` since creation, workflows never materialized).
This has been observed on a self-hosted CircleCI Server instance. It is unclear
whether CircleCI Cloud can produce this condition.


### What do you expect to happen


1. When CircleCI returns **404** or **500** for a single pipeline's workflow
   endpoint, DevLake should **log and skip** that pipeline and continue collecting
   workflows for the rest of the project.
2. `collectWorkflows` should respect the blueprint **Data Time Range**
   (`SyncPolicy.TimeAfter`) when choosing which `_tool_circleci_pipelines` rows
   to iterate, so full sync does not call the workflow API for every historical
   pipeline row ever stored in the tool table.
3. A single bad pipeline on CircleCI Server should not block CI/CD collection
   for an entire project.
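
Taken together, the expectations above could look roughly like the loop below. It is a minimal sketch under stated assumptions, not a patch: `pipelineRow`, its `CreatedAt` field, `errSkippable` and the injected `fetch` function are all hypothetical, and in DevLake the time filter would more likely belong in the DB query than in an in-memory check.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// pipelineRow is a hypothetical, trimmed-down stand-in for a row in
// _tool_circleci_pipelines; the real model has more fields.
type pipelineRow struct {
	ID        string
	CreatedAt time.Time
}

// errSkippable marks a per-pipeline failure (for example 404 or 500 on the
// workflow endpoint) that should not abort the whole subtask.
var errSkippable = errors.New("skippable workflow API error")

// collectWorkflows ignores rows older than timeAfter (when set) and logs and
// skips pipelines whose fetch fails with errSkippable. Any other error
// still aborts the run.
func collectWorkflows(rows []pipelineRow, timeAfter *time.Time,
	fetch func(pipelineID string) ([]string, error)) (workflows []string, skipped []string, err error) {
	for _, row := range rows {
		if timeAfter != nil && row.CreatedAt.Before(*timeAfter) {
			continue // outside the blueprint's Data Time Range
		}
		ids, fetchErr := fetch(row.ID)
		if errors.Is(fetchErr, errSkippable) {
			log.Printf("skipping pipeline %s: %v", row.ID, fetchErr)
			skipped = append(skipped, row.ID)
			continue
		}
		if fetchErr != nil {
			return nil, nil, fetchErr
		}
		workflows = append(workflows, ids...)
	}
	return workflows, skipped, nil
}

// demo runs the loop over three made-up pipelines: one older than the
// cutoff, one whose workflow endpoint is broken, and one healthy.
func demo() ([]string, []string, error) {
	cutoff := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	rows := []pipelineRow{
		{ID: "old", CreatedAt: cutoff.AddDate(-1, 0, 0)},
		{ID: "broken", CreatedAt: cutoff.AddDate(0, 1, 0)},
		{ID: "healthy", CreatedAt: cutoff.AddDate(0, 2, 0)},
	}
	fetch := func(id string) ([]string, error) {
		if id == "broken" {
			return nil, fmt.Errorf("GET /v2/pipeline/%s/workflow returned 500: %w", id, errSkippable)
		}
		return []string{id + "-workflow"}, nil
	}
	return collectWorkflows(rows, &cutoff, fetch)
}

func main() {
	workflows, skipped, err := demo()
	fmt.Println(workflows, skipped, err)
}
```

In this sketch the old pipeline is never fetched, the broken one is logged and recorded as skipped, and the healthy one still yields its workflow.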

### How to reproduce


1. Configure a CircleCI connection pointing at a **CircleCI Server** instance.
2. Ensure `_tool_circleci_pipelines` contains at least one pipeline ID where
   `GET /v2/pipeline/{id}` returns 200 but
   `GET /v2/pipeline/{id}/workflow` returns 500.
   - These are typically pipelines in `created` state with no associated workflows,
     caused by a failed or corrupt webhook trigger on the server.
3. Run CircleCI data collection for that project (full sync is the most reliable
   trigger because `collectWorkflows` iterates all DB pipeline rows without a
   time filter).
4. Observe `collectWorkflows` fail with retry-exceeded 500; `collectJobs` and
   downstream converters do not run for the entire project.

**To find candidate pipelines on your CircleCI Server instance:**

```bash
# List project pipelines (first page only) and print the IDs still in state=created
curl -s -H "Circle-Token: $TOKEN" \
  "https://<your-circleci-server>/api/v2/project/gh/<org>/<repo>/pipeline" \
  | jq -r '.items[] | select(.state=="created") | .id'

# Then test each candidate; a broken pipeline prints HTTP 500
curl -s -w '\nHTTP %{http_code}\n' -H "Circle-Token: $TOKEN" \
  "https://<your-circleci-server>/api/v2/pipeline/<candidate-id>/workflow"
```
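
For more than a handful of candidates, the per-pipeline check can be automated. The Go sketch below probes each candidate's workflow endpoint and reports the ones answering 5xx. The helper names `workflowStatus` and `brokenPipelines` are hypothetical, and `demo` runs against a local fake server (with made-up IDs and a dummy token) so the example works without a CircleCI instance.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// workflowStatus returns the HTTP status of
// GET {baseURL}/api/v2/pipeline/{pipelineID}/workflow.
func workflowStatus(client *http.Client, baseURL, token, pipelineID string) (int, error) {
	req, err := http.NewRequest(http.MethodGet,
		baseURL+"/api/v2/pipeline/"+pipelineID+"/workflow", nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Circle-Token", token)
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// brokenPipelines returns the candidate IDs whose workflow endpoint answers 5xx.
func brokenPipelines(client *http.Client, baseURL, token string, candidates []string) ([]string, error) {
	var broken []string
	for _, id := range candidates {
		status, err := workflowStatus(client, baseURL, token, id)
		if err != nil {
			return nil, err
		}
		if status >= 500 {
			broken = append(broken, id)
		}
	}
	return broken, nil
}

// demo exercises brokenPipelines against a fake server in which "bad-id"
// is made to return 500 and every other pipeline returns an empty list.
func demo() ([]string, error) {
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.URL.Path, "/pipeline/bad-id/") {
			http.Error(w, `{"message":"Internal Server Error"}`, http.StatusInternalServerError)
			return
		}
		fmt.Fprint(w, `{"items":[]}`)
	}))
	defer fake.Close()
	return brokenPipelines(fake.Client(), fake.URL, "dummy-token", []string{"good-id", "bad-id"})
}

func main() {
	broken, err := demo()
	fmt.Println(broken, err)
}
```

Against a real server, `baseURL` would be `https://<your-circleci-server>` and `candidates` the IDs printed by the `jq` filter above.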


### Anything else


#### Operator workaround (per pipeline)

Delete the bad pipeline row from the tool table, then re-sync:

```sql
DELETE FROM _tool_circleci_pipelines
WHERE id = '<pipeline-id-returning-500>';
```

This is not durable — new bad records or full-sync iteration over remaining
historical rows can trigger the same failure again.

#### Related issues (not duplicates)

| Issue | Relationship |
| --- | --- |
| [#7797](https://github.com/apache/devlake/issues/7797) | `collectWorkflows` 404 after retention; time-range fix on `collectPipelines` only — closed |
| [#8907](https://github.com/apache/devlake/issues/8907) | `/v2/workflow//job` 500 in **`collectJobs`** (empty workflow ID) — closed, [#8912](https://github.com/apache/devlake/pull/8912) |
| [#8309](https://github.com/apache/devlake/issues/8309) | Malformed workflow JSON in **convert** phase — closed |

No open issue covers **500 on `/v2/pipeline/{valid-id}/workflow`** in
`collectWorkflows`.

#### Frequency

Occurs whenever collection reaches a pipeline with a broken workflow endpoint.
A full sync calls the workflow API once per row in `_tool_circleci_pipelines`,
so the chance of hitting at least one such record grows with the length of the
stored pipeline history. Projects that have been active for over a year are
the most exposed.
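
As a rough illustration of why history length matters, assume (purely hypothetically) that each stored pipeline row is broken independently with the same small probability `p`. The chance that a full sync over `n` rows meets at least one broken row is then `1 - (1 - p)^n`. The rate `0.001` used below is made up for the example and is not a measurement.

```go
package main

import (
	"fmt"
	"math"
)

// hitProbability returns 1-(1-p)^n: the chance that at least one of n
// pipeline rows is broken if each is broken independently with probability p.
func hitProbability(p float64, n int) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

func main() {
	// 0.001 is an illustrative per-row failure rate, not observed data.
	for _, n := range []int{100, 1000, 10000} {
		fmt.Printf("rows=%d chance=%.3f\n", n, hitProbability(0.001, n))
	}
}
```

Under this simplified model even a very rare defect becomes close to certain once the table holds enough rows, which is why a time-range filter on the iterated rows matters as much as skipping the failing request.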

### Version

v1.0.3-beta12

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

