source-stripe-native: speed up fetching connected account ids with concurrent worker system #4229

Open
Alex-Bair wants to merge 1 commit into main from bair/source-stripe-native-speed-up-fetching-connected-accounts

@Alex-Bair
Member

Description:

The previous `_fetch_connected_account_ids` implementation was a single sequential paginator through GET `/v1/accounts`. For platforms with 36k+ connected accounts, this was taking over an hour. That's too slow since all connected account ids are fetched each time the capture starts up; we'd rather not spend an hour fetching connected account ids before capturing any data.

This commit replaces that sequential paginator with a concurrent worker system modeled after `source-klaviyo-native`'s events backfill. The concurrent worker system partitions the time range into chunks using Stripe's `created[gte]`/`created[lte]` query parameters and has multiple workers paginate through their respective chunks in parallel.
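For readers who haven't seen the Klaviyo connector, here is a minimal sketch of the partition-and-paginate idea. It is not the code in this PR: `Chunk`, `partition`, `chunk_params`, `fetch_all_ids`, and the injected `fetch_page` callable are hypothetical names, and the chunk count of `workers * 4` is an arbitrary assumption. The only Stripe details relied on are the `created[gte]`/`created[lte]` filters and the standard `limit`/`starting_after`/`has_more` list pagination.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class Chunk:
    start: int  # inclusive Unix seconds, sent as created[gte]
    end: int    # inclusive Unix seconds, sent as created[lte]


def partition(start: int, end: int, n: int) -> list[Chunk]:
    """Split the inclusive range [start, end] into at most n contiguous chunks."""
    total = end - start + 1
    if total <= 0:
        return []
    n = max(1, min(n, total))
    size, extra = divmod(total, n)
    chunks, cursor = [], start
    for i in range(n):
        width = size + (1 if i < extra else 0)
        chunks.append(Chunk(cursor, cursor + width - 1))
        cursor += width
    return chunks


def chunk_params(chunk: Chunk, limit: int = 100,
                 starting_after: Optional[str] = None) -> dict:
    """Query parameters for one page of GET /v1/accounts within a chunk."""
    params = {"created[gte]": chunk.start, "created[lte]": chunk.end, "limit": limit}
    if starting_after is not None:
        params["starting_after"] = starting_after
    return params


async def fetch_all_ids(
    fetch_page: Callable[[dict], Awaitable[dict]],  # hypothetical HTTP wrapper
    start: int,
    end: int,
    workers: int = 4,
    page_size: int = 100,
) -> set[str]:
    """Drain a queue of chunks with several workers paginating concurrently."""
    queue: asyncio.Queue[Chunk] = asyncio.Queue()
    for chunk in partition(start, end, workers * 4):  # chunk count is an assumption
        queue.put_nowait(chunk)
    ids: set[str] = set()

    async def worker() -> None:
        while True:
            try:
                chunk = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # no chunks left for this worker
            cursor = None
            while True:
                page = await fetch_page(chunk_params(chunk, page_size, cursor))
                ids.update(account["id"] for account in page["data"])
                if not page["has_more"] or not page["data"]:
                    break
                cursor = page["data"][-1]["id"]

    await asyncio.gather(*(worker() for _ in range(workers)))
    return ids
```

Because the chunks are contiguous and non-overlapping, each account falls in exactly one chunk; collecting ids into a set is only a guard against accidental duplicates.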

Workers detect dense time windows (chunks that take >30s to paginate) and, when idle workers are available, submit the remaining unprocessed range to a subdivision worker that splits it into smaller chunks for other workers to pick up.
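The hand-off decision described above could look roughly like the sketch below. Everything here is illustrative rather than taken from the PR: `should_hand_off`, `remaining_range`, and `subdivide` are hypothetical helpers, and the sketch assumes pages arrive newest first, so the unprocessed part of a chunk is everything at or before the oldest `created` timestamp seen so far.

```python
from dataclasses import dataclass
from typing import Optional

DENSE_THRESHOLD_SECONDS = 30.0  # the ">30s" threshold from the description


@dataclass(frozen=True)
class Chunk:
    start: int  # inclusive Unix seconds (created[gte])
    end: int    # inclusive Unix seconds (created[lte])


def should_hand_off(elapsed_seconds: float, idle_workers: int,
                    threshold: float = DENSE_THRESHOLD_SECONDS) -> bool:
    """A chunk is handed off only if it is slow AND someone is free to help."""
    return elapsed_seconds > threshold and idle_workers > 0


def remaining_range(chunk: Chunk, oldest_created_seen: int) -> Optional[Chunk]:
    """Unprocessed part of a chunk, assuming newest-first pagination.

    The boundary second stays inclusive because other accounts may share it;
    the caller is assumed to de-duplicate ids. Returns None when the range
    cannot shrink further, in which case the worker keeps paginating itself.
    """
    if oldest_created_seen <= chunk.start:
        return None
    return Chunk(chunk.start, min(oldest_created_seen, chunk.end))


def subdivide(chunk: Chunk, parts: int) -> list[Chunk]:
    """Split a chunk into at most `parts` contiguous, non-overlapping pieces."""
    total = chunk.end - chunk.start + 1
    parts = max(1, min(parts, total))
    size, extra = divmod(total, parts)
    pieces, cursor = [], chunk.start
    for i in range(parts):
        width = size + (1 if i < extra else 0)
        pieces.append(Chunk(cursor, cursor + width - 1))
        cursor += width
    return pieces
```

In this sketch a worker that has spent more than 30 seconds on a chunk would call `remaining_range`, pass the result to `subdivide`, and enqueue the pieces for idle workers, then stop paginating that chunk itself.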

Notes for reviewers:

This is an isolated change and a drop-in replacement for how the connector fetches connected account ids. The very similar concurrent worker system in source-klaviyo-native has been working well for several months. I anticipate it'll work well in source-stripe-native too and speed up fetching all connected account ids by at least 5x, likely more.

@Alex-Bair
Member Author

Note: most of the CI checks failed due to an intermittent GitHub issue, but the one for source-stripe-native did succeed once GitHub stopped returning 504s.

@Alex-Bair Alex-Bair marked this pull request as ready for review April 13, 2026 16:22
@Alex-Bair Alex-Bair requested a review from a team April 13, 2026 16:22