Skip to content

Optimize media asset bucket listing#415

Draft
digital-pro wants to merge 1 commit intomainfrom
optimize-bucket-calls
Draft

Optimize media asset bucket listing#415
digital-pro wants to merge 1 commit intomainfrom
optimize-bucket-calls

Conversation

@digital-pro
Copy link
Collaborator

Summary

  • Align bucket prefixes to match audio/ and visual/ layout
  • Limit GCS list calls to the intended prefixes and tighten path validation
  • Fix recursive pagination argument order to keep filtering consistent

Issue

Listing could include unrelated paths when bucket prefixes or validation didn’t align with the current bucket structure, causing extra listing work and risk of pulling unrelated assets.

Fix

Bucket name generation and prefix selection now mirror the actual layout, and validation explicitly enforces audio//... and visual//....

Savings

Estimated list-traffic reduction when 6 tasks run:

  • Prior worst-case list payload ~0.8 MB per call
  • 4 list calls per task → ~24 calls → ~19 MB total list traffic
  • With prefix scoping, list traffic should drop to a few MB total (depends on asset counts under each prefix)

Test plan

  • Not run (logic-only change)
  • Optional: run a task in dev and confirm assets resolve for audio and visual prefixes

Align bucket prefixes with audio/visual layout to reduce listing scope.

if (data.nextPageToken) {
return getMediaAssets(bucketName, whitelist, taskName, language, data.nextPageToken, categorizedObjects);
return getMediaAssets(bucketName, whitelist, language, taskName, data.nextPageToken, categorizedObjects);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the arguments are flipped here. @digital-pro did you run the tasks locally to check if this PR works?

return `${bucket}/${taskName}`;
}

return `${bucket}/${taskName}`;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this:

  if (assetType === 'visual') {
    return `${bucket}/${taskName}`;
  }

is unnecessary since the default return is the same as well.


return `${bucket}/${assetType === 'audio' ? language : taskName}`;
if (assetType === 'audio') {
return language ? `${bucket}/${language}` : bucket;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So previously if the language was not present it was adding an extra / and that was causing issues?

@digital-pro digital-pro marked this pull request as draft February 2, 2026 17:11
@digital-pro
Copy link
Collaborator Author

Sorry, I posted to the -engineering channel when I found the Sentry report over the weekend that this was mostly a placeholder and that someone who knows the code better should figure out if this is the right approach to fixing it. I didn't feel competent to know, but wanted to flag that the issue wasn't the initially blamed large asset, but large object name retrievals.

@asengupta3
Copy link
Collaborator

@zwatson2001 do you think we can look into this?

@digital-pro
Copy link
Collaborator Author

digital-pro commented Feb 2, 2026 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants