feat(aws_tools): add batch S3 uploader/downloader (s3_files_uploader, s3_files_download) #3276
Add batch counterparts to the single-file s3_file_uploader / s3_file_download
tools introduced in #3273, so a single workflow node can process N files in
one invocation instead of forcing users to wrap the single-file tools in an
Iteration node.
New tools (tools/aws/tools/):
- s3_files_uploader (input_files: files; key_prefix; per-entry presigned URL)
- s3_files_download (s3_uris: array; emits one Dify file blob per success)
Behavior:
- Per-entry failure isolation: a single bad file/URI does not abort the
batch; status='ok'|'failed' (+ error) is captured per entry in
results[]. The whole invocation only emits a top-level error message
when *every* entry fails.
- Uploader auto-disambiguates duplicate filenames in the same batch
(image.png, image-1.png, image-2.png) so concurrent upstream branches
with identical filenames do not silently overwrite each other.
- Downloader yields blobs in input order; failed entries simply produce
no blob and downstream nodes can correlate via results[].
- Both tools reuse the same inline credential helpers as the single-file
versions (no shared utils/ module introduced).
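The duplicate-filename behavior described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `disambiguate` is hypothetical, and the only thing taken from the description is the naming pattern (image.png, image-1.png, image-2.png).

```python
import os


def disambiguate(filenames):
    """Return the filenames in input order, adding -1, -2, ... to repeats.

    Hypothetical helper: the suffix goes before the extension, matching the
    image.png / image-1.png / image-2.png pattern described above.
    """
    used = set()
    result = []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        candidate, n = name, 0
        # Keep bumping the counter until the candidate is unused in this batch.
        while candidate in used:
            n += 1
            candidate = f"{stem}-{n}{ext}"
        used.add(candidate)
        result.append(candidate)
    return result
```

Because the check runs against every name already emitted in the batch, a file that is literally named image-1.png is also protected from being overwritten by a renamed duplicate.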
Registry / metadata:
- tools/aws/manifest.yaml: 0.0.27 -> 0.0.28
- tools/aws/provider/aws_tools.yaml: register both new tool yaml files
- tools/aws/README.md: add Features lines for the batch variants
Validation:
* python -m py_compile + yaml.safe_load on the new files: clean
* black --check -l 100 + ruff check: clean
* 10 mock-boto3 unit tests covering basic batch, dedup, partial failure
(ClientError), all-fail top-level error, empty input, presign error
isolation, partial download (NoSuchKey), invalid URI: 10/10 pass
* End-to-end on Dify 1.14.2 Community Edition + real S3 (cn-northwest-1):
- Workflow [Start file-list -> s3_files_uploader -> extract URIs ->
s3_files_download -> summarize -> End], 3 files (txt 40B + png 220B
+ pdf 540B), elapsed ~1.2s, all 6 steps succeeded
- SHA-256 round-trip byte-identical for all 3 files (pulled back via
aws s3 cp and compared with the source)
- Each presigned URL returned by the uploader fetched via curl and
verified byte-identical to the source
- Partial-failure run (2 valid + 1 bogus s3:// URI): downloader
returned count=3, ok=2, failed=1 with a NoSuchKey error string for
the bogus URI and exactly 2 file blobs yielded in input order
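The SHA-256 round-trip check above can be reproduced with a few lines of standard-library Python. This is a sketch of the comparison only: the helper names `sha256_of` and `round_trip_ok` are hypothetical, and the file paths are whatever the source file and the copy pulled back with aws s3 cp happen to be.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large objects do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_ok(source_path, downloaded_path):
    """True when the downloaded copy is byte-identical to the source."""
    return sha256_of(source_path) == sha256_of(downloaded_path)
```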
Out of scope (kept for a follow-up):
- Multipart upload / streaming for large files
- Any change to existing s3_operator / s3_file_uploader / s3_file_download
- Shared utils/ module
Code Review
This pull request introduces two new tools to the AWS plugin: AWS S3 Batch File Uploader and AWS S3 Batch File Download, enabling multi-file S3 operations with per-file failure isolation in a single invocation. The review feedback suggests optimizing memory usage in the batch download tool by yielding file blobs immediately rather than buffering them in memory, which helps prevent Out-Of-Memory (OOM) errors. Additionally, it recommends catching general exceptions during presigned URL generation in the batch uploader to ensure that secondary presigning failures do not incorrectly fail the entire upload process.
- s3_files_download: yield each blob inline during the download loop
instead of buffering all file bytes in memory and yielding at the
end. The buffer-then-flush pattern made peak RSS scale with N x
file_size, which can trip the Dify plugin container's 256 MB memory
limit on a batch of large files. Inline yield keeps peak RSS bounded
by a single file's size.
- s3_files_uploader: catch a broader Exception around
generate_presigned_url instead of just ClientError. The presign call
is a client-side helper that can raise ParamValidationError, other
BotoCoreError subclasses, or unrelated runtime errors; letting any
of those propagate would fail the whole batch even though the
upload itself already succeeded. We still record the message in
entry['presign_error'] so the result remains observable.
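Both fixes can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not the code in this PR: `fetch` and `presign` are hypothetical callables standing in for the boto3 download and generate_presigned_url calls, and the ("blob", ...) / ("json", ...) tuples stand in for the Dify messages the tools actually yield.

```python
def download_inline(uris, fetch):
    """Yield each blob as soon as it is fetched, then a final results list.

    Only one file's bytes are referenced at a time, so peak memory is bounded
    by the largest single file rather than by the whole batch.
    """
    results = []
    for index, uri in enumerate(uris):
        entry = {"index": index, "s3_uri": uri}
        try:
            data = fetch(uri)
        except Exception as exc:  # broad catch is an assumption of this sketch
            entry.update(status="failed", error=str(exc))
            results.append(entry)
            continue  # failed entries produce no blob
        entry.update(status="ok", content_length=len(data))
        results.append(entry)
        yield "blob", data
    yield "json", results


def presign_safely(entry, presign):
    """Attach a presigned URL; a presign failure never fails the upload entry."""
    try:
        entry["presigned_url"] = presign(entry["bucket_name"], entry["object_key"])
    except Exception as exc:  # deliberately broader than ClientError
        entry["presign_error"] = str(exc)
    return entry
```

In the sketch, blobs still come out in input order because each one is yielded inside the loop, and the metadata is emitted once at the end so downstream nodes can correlate by index.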
Validation:
- python -m py_compile + black --check -l 100 + ruff check: clean
- Mock unit tests (10/10): still pass
- End-to-end re-run on Dify 1.14.2 + real S3 (cn-northwest-1):
- 3-file batch: succeeded, sizes [40, 220, 540] match, order preserved
- Partial-failure download (2 valid + 1 bogus s3:// URI): count=3,
ok=2, failed=1, files yielded in input order ['a.txt', 'img.png']
(failed entry produces no blob, downstream nodes correlate via
json results)
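The count / ok / failed figures reported above follow directly from the per-entry results. A minimal sketch of that aggregation, including the rule that a top-level error is emitted only when every entry fails, might look like this. The function name `summarize` and the error wording are hypothetical, and treating an empty batch as "no top-level error" is an assumption of this sketch rather than documented behavior.

```python
def summarize(results):
    """Build the {count, ok, failed, results} payload from per-entry results.

    Returns (summary, top_level_error); the error is None unless every entry
    in a non-empty batch failed.
    """
    ok = sum(1 for r in results if r["status"] == "ok")
    failed = len(results) - ok
    summary = {"count": len(results), "ok": ok, "failed": failed, "results": results}
    top_level_error = None
    if results and ok == 0:
        top_level_error = f"All {failed} entries failed"  # hypothetical wording
    return summary, top_level_error
```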
Thanks @gemini-code-assist for the review! Both points adopted in
Re-validation after the fix:
What this PR does
Follow-up to #3273 (which added the single-file s3_file_uploader / s3_file_download). This PR adds batch counterparts so a single workflow node can process N files in one invocation, instead of forcing users to wrap the single-file tools in an Iteration node.
- s3_files_uploader: takes input_files (type: files) from an upstream node and uploads each to a configurable S3 bucket, with optional per-object presigned GET URLs.
- s3_files_download: takes s3_uris (type: array of s3://bucket/key) and emits one Dify file blob per success in input order, plus structured metadata for downstream nodes.
The single-file tools are unchanged.
Why
For many real workflows (Frame Extractor, Nova Canvas with numberOfImages > 1, multi-file user uploads from a Start node), the natural input is N files at once. Wrapping the single-file tools in an Iteration node works but:
A typical batch workflow now looks like:
Tool surface
s3_files_uploader
- input_files (files): … array[file] / file-list variable.
- bucket_name (string): … s3://.
- key_prefix (string): … {prefix}/{filename}.
- aws_region / aws_access_key_id / aws_secret_access_key / aws_session_token (string)
- generate_presign_url (boolean): … false; produces a presigned GET URL per object.
- presign_expiry (number): … 3600 seconds.
Outputs:
json = {count, ok, failed, results: [{index, bucket_name, object_key, s3_uri, presigned_url?, presign_expiry?, status: "ok"|"failed", error?}, ...]} plus a per-line text summary (one line per entry: presigned URL when available, else s3_uri, else FAILED [i]: <error>).
s3_files_download
- s3_uris (array, LLM-fillable): … s3://bucket/key.
- aws_region / aws_access_key_id / aws_secret_access_key / aws_session_token (string)
Outputs:
json = {count, ok, failed, results: [{index, s3_uri, bucket, key, content_type, content_length, etag, last_modified, filename, status, error?}, ...]}, a per-line text summary (bucket / key / content_length for each success, FAILED [i] <s3_uri>: <error> for each failure), and one Dify file blob per successful URI in input order (failed entries simply produce no blob; downstream nodes correlate via results).
Design choices
- No object_key override on the batch uploader. A single override cannot apply to N files. The final key per file is always derived from the file's own filename (or a UUID fallback) and optionally prepended with key_prefix. The batch uploader auto-disambiguates duplicate filenames in the same batch (image.png / image-1.png / image-2.png) so concurrent upstream branches with identical filenames don't silently overwrite each other.
- Per-entry failure isolation: status=ok|failed (+ error) is captured per entry in results[]. The whole invocation only emits a top-level error message when every entry fails, so downstream nodes still see a clear failure signal in the all-failed case.
- Both tools reuse the same inline _resolve_aws_credentials / _build_boto3_client_kwargs shape introduced in feat(aws_tools): add S3 File Uploader and S3 File Download tools #3273; no shared utils/ module is introduced. The boto3 client is created once per _invoke call and reused across batch entries within that one call.
Files
Validation
Static:
- python -m py_compile on both .py files: ✅
- yaml.safe_load on the new + modified yaml files: ✅
- extra.python.source paths resolve to existing files: ✅
- black --check -l 100 and ruff check both clean on the new files: ✅
- label / description languages match the rest of the plugin (en_US / zh_Hans / pt_BR): ✅
- … pyproject.toml from feat(aws_tools): add S3 File Uploader and S3 File Download tools #3273): ✅
Mock unit tests (10/10 pass): basic batch upload with presign, duplicate-filename dedup, partial upload failure (ClientError), all-fail top-level error, empty input, presign error isolation (upload OK + presign fails), basic batch download, partial download failure (NoSuchKey), invalid URI yields no blob, empty s3_uris.
End-to-end on Dify 1.14.2 Community Edition + real S3 (cn-northwest-1):
- … .difypkg from tools/aws/ on this branch, installed on a self-hosted instance.
- Workflow: [Start (file-list) -> s3_files_uploader -> Code (extract URIs) -> s3_files_download -> Code (summary) -> End].
- Inputs: 3 files, a.txt (40 B), image/png (100×100 RGBA, 220 B), and application/pdf (doc.pdf, 540 B).
- Result: status = succeeded, all 6 steps green, elapsed ~1.2 s.
- upload_uris and download_keys matched input order; download_sizes = [40, 220, 540].
- Pulled the objects back via aws s3 cp and compared SHA-256: byte-for-byte identical for all 3 files:
  - a.txt: c377b72e7343c1642a35c7ff5108fef6c14fc5fba3aecce89c18d9ac526e4de8
  - img.png: 5a2fe18dbec51b2426f8aa31f6424b0efff246497646f1aa2314abe8d09b7aec
  - doc.pdf: 7b6fed1b75159c5cbc633e04f9011a1a9e4f22efce2621b8e14646064cf8c6fa
- With generate_presign_url=true, every entry's presigned_url was fetched via curl and produced byte-identical content to the local source.
- Partial-failure run (2 valid + 1 bogus s3:// URI) on the download tool: returned count=3, ok=2, failed=1 with a NoSuchKey error string for the bogus URI, and exactly 2 file blobs yielded in input order.
Out of scope (kept for a follow-up)
- Multipart upload / streaming for large files
- Any change to existing s3_operator / s3_file_uploader / s3_file_download
- Shared utils/ module
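For reference, the s3://bucket/key form accepted in s3_uris can be split into bucket and key along the lines below. This is a sketch under assumptions: the helper name `parse_s3_uri` and the error message are hypothetical, and the tool's own validation may differ. Plain string operations are used so that keys containing characters such as ? or # are kept intact.

```python
def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError otherwise."""
    prefix = "s3://"
    if not isinstance(uri, str) or not uri.startswith(prefix):
        raise ValueError(f"invalid S3 URI: {uri!r}")
    # Everything up to the first slash is the bucket, the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"invalid S3 URI: {uri!r}")
    return bucket, key
```

In a batch, a ValueError raised here would be recorded as a failed entry for that index rather than aborting the remaining URIs.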