feat(aws_tools): add S3 File Uploader and S3 File Download tools #3273
Add two new builtin tools to tools/aws so that Dify workflows can move file objects (not just text) between workflow nodes and S3:

- s3_file_uploader: takes a file variable from an upstream node and uploads it to a configurable bucket/key, optionally returning a presigned URL.
- s3_file_download: takes an s3://bucket/key URI and emits a Dify file (via create_blob_message) plus structured metadata for downstream consumption.

Why
---
The existing s3_operator only handles text payloads (text_content in, UTF-8 text out), so it can't be wired directly to a Start node 'file' input or to any tool that emits binary file variables. These two tools close that gap with the same UX and parameter conventions as s3_operator.

Implementation notes
--------------------
- Both tools are self-contained (credential-resolution helpers are inlined) so this PR does not introduce a shared utils/ module.
- They reuse the existing aws_tools provider's credentials_for_provider schema (Access Key / Secret Key / Region) and additionally accept a per-invocation aws_session_token for STS / role-assumption use cases.
- Three-language labels (en_US / zh_Hans / pt_BR) match the rest of the plugin's tools; identity.author follows existing convention (AWS).
- Bumped manifest.yaml version from 0.0.26 to 0.0.27.
- README.md Features section updated.
- Code formatted with black (-l 100); ruff check passes clean.
- No new dependencies (boto3/botocore already in pyproject.toml).

Validation
----------
Static:
- python -m py_compile on both .py files
- yaml.safe_load on all touched yaml files
- Verified extra.python.source paths resolve correctly
- black --check + ruff check both clean on the new files

End-to-end (real run, not dry validation):
- Built a .difypkg from tools/aws/ on this branch
- Installed it on a self-hosted Dify 1.14.2 Community Edition
- Imported a workflow [Start file -> s3_file_uploader -> s3_file_download -> End], pointed at S3 in cn-northwest-1
- Triggered the workflow via the Service API with text/PNG payloads
- Result: status=succeeded, total_steps=4, elapsed ~0.4-0.8s
- Pulled both objects back from S3 via aws s3 cp and SHA-256 verified byte-for-byte identical with the local source files
- (Companion regression run on aws-samples/dify-aws-tool#168 also covered PDF binary, generate_presign_url=true, and STS aws_session_token paths with the same code; all green and SHA-256 identical.)

Origin / attribution
--------------------
Implementation derived from the public s3_file_uploader.py / s3_file_download.py in r3-yamauchi/dify-my-aws-tools-plugin (Apache-2.0). The author has confirmed he is happy for these two tools to be contributed upstream to langgenius/dify-official-plugins with no attribution requirement; comments translated to English to match surrounding files. The companion aws-samples/dify-aws-tool PR langgenius#168 contains the same code.
Code Review
This pull request introduces two new AWS S3 tools: AWS S3 File Uploader and AWS S3 File Download, enabling workflows to upload files to S3 (with optional presigned URLs) and download S3 objects as Dify file variables. The review feedback identifies a critical thread-safety issue where caching the s3_client as an instance attribute could lead to race conditions or credential leakage across concurrent executions. It is recommended to initialize the client locally within the _invoke method instead. Additionally, the feedback suggests improving error handling for S3 bucket/key exceptions and safely parsing the presign_expiry parameter to prevent runtime crashes.
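The thread-safety recommendation in the review can be illustrated with a minimal sketch. It is not the plugin's code: the class name `S3ToolSketch`, the function `build_client_kwargs`, the injected `client_factory`, and the credential key names are all assumptions made for illustration. The point is only that the client lives in a local variable inside the invoke call, so nothing bound to one invocation's credentials is stored on the shared tool instance.

```python
from typing import Any, Callable, Dict, Mapping

# Assumed key names for illustration; the plugin's real schema may differ.
_CREDENTIAL_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def build_client_kwargs(
    provider_credentials: Mapping[str, Any],
    tool_parameters: Mapping[str, Any],
) -> Dict[str, str]:
    """Merge provider-level credentials with per-invocation overrides."""

    def pick(name: str):
        # A per-invocation value wins; blank strings count as "not set".
        for source in (tool_parameters, provider_credentials):
            value = source.get(name)
            if isinstance(value, str) and value.strip():
                return value.strip()
        return None

    kwargs: Dict[str, str] = {}
    region = pick("aws_region")
    if region:
        kwargs["region_name"] = region
    for key in _CREDENTIAL_KEYS:
        value = pick(key)
        if value:
            kwargs[key] = value
    return kwargs


class S3ToolSketch:
    """Hypothetical stand-in for a tool whose instance is reused across invocations."""

    def __init__(self, provider_credentials: Mapping[str, Any], client_factory: Callable[..., Any]):
        self.provider_credentials = provider_credentials
        # In real code this could be: lambda **kw: boto3.client("s3", **kw)
        self._client_factory = client_factory

    def invoke(self, tool_parameters: Mapping[str, Any]) -> Any:
        # The client is a local variable; no credential-bearing state is kept on self.
        kwargs = build_client_kwargs(self.provider_credentials, tool_parameters)
        client = self._client_factory(**kwargs)
        return client
```

Injecting the factory keeps the sketch runnable without AWS access; `region_name`, `aws_access_key_id`, `aws_secret_access_key`, and `aws_session_token` are the keyword arguments `boto3.client` accepts for this purpose.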
Apply concrete code-review feedback from gemini-code-assist on PR langgenius#3273:

1. Thread safety / credential leakage (high-priority)
   - Move boto3 client construction from cached `self.s3_client` to a local variable inside `_invoke`. Tool instances are reused by the plugin runtime across concurrent invocations, so a cached client tied to one tenant's credentials must never leak into another execution. Creating an S3 client is lightweight (no network I/O) so there is no real cost to building it per invocation.
   - Drop the now-unused `_reset_clients_on_credential_change` and `_credential_signature` helpers (and the `Iterable` import). They tried to address the same race but were inherently fragile under concurrency.

2. Standardised exception handling in s3_file_download
   - Switch from `self.s3_client.exceptions.NoSuchBucket` / `NoSuchKey` (which depended on the cached instance attribute) to standard `ClientError` error-code matching via `exc.response["Error"]["Code"]`.

3. Robust filename extraction in s3_file_download
   - Tolerate trailing slashes in the S3 key (e.g. `s3://bucket/foo/`) so the emitted Dify file's `filename` is never empty.

4. Safe presign_expiry parsing in s3_file_uploader
   - Extracted a small `_parse_presign_expiry` helper that tolerates None / empty string / non-numeric input and falls back to the default of 3600 seconds, instead of letting `int(None)` raise TypeError when the optional Dify number field is left blank.

Validation
----------
- black -l 100 + ruff check both clean.
- End-to-end re-validation on a fresh self-hosted Dify 1.14.2: built a .difypkg from this branch, installed it, and ran the regression matrix again - text/plain, image/png, application/pdf, generate_presign_url with curl-fetch, and STS aws_session_token via `aws sts get-session-token`. All six runs returned status=succeeded; SHA-256 byte-for-byte identical on every round-trip.
- Unit-tested `_parse_presign_expiry` against None / "" / 600 / "600" / "not a number" / 3.14 / custom-default; all 7 cases produce the expected fall-back behaviour.

Refs
----
PR review: langgenius#3273 (review)
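A tolerant expiry parser of the kind described in point 4 might look like the sketch below. It is an illustration under assumptions, not the helper from the commit: the name `parse_presign_expiry_sketch` is hypothetical, and the choices to truncate floats, reject booleans, and treat non-positive values as invalid are mine rather than anything the commit message states.

```python
from typing import Any

DEFAULT_PRESIGN_EXPIRY = 3600  # seconds; the default named in the commit message


def parse_presign_expiry_sketch(value: Any, default: int = DEFAULT_PRESIGN_EXPIRY) -> int:
    """Return a positive number of seconds, or `default` for unusable input."""
    # None is what an optional number field yields when left blank.
    # bool is a subclass of int, so it is rejected explicitly (assumption).
    if value is None or isinstance(value, bool):
        return default
    try:
        seconds = int(value)  # accepts 600, "600", and truncates 3.14 to 3
    except (TypeError, ValueError, OverflowError):
        # "" and "not a number" raise ValueError; float("inf") raises OverflowError.
        return default
    # Assumption: zero or negative expiry is treated as invalid.
    return seconds if seconds > 0 else default
```

Catching the conversion error rather than pre-validating the type keeps the helper short and covers input shapes that were not anticipated.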
…ins#3273

Same set of fixes applied to the companion PR on the upstream langgenius/dify-official-plugins repo (#3273), surfaced by gemini-code-assist review:

1. Thread safety: replace cached self.s3_client with a local boto3 client created inside each _invoke. Drops the helper functions _reset_clients_on_credential_change and _credential_signature.
2. Standardised ClientError error-code matching for NoSuchBucket / NoSuchKey (no longer relies on the dropped instance-attribute exceptions namespace).
3. Tolerate trailing slashes in the S3 key when deriving filename.
4. Safe presign_expiry parsing (None / empty / non-numeric all fall back to 3600 instead of crashing with TypeError).

Re-validated end to end: TXT / PNG / PDF / presign URL / STS session token paths all succeed with byte-for-byte SHA-256 match.
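Points 2 and 3 on the download side can be sketched with three small pure functions. All names here (`split_s3_uri`, `filename_from_key`, `describe_client_error`) and the `"download"` fallback filename are hypothetical; the only detail taken from the commit message is that the error code is read from `response["Error"]["Code"]` and compared against `NoSuchBucket` / `NoSuchKey`.

```python
from typing import Any, Mapping, Tuple


def split_s3_uri(s3_uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); raise ValueError if malformed."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {s3_uri!r}")
    return bucket, key


def filename_from_key(key: str, fallback: str = "download") -> str:
    """Last path segment of the key, ignoring trailing slashes; never empty."""
    name = key.rstrip("/").rsplit("/", 1)[-1]
    return name or fallback  # fallback value is an assumption


def describe_client_error(error_response: Mapping[str, Any], bucket: str, key: str) -> str:
    """Map an error response dict (e.g. ClientError.response) to a user-facing message."""
    code = error_response.get("Error", {}).get("Code", "")
    if code == "NoSuchBucket":
        return f"Bucket not found: {bucket}"
    if code == "NoSuchKey":
        return f"Object not found: s3://{bucket}/{key}"
    return f"S3 request failed ({code or 'unknown error'})"
```

In real code the dict passed to `describe_client_error` would come from the `response` attribute of a caught `botocore.exceptions.ClientError`; taking a plain mapping keeps the sketch testable without AWS access.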
Thanks for the review @gemini-code-assist[bot] — all four points accepted and applied in
Re-validation
End-to-end re-run on a fresh self-hosted Dify 1.14.2 with this branch packaged into a
Plus a small unit-style check for
The companion AWS samples PR (aws-samples/dify-aws-tool#168) has been updated with the same fixes.
That is excellent news, @leoou331. Thank you for the comprehensive re-validation and for applying the requested improvements so thoroughly. The extracted
Follow-up: batch counterparts
Pushed commit
Tool surface (new tools only)
Design choices
Validation
Files
Out of scope (kept for a follow-up)
What this PR does
Adds two new builtin tools to the `aws_tools` plugin (`tools/aws`):

- `s3_file_uploader`: takes a `file` variable from an upstream workflow node (Start file input, LLM output, another tool, etc.) and uploads it to a configurable S3 bucket/key. Optionally returns a presigned `GET` URL.
- `s3_file_download`: takes an `s3://bucket/key` URI and emits the object as a Dify file (via `create_blob_message`) plus structured metadata for downstream nodes.

Why
The existing `s3_operator` is text-only: it takes `text_content: string` and returns `string`. It cannot be wired to a `Start` node's `file` input, nor consume the file output of nodes like `Frame Extractor` / `Nova Canvas`. These two new tools close that gap and keep the same UX as the rest of the plugin (provider-level credentials, optional per-tool overrides, three-language labels).

A typical workflow now looks like:
Files
Both Python modules are self-contained: credential-resolution helpers (`_resolve_aws_credentials`, `_build_boto3_client_kwargs`, `_reset_clients_on_credential_change`) are inlined so this PR does not introduce a shared `utils/` module that the rest of the plugin doesn't use.

Tool surface
`s3_file_uploader` (form parameters)

- `input_file` (`file`)
- `bucket_name` (`string`): … `s3://`.
- `key_prefix` (`string`): … `workflow-outputs`.
- `object_key` (`string`)
- `aws_region` (`string`)
- `aws_access_key_id` / `aws_secret_access_key` / `aws_session_token` (`string`)
- `generate_presign_url` (`boolean`): … `false`.
- `presign_expiry` (`number`): … `3600` seconds.

Outputs three messages: `text` = `s3://bucket/key` or presigned URL; `json` = `{bucket_name, object_key, s3_uri, presigned_url?, presign_expiry?}`; no `files`.

`s3_file_download`

- `s3_uri` (`string`, LLM-fillable): … `s3://bucket/key`.
- `aws_region` / `aws_access_key_id` / `aws_secret_access_key` / `aws_session_token` (`string`)

Outputs `files = [<Dify file>]`, `json = {bucket, key, content_type, content_length, etag, last_modified, s3_uri}`, and a `key: value` `text` block of the same metadata.

Validation
Static:

- `python -m py_compile` on both `.py` files ✅
- `yaml.safe_load` on the new + modified yaml files ✅
- `extra.python.source` paths resolve to existing files ✅
- `black --check -l 100` and `ruff check` both clean on the new files ✅
- `label` / `description` languages match the rest of the plugin (`en_US` / `zh_Hans` / `pt_BR`) ✅
- No new dependencies (boto3/botocore already in `pyproject.toml`) ✅

End-to-end (real run, not dry validation):

- Built a `.difypkg` from `tools/aws/` on this branch.
- Imported a workflow `[Start (file input) -> s3_file_uploader -> s3_file_download -> End]`, pointed at S3 bucket in `cn-northwest-1`.
- Triggered the workflow via the Service API with text/plain (`hello.txt`, 244 B) and image/png (50×50 RGBA, 144 B).
- Result: `status = succeeded`, `total_steps = 4`, elapsed ~0.4-0.8s.
- Pulled both objects back from S3 via `aws s3 cp` and compared SHA-256: byte-for-byte identical with the local source.

A companion run on the equivalent `aws-samples/dify-aws-tool#168` PR (same code) additionally covered:

- `generate_presign_url=true`: URL contains correct SigV4 fields, custom `X-Amz-Expires` honored, `curl` fetch returned HTTP 200 with byte-identical content
- `aws_session_token`: temporary credentials path round-tripped successfully

Out of scope

- `s3_operator` behavior
- `utils/` module: left for a follow-up if more tools want the helpers
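The SHA-256 round-trip comparison mentioned under Validation could be scripted along these lines. This is a sketch under assumptions, not the procedure actually used: the function names are hypothetical, and it presumes the object has already been fetched to a local path (for example with `aws s3 cp`).

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def sha256_of_file(path: PathLike, chunk_size: int = 1024 * 1024) -> str:
    """Hex SHA-256 of a file, read in chunks so large objects need little memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_matches(local_source: PathLike, downloaded_copy: PathLike) -> bool:
    """True when the uploaded source and the copy pulled back from S3 are byte-identical."""
    return sha256_of_file(local_source) == sha256_of_file(downloaded_copy)
```

Comparing digests rather than file sizes or ETags catches any byte-level corruption, and it does not depend on how S3 computes ETags for multipart uploads.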