
feat(aws_tools): add S3 File Uploader and S3 File Download tools #168

Open
leoou331 wants to merge 2 commits into aws-samples:main from leoou331:add-s3-file-uploader-and-download

leoou331 commented Jun 10, 2026

What this PR does

Adds two new builtin tools to the aws_tools plugin:

  • s3_file_uploader — takes a file variable from an upstream workflow node (Start file input, LLM output, another tool, etc.) and uploads it to a configurable S3 bucket/key. Optionally returns a presigned GET URL.
  • s3_file_download — takes an s3://bucket/key URI and emits the object as a Dify file (via create_blob_message) plus structured metadata for downstream nodes.

Why

The existing s3_operator is text-only:

  • input is text_content (a string)
  • output is a UTF-8-decoded string

It cannot be wired to a Start node's file input, nor can it consume the file output of nodes like Frame Extractor / Nova Canvas. These two new tools close that gap and keep the same UX as the rest of the plugin (provider-level credentials, optional per-tool overrides, three-language labels).

A typical workflow now looks like:

[Start: file input] -> [s3_file_uploader] -> ... -> [s3_file_download] -> [LLM/Code/...]

Files

plugins/aws_tools/
├── provider/aws_tools.yaml           # +2 lines (register both tools)
└── tools/
    ├── s3_file_uploader.py           # 201 lines
    ├── s3_file_uploader.yaml         # 139 lines
    ├── s3_file_download.py           # 175 lines
    └── s3_file_download.yaml         # 78 lines
README.md, README_ZH.md, README_JA.md # +2 lines each (tool table)

Both Python modules are self-contained — credential-resolution helpers (_resolve_aws_credentials, _build_boto3_client_kwargs, _reset_clients_on_credential_change) are inlined so this PR does not introduce a shared utils/ module that the rest of the repo doesn't use.
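A minimal sketch of what the inlined credential-resolution helpers might look like, assuming per-tool overrides take precedence over provider-level credentials and blank fields fall back to the provider value. The function names echo the helpers named above, but the credential key names (`aws_access_key_id`, `aws_region`, etc. in the provider credentials) and the merge rules are assumptions, not the PR's actual code.

```python
from typing import Any, Dict, Mapping, Optional

# Assumed credential keys; the provider's real schema field names may differ.
CREDENTIAL_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token", "aws_region")


def resolve_aws_credentials(
    provider_credentials: Mapping[str, Any], tool_parameters: Mapping[str, Any]
) -> Dict[str, Optional[str]]:
    """Per-tool overrides win; blank or missing values fall back to the provider setting."""
    resolved: Dict[str, Optional[str]] = {}
    for key in CREDENTIAL_KEYS:
        override = tool_parameters.get(key)
        value = override if override else provider_credentials.get(key)
        resolved[key] = value or None  # normalise "" to None
    return resolved


def build_boto3_client_kwargs(creds: Mapping[str, Optional[str]]) -> Dict[str, str]:
    """Turn resolved credentials into keyword arguments accepted by boto3.client()."""
    kwargs: Dict[str, str] = {}
    if creds.get("aws_region"):
        kwargs["region_name"] = creds["aws_region"]
    for key in ("aws_access_key_id", "aws_secret_access_key", "aws_session_token"):
        if creds.get(key):
            kwargs[key] = creds[key]  # omitted keys let boto3 use its default credential chain
    return kwargs
```

The result would be splatted into `boto3.client("s3", **kwargs)`; leaving a key out entirely (rather than passing `None`) keeps boto3's normal credential lookup in play.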

Tool surface

s3_file_uploader (form parameters)

| Param | Type | Required | Notes |
|---|---|---|---|
| input_file | file | yes | Bound to an upstream file variable. |
| bucket_name | string | yes | Without s3://. |
| key_prefix | string | no | Folder-style prefix, e.g. workflow-outputs. |
| object_key | string | no | Override final key; defaults to incoming filename. |
| aws_region | string | no | Per-tool override of provider default. |
| aws_access_key_id / aws_secret_access_key / aws_session_token | string | no | Per-tool override / STS support. |
| generate_presign_url | boolean | no | Default false. |
| presign_expiry | number | no | Default 3600 seconds. |

Outputs: text = s3://bucket/key, or the presigned URL when generate_presign_url is true; json = {bucket_name, object_key, s3_uri, presigned_url?, presign_expiry?}; no file output.
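A sketch of how the uploader's key derivation and output payload could be assembled. The helper names (`build_object_key`, `build_upload_result`, `upload_text_output`) are hypothetical, and so is the rule that an explicit `object_key` is used verbatim without `key_prefix`; the table above does not pin that down. In a real tool these values would feed boto3 calls such as `put_object(Bucket=..., Key=..., Body=...)` and `generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=...)`, though the PR text does not say which calls it uses.

```python
from typing import Any, Dict, Optional


def build_object_key(
    filename: str, key_prefix: Optional[str] = None, object_key: Optional[str] = None
) -> str:
    """Derive the S3 key: an explicit object_key wins, else key_prefix + incoming filename."""
    if object_key and object_key.strip():
        # Assumption: object_key is taken as the complete final key (prefix not applied).
        return object_key.strip().lstrip("/")
    prefix = (key_prefix or "").strip().strip("/")  # "workflow-outputs/" -> "workflow-outputs"
    name = filename.lstrip("/")
    return f"{prefix}/{name}" if prefix else name


def build_upload_result(
    bucket_name: str,
    object_key: str,
    presigned_url: Optional[str] = None,
    presign_expiry: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the JSON payload; presign fields appear only when a URL was generated."""
    result: Dict[str, Any] = {
        "bucket_name": bucket_name,
        "object_key": object_key,
        "s3_uri": f"s3://{bucket_name}/{object_key}",
    }
    if presigned_url is not None:
        result["presigned_url"] = presigned_url
        result["presign_expiry"] = presign_expiry
    return result


def upload_text_output(result: Dict[str, Any]) -> str:
    """Text message: the presigned URL when present, otherwise the s3:// URI."""
    return result.get("presigned_url") or result["s3_uri"]
```

With key_prefix = pr-validation and an incoming hello.txt, this yields the same key that appears in the validation run below (pr-validation/hello.txt).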

s3_file_download

| Param | Type | Required | Notes |
|---|---|---|---|
| s3_uri | string (LLM-fillable) | yes | s3://bucket/key. |
| aws_region / aws_access_key_id / aws_secret_access_key / aws_session_token | string | no | Same overrides as uploader. |

Outputs files = [<Dify file>], json = {bucket, key, content_type, content_length, etag, last_modified, s3_uri}, and a key=value text block of the same metadata.
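A sketch of the URI parsing, filename derivation, and key=value rendering the downloader needs. All three helper names are hypothetical; the `fallback` filename is an invented default, and the trailing-slash handling follows the review fix described later in this PR rather than code shown here.

```python
from typing import Any, Mapping, Tuple


def parse_s3_uri(s3_uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); raise ValueError on malformed input."""
    uri = s3_uri.strip()
    if not uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI must include both bucket and key: {s3_uri!r}")
    return bucket, key


def filename_from_key(key: str, fallback: str = "download") -> str:
    """Last path segment of the key, ignoring trailing slashes so it is never empty."""
    return key.rstrip("/").rsplit("/", 1)[-1] or fallback


def metadata_text(metadata: Mapping[str, Any]) -> str:
    """Render the metadata mapping as a key=value text block, one pair per line."""
    return "\n".join(f"{k}={v}" for k, v in metadata.items())
```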

Validation

Static:

  • python -m py_compile on both .py files — ✅
  • yaml.safe_load on both new yaml files and the modified provider/aws_tools.yaml — ✅
  • Verified extra.python.source paths resolve to existing files — ✅
  • Confirmed label/description languages match the rest of the plugin (en_US / zh_Hans / pt_BR) — ✅
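The first and third static checks can be scripted with the standard library alone. This is a hypothetical helper pair, assuming the extra.python.source values have already been read out of the YAML and are relative to the plugin root (plugins/aws_tools/); the YAML loading itself is omitted.

```python
import py_compile
from pathlib import Path
from typing import Iterable, List


def compile_errors(paths: Iterable[Path]) -> List[str]:
    """Byte-compile each module and collect failures instead of stopping at the first one."""
    errors: List[str] = []
    for path in paths:
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append(exc.msg)
    return errors


def missing_sources(plugin_root: Path, sources: Iterable[str]) -> List[str]:
    """Return each extra.python.source value that does not resolve to a file under plugin_root."""
    return [src for src in sources if not (plugin_root / src).is_file()]
```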

End-to-end (this is from a real run, not a dry validation):

  1. Built a .difypkg from plugins/aws_tools/ on this branch.
  2. Installed it on a self-hosted Dify 1.14.2 Community Edition.
  3. Imported a workflow [Start (file input) -> s3_file_uploader -> s3_file_download -> End], pointed at an S3 bucket in cn-northwest-1.
  4. Triggered the workflow via the Service API with hello.txt (244 bytes).
  5. Result:
    • status = succeeded, total_steps = 4, elapsed = 1.0s
    • uploaded_uri = s3://dify-test-s3-function/pr-validation/hello.txt
    • downloaded_meta populated with bucket / key / content_type / content_length=244 / etag / last_modified / s3_uri
  6. Pulled the object back from S3 with aws s3 cp and compared SHA-256 — byte-for-byte identical with the local source.
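The SHA-256 comparison in step 6 can be reproduced with `hashlib`. A minimal sketch; the helper names and the local filenames in the usage note are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_identical(local: Path, round_trip: Path) -> bool:
    """True when both files hash identically, i.e. the S3 round-trip preserved every byte."""
    return sha256_of(local) == sha256_of(round_trip)
```

For example, after `aws s3 cp s3://dify-test-s3-function/pr-validation/hello.txt hello.roundtrip.txt`, calling `is_identical(Path("hello.txt"), Path("hello.roundtrip.txt"))` should return True.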

Out of scope

  • Multipart upload / streaming for large files
  • Any change to existing s3_operator behavior
  • Shared utils/ module — left for a follow-up if more tools want the helpers

Add two new builtin tools to the aws_tools plugin so that Dify workflows
can move file objects (not just text) between workflow nodes and S3:

- s3_file_uploader: takes a file variable from an upstream node and uploads
  it to a configurable bucket/key, optionally returning a presigned URL.
- s3_file_download: takes an s3://bucket/key URI and emits a Dify file (via
  create_blob_message) plus structured metadata for downstream consumption.

Why
---
The existing s3_operator only handles text payloads (text_content in,
UTF-8 text out), so it can't be wired directly to a Start node 'file'
input or to any tool that emits binary file variables. These two tools
close that gap with the same UX and parameter conventions as s3_operator.

Implementation notes
--------------------
- Both tools are self-contained (credential-resolution helpers are
  inlined) so this PR does not introduce a shared utils/ module.
- They reuse the existing aws_tools provider's credentials_for_provider
  schema (Access Key / Secret Key / Region) and additionally accept a
  per-invocation aws_session_token for STS / role-assumption use cases.
- Three-language labels (en_US / zh_Hans / pt_BR) match the rest of the
  plugin's tools.

Validation
----------
- Static: yaml.safe_load all touched yaml files; py_compile both .py
  files; verified extra.python.source paths resolve correctly.
- End-to-end: packaged plugins/aws_tools/ from this branch into a
  .difypkg, installed it on a self-hosted Dify 1.14.2 instance, and
  ran a workflow [Start -> S3 Upload -> S3 Download -> End] against
  cn-northwest-1. status=succeeded, total_steps=4, elapsed=1.0s.
  Object SHA-256 verified byte-for-byte identical between local file
  and the S3 round-trip.

Origin
------
Implementation derived from the public S3 file uploader/download tools
in r3-yamauchi/dify-my-aws-tools-plugin (Apache-2.0). The author confirmed
he is happy for these two tools to be contributed upstream to
aws-samples/dify-aws-tool with no attribution requirement; comments
have been translated to English to match the surrounding files.
…ins#3273

Applies the same set of fixes as the companion PR on the upstream
langgenius/dify-official-plugins repo (#3273), all of them surfaced by
the gemini-code-assist review:

1. Thread safety: replace cached self.s3_client with a local boto3
   client created inside each _invoke. Drops the helper functions
   _reset_clients_on_credential_change and _credential_signature.
2. Standardised ClientError error-code matching for NoSuchBucket /
   NoSuchKey (no longer relies on the dropped instance-attribute
   exceptions namespace).
3. Tolerate trailing slashes in the S3 key when deriving filename.
4. Safe presign_expiry parsing (None / empty / non-numeric all fall
   back to 3600 instead of crashing with TypeError).

Re-validated end to end: TXT / PNG / PDF / presign URL / STS session
token paths all succeed with byte-for-byte SHA-256 match.
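The error-code matching in fix 2 can be sketched as a lookup on `exc.response["Error"]["Code"]`, which is where botocore's `ClientError` carries the S3 error code. The helper name and messages are hypothetical. The "404" entry reflects that HeadObject responses have no body, so botocore reports the bare HTTP status as the code.

```python
from typing import Any, Dict

# Messages keyed by S3 error code. GetObject reports "NoSuchBucket"/"NoSuchKey";
# HeadObject has no response body, so botocore surfaces the bare HTTP status "404".
_ERROR_MESSAGES: Dict[str, str] = {
    "NoSuchBucket": "S3 bucket does not exist",
    "NoSuchKey": "S3 object does not exist",
    "404": "S3 bucket or object not found",
}


def describe_s3_error(exc: Exception) -> str:
    """Map a botocore ClientError to a message using exc.response['Error']['Code']."""
    response: Any = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code", "")
    return _ERROR_MESSAGES.get(code, f"S3 request failed ({code or type(exc).__name__})")
```

At the call site this would sit behind `except ClientError as exc:` (imported from `botocore.exceptions`) around the `get_object` / `head_object` call, so no cached client's `.exceptions` namespace is needed.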
leoou331 commented

Pushed 546e2a3 to mirror review fixes from the upstream langgenius PR (langgenius/dify-official-plugins#3273):

  1. Drop cached self.s3_client; build a local boto3 client per _invoke call (removes a thread-safety / cross-tenant credential-leak risk).
  2. Remove the now-unused _reset_clients_on_credential_change / _credential_signature helpers (-23 lines).
  3. Use standard ClientError error-code matching for NoSuchBucket / NoSuchKey in s3_file_download (no longer depends on the cached client's .exceptions namespace).
  4. Tolerate trailing slashes in the S3 key when deriving the downloaded filename.
  5. Safe presign_expiry parsing (None / "" / non-numeric all fall back to 3600 instead of TypeError).

Re-validated end to end on Dify 1.14.2: TXT / PNG / PDF / presign URL / STS aws_session_token paths all succeed; SHA-256 byte-for-byte identical on every round-trip.
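A sketch of the per-invocation client pattern from item 1. `S3FileToolSketch`, its `invoke` method, and the injectable `client_factory` are hypothetical stand-ins; Dify's real tool base class and `_invoke` signature are not reproduced. The point is only that the client lives in a local variable, never on `self`.

```python
from typing import Any, Callable, Dict, Mapping, Optional

ClientFactory = Callable[..., Any]


class S3FileToolSketch:
    """Hypothetical tool skeleton: no boto3 client is stored on self between calls."""

    def __init__(self, client_factory: Optional[ClientFactory] = None) -> None:
        self._client_factory = client_factory  # injectable so the sketch runs without AWS

    def _make_client(self, client_kwargs: Mapping[str, str]) -> Any:
        factory = self._client_factory
        if factory is None:
            import boto3  # deferred import: only needed when talking to real S3

            factory = boto3.client
        return factory("s3", **client_kwargs)

    def invoke(self, bucket: str, key: str, client_kwargs: Mapping[str, str]) -> Dict[str, Any]:
        # Build a fresh, local client for this call only. Concurrent invocations carrying
        # different tenants' credentials therefore never share (or overwrite) a client.
        s3 = self._make_client(client_kwargs)
        return s3.head_object(Bucket=bucket, Key=key)
```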

crazywoola pushed a commit to langgenius/dify-official-plugins that referenced this pull request Jun 11, 2026
* feat(aws_tools): add S3 File Uploader and S3 File Download tools

Add two new builtin tools to tools/aws so that Dify workflows can move
file objects (not just text) between workflow nodes and S3:

- s3_file_uploader: takes a file variable from an upstream node and
  uploads it to a configurable bucket/key, optionally returning a
  presigned URL.
- s3_file_download: takes an s3://bucket/key URI and emits a Dify file
  (via create_blob_message) plus structured metadata for downstream
  consumption.

Why
---
The existing s3_operator only handles text payloads (text_content in,
UTF-8 text out), so it can't be wired directly to a Start node 'file'
input or to any tool that emits binary file variables. These two tools
close that gap with the same UX and parameter conventions as
s3_operator.

Implementation notes
--------------------
- Both tools are self-contained (credential-resolution helpers are
  inlined) so this PR does not introduce a shared utils/ module.
- They reuse the existing aws_tools provider's credentials_for_provider
  schema (Access Key / Secret Key / Region) and additionally accept a
  per-invocation aws_session_token for STS / role-assumption use cases.
- Three-language labels (en_US / zh_Hans / pt_BR) match the rest of the
  plugin's tools; identity.author follows existing convention (AWS).
- Bumped manifest.yaml version from 0.0.26 to 0.0.27.
- README.md Features section updated.
- Code formatted with black (-l 100); ruff check passes clean.
- No new dependencies (boto3/botocore already in pyproject.toml).

Validation
----------
Static:
- python -m py_compile on both .py files
- yaml.safe_load on all touched yaml files
- Verified extra.python.source paths resolve correctly
- black --check + ruff check both clean on the new files

End-to-end (real run, not dry validation):
- Built a .difypkg from tools/aws/ on this branch
- Installed it on a self-hosted Dify 1.14.2 Community Edition
- Imported a workflow [Start file -> s3_file_uploader -> s3_file_download
  -> End], pointed at S3 in cn-northwest-1
- Triggered the workflow via the Service API with text/PNG payloads
- Result: status=succeeded, total_steps=4, elapsed ~0.4-0.8s
- Pulled both objects back from S3 via aws s3 cp and SHA-256 verified
  byte-for-byte identical with the local source files
- (Companion regression run on aws-samples/dify-aws-tool#168 also covered
  PDF binary, generate_presign_url=true, and STS aws_session_token paths
  with the same code; all green and SHA-256 identical.)

Origin / attribution
--------------------
Implementation derived from the public s3_file_uploader.py /
s3_file_download.py in r3-yamauchi/dify-my-aws-tools-plugin
(Apache-2.0). The author has confirmed he is happy for these two tools
to be contributed upstream to langgenius/dify-official-plugins with
no attribution requirement; comments translated to English to match
surrounding files. The companion aws-samples/dify-aws-tool PR #168
contains the same code.

* fix(s3 tools): address gemini-code-assist review

Apply concrete code-review feedback from gemini-code-assist on PR #3273:

1. Thread safety / credential leakage (high-priority)
   - Move boto3 client construction from cached `self.s3_client` to a
     local variable inside `_invoke`. Tool instances are reused by the
     plugin runtime across concurrent invocations, so a cached client
     tied to one tenant's credentials must never leak into another
     execution. Creating an S3 client is lightweight (no network I/O)
     so there is no real cost to building it per invocation.
   - Drop the now-unused `_reset_clients_on_credential_change` and
     `_credential_signature` helpers (and the `Iterable` import).
     They tried to address the same race but were inherently fragile
     under concurrency.

2. Standardised exception handling in s3_file_download
   - Switch from `self.s3_client.exceptions.NoSuchBucket` /
     `NoSuchKey` (which depended on the cached instance attribute) to
     standard `ClientError` error-code matching via
     `exc.response["Error"]["Code"]`.

3. Robust filename extraction in s3_file_download
   - Tolerate trailing slashes in the S3 key (e.g. `s3://bucket/foo/`)
     so the emitted Dify file's `filename` is never empty.

4. Safe presign_expiry parsing in s3_file_uploader
   - Extracted a small `_parse_presign_expiry` helper that tolerates
     None / empty string / non-numeric input and falls back to the
     default of 3600 seconds, instead of letting `int(None)` raise
     TypeError when the optional Dify number field is left blank.

Validation
----------
- black -l 100 + ruff check both clean.
- End-to-end re-validation on a fresh self-hosted Dify 1.14.2: built a
  .difypkg from this branch, installed it, and ran the regression
  matrix again - text/plain, image/png, application/pdf, generate_presign_url
  with curl-fetch, and STS aws_session_token via `aws sts get-session-token`.
  All six runs returned status=succeeded; SHA-256 byte-for-byte identical
  on every round-trip. Unit-tested `_parse_presign_expiry` against
  None / "" / 600 / "600" / "not a number" / 3.14 / custom-default;
  all 7 cases produce the expected fall-back behaviour.
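A sketch of a `_parse_presign_expiry`-style helper covering the cases listed above. The public name here is hypothetical, and the commit does not say what the real helper returns for 3.14 or for non-positive numbers: this sketch truncates fractions to whole seconds and treats zero or negative values as invalid, both of which are assumptions.

```python
from typing import Any

DEFAULT_PRESIGN_EXPIRY = 3600  # seconds, the documented default


def parse_presign_expiry(value: Any, default: int = DEFAULT_PRESIGN_EXPIRY) -> int:
    """Coerce the optional presign_expiry field to whole seconds, falling back to `default`."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return default  # field left blank in the Dify form
    try:
        seconds = int(float(value))  # accepts 600, "600", 3.14 (truncated to 3)
    except (TypeError, ValueError, OverflowError):
        return default  # "not a number", NaN, infinity, or an unsupported type
    # Assumption: non-positive values are treated as invalid rather than passed to boto3.
    return seconds if seconds > 0 else default
```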

Refs
----
PR review: #3273 (review)

---------

Co-authored-by: leoou331 <leoou@amazon.com>