Eliminate HEAD requests during downloads, especially for faster transfers of small files #363
crowecawcaw wants to merge 4 commits into boto:develop
Conversation
aemous
left a comment
I left one nit. Note that I am not the primary reviewer for this PR; the primary review is still pending.
Looks like downloads may be failing when the object has a size of 0 bytes. The error I'm seeing is InvalidRange. Can you confirm whether you are able to reproduce this, and look into resolving it? InvalidRange occurs when the range we are requesting has no overlap with the object itself. For 0-byte objects, this will always be the case if we are requesting the first 0-x bytes, since there are no bytes at all. The only feasible patch I can imagine here would be a try-except clause that handles InvalidRange by performing a non-ranged GET, which may be undesirable. If you have a better alternative that doesn't require an extra request, that would be preferred.
Signed-off-by: Stephen Crowe <6042774+crowecawcaw@users.noreply.github.com>
I see the same issue. I added an integ test, fixed the unit test to mirror the same behavior, then fixed the issue. Instead of sending a second GET request for an object we know is empty, it now returns the empty content directly.
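A rough sketch of the fix described above, assuming botocore-style error responses (the `ClientError` class and `get_object` callable here are simplified stand-ins, not the PR's actual code): since the first ranged GET can only fail with InvalidRange when the object is empty, the downloader can synthesize an empty result instead of issuing a second request.

```python
class ClientError(Exception):
    """Simplified stand-in for botocore.exceptions.ClientError."""

    def __init__(self, error_response):
        super().__init__(error_response)
        self.response = error_response


def get_first_chunk(get_object, bucket, key, chunk_size):
    """Issue the first ranged GET; on InvalidRange (0-byte object),
    return empty content instead of sending a second, non-ranged GET."""
    try:
        return get_object(
            Bucket=bucket, Key=key, Range=f"bytes=0-{chunk_size - 1}"
        )
    except ClientError as e:
        if e.response["Error"]["Code"] != "InvalidRange":
            raise
        # Only a 0-byte object can make the first range invalid,
        # so we already know the full content: nothing.
        return {"Body": b"", "ContentLength": 0}


# Fake S3 GET for a 0-byte object, for illustration only:
def fake_get_object(**kwargs):
    if "Range" in kwargs:
        raise ClientError({"Error": {"Code": "InvalidRange"}})
    return {"Body": b"not reached"}


resp = get_first_chunk(fake_get_object, "my-bucket", "empty.txt", 8 * 1024 * 1024)
```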
@crowecawcaw Can you push a revision that satisfies the 'Lint code' GH Action? In case you cannot see the Action details, I pasted it here: |
This PR optimizes downloads through the TransferManager by removing the upfront HEAD request. Previously, every download issued a HEAD request to determine the object size before starting the GET request. This change eliminates that extra round trip by extracting metadata from the first GET response instead.
For small files, download time is dominated by request latency, so eliminating one of the two requests cuts download time by roughly 50%. For large files, the effect is less noticeable because download time is dominated by transfer time and because there are multiple chunks to download. In both cases, we save the cost of the HEAD request.
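For example, the full object size can be recovered from the first ranged GET's response headers instead of a HEAD request (a sketch; the header names follow S3's response shape, but the helper name is illustrative):

```python
def total_size_from_first_get(response):
    """Derive the full object size from the first GET response.

    A ranged GET returns a ContentRange header like
    "bytes 0-8388607/52428800"; the value after "/" is the total
    object size. If S3 returned the whole object in one response
    (no ContentRange), ContentLength already is the size.
    """
    content_range = response.get("ContentRange")
    if content_range is not None:
        return int(content_range.rsplit("/", 1)[1])
    return response["ContentLength"]


# A ranged response for an 8 MiB chunk of a 50 MiB object:
large = total_size_from_first_get(
    {"ContentRange": "bytes 0-8388607/52428800", "ContentLength": 8388608}
)
# A small object served whole, with no ContentRange header:
small = total_size_from_first_get({"ContentLength": 1024})
```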
What Changed
• Removed HEAD requests: Downloads now start immediately with a ranged GET request for the first chunk
• Dynamic size detection: Extract object size and ETag from the first GET response headers (ContentRange or ContentLength)
• Dynamic chunk scheduling: After the first chunk completes, schedule additional chunks only if the object is larger than the chunk size
• Simplified code flow: Consolidated download logic into a single path instead of branching on size upfront
• ETag consistency for all chunks: When an ETag is available (either pre-provided via a subscriber, e.g. from a prior HEAD request by the AWS CLI, or extracted from the first GET response), all subsequent ranged GET requests include an IfMatch header with that ETag. This ensures S3 rejects the request if the object changes mid-download. The first GET request also includes IfMatch if an ETag was pre-provided before the download started.
Testing
Unit, functional, and integ tests pass. I also added a new script to benchmark downloading many small files. For downloading 1000 1kB files on my laptop, the total duration dropped 41%, from 15.0s to 8.9s.
Backward Compatibility
External API unchanged. All download methods have the same signatures.
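To illustrate the dynamic chunk scheduling and ETag consistency described under "What Changed", here is a hedged sketch (the helper name and request-dict shape are mine, not the PR's actual code) of how the remaining ranged GETs could be built once the first chunk has revealed the total size and ETag, each carrying IfMatch so S3 fails the request if the object changes mid-download:

```python
def remaining_chunk_requests(total_size, chunk_size, etag):
    """Build ranged GET parameters for every chunk after the first.

    Returns an empty list when the object fit in the first chunk.
    """
    requests = []
    for start in range(chunk_size, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1
        params = {"Range": f"bytes={start}-{end}"}
        if etag is not None:
            # Reject the chunk if the object was replaced mid-download.
            params["IfMatch"] = etag
        requests.append(params)
    return requests


# A 20 MiB object with 8 MiB chunks needs two more requests after chunk 1:
reqs = remaining_chunk_requests(20 * 1024 * 1024, 8 * 1024 * 1024, '"abc123"')
```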
Flow Diagrams
Before (with HEAD request)
```mermaid
flowchart TD
    A[HEAD Request] --> C{Size < 8MB?}
    C -->|Yes| D[GET Request]
    C -->|No| E[Multiple GET Requests]
    D --> F[Complete]
    E --> F
```
After (no HEAD request)
```mermaid
flowchart TD
    A[GET First Chunk] --> B{Size < 8MB?}
    B -->|Yes| C[Complete]
    B -->|No| D[GET Remaining Chunks]
    D --> C
```
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.