Skip to content

Set timeout for all requests in set_requests_session#1259

Open
tylanderson wants to merge 4 commits intoearthaccess-dev:mainfrom
tylanderson:ta/set-timeout-setsession-request
Open

Set timeout for all requests in set_requests_session#1259
tylanderson wants to merge 4 commits intoearthaccess-dev:mainfrom
tylanderson:ta/set-timeout-setsession-request

Conversation

@tylanderson
Copy link

@tylanderson tylanderson commented Mar 12, 2026

Description

Adds timeouts on requests during set_requests_session. I had encountered an issue where the first request would run much longer than expected before timing out, this will surface that error faster. Set to 10s for now.


"Ready for review" checklist

  • Open PR as draft
  • Please review our Pull Request Guide
  • Mark "ready for review" after following instructions in the guide

Merge checklist

  • PR title is descriptive
  • PR body contains links to related and resolved issues (e.g. closes #1)
  • If needed, CHANGELOG.md updated
  • If needed, docs and/or README.md updated
  • If needed, unit tests added
  • All checks passing (comment pre-commit.ci autofix if pre-commit is failing)
  • At least one approval

📚 Documentation preview 📚: https://earthaccess--1259.org.readthedocs.build/en/1259/

@github-actions
Copy link

github-actions bot commented Mar 12, 2026

Binder 👈 Launch a binder notebook on this branch for commit 15217ff

I will automatically update this comment whenever this PR is modified

Binder 👈 Launch a binder notebook on this branch for commit ecdfa8f

Binder 👈 Launch a binder notebook on this branch for commit ec14f86

Binder 👈 Launch a binder notebook on this branch for commit 6e82788

@tylanderson tylanderson marked this pull request as ready for review March 12, 2026 16:26
@mfisher87
Copy link
Member

Nice! Thank you! Totally agree a requests session should always have a timeout.

Ideally the user would have some control over this, but the current architecture of earthaccess makes that a bit challenging. A more object oriented approach (client/session interface) for earthaccess (#1250) would make this easier for the user to control. For now, maybe an environment variable would enable that.

I worry that a 10 second timeout might negatively impact folks with less reliable internet connections (e.g. working in remote areas, mobile hotspot, satellite, ??), so I'd suggest we implement some mechanism for the user to override before merging this PR.

Thoughts?

@tylanderson
Copy link
Author

Yeah agree that configuration currently is a bit difficult as things stand now. I ran a few tests with my network simulating lower performing networks (e.g. 3G) which did work, that being said I think having this be configurable via env is a good option for now, I'll add that in

@mfisher87
Copy link
Member

Thanks Tyler! ❤️

@mfisher87 mfisher87 requested review from betolink and chuckwondo and removed request for chuckwondo March 13, 2026 22:42
Copy link
Member

@mfisher87 mfisher87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a bit more work before this is ready! See comments above.

I've also added some other reviewers who have touched this code. The main question in my mind now is should we set the timeout for every request executed under the session? Why just set_requests_session?

Maybe we should modify our requests.Session subclass instead. Then, instead of the try block we can also make the warning a default behavior.

E.g. (pseudocode)

class SessionWithTimeoutAndHeaderRedirection(requests.Session):
    """Requests removes auth headers if the redirect happens outside the
    original req domain.
    """

    def __init__(
        self,
        edl_hostname: str, auth: tuple[str, str] | None = None
    ) -> None:
        super().__init__()
        self.headers.update({"User-Agent": user_agent})

        if auth:
            hook = BasicAuthResponseHook(edl_hostname, auth)
            self.hooks["response"].append(hook)

    @cached_property
    def timeout(self) -> tuple[int, int] | tuple | None:
         # Get it from the envvar / default value

    def request(self, method, url, **kwargs):
        kwargs.setdefault('timeout', self.timeout)
        try: 
            return super().request(method, url, **kwargs)
        except requests.Timeout as e:
            # print warning
            raise e

It's been a while since I thought about this, but setting a default timeout on a session was a big point of contention with the Requests community like a decade ago or more. I think the recommended way to do this is actually to override an Adapter class. So I wouldn't consider the above ☝️ to be correct without more research 😁

Another option we have is to just pass a timeout to every single requests call, and Ruff can help us do that: https://docs.astral.sh/ruff/rules/request-without-timeout/

Comment on lines +52 to +58
timeout_str = os.getenv("EARTHACCESS_REQUEST_TIMEOUT")
timeout = (
int(timeout_str)
if timeout_str and timeout_str.isdigit()
else DEFAULT_REQUEST_TIMEOUT
)
return timeout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit nitpicky, hope you don't mind!

Suggested change
timeout_str = os.getenv("EARTHACCESS_REQUEST_TIMEOUT")
timeout = (
int(timeout_str)
if timeout_str and timeout_str.isdigit()
else DEFAULT_REQUEST_TIMEOUT
)
return timeout
timeout_str = os.getenv("EARTHACCESS_REQUEST_TIMEOUT", default="")
if timeout_str.isdigit():
return int(timeout_str)
else:
return DEFAULT_REQUEST_TIMEOUT_SECONDS


logger = logging.getLogger(__name__)

DEFAULT_REQUEST_TIMEOUT = 10 # seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DEFAULT_REQUEST_TIMEOUT = 10 # seconds
DEFAULT_REQUEST_TIMEOUT_SECONDS = 10

What do you think?



@pytest.mark.parametrize(
"env_timeout, first_status, second_status, expected_timeout, expected_calls",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"env_timeout, first_status, second_status, expected_timeout, expected_calls",
("env_timeout", "first_status", "second_status", "expected_timeout", "expected_calls"),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants