Set timeout for all requests in set_requests_session#1259
Set timeout for all requests in set_requests_session#1259tylanderson wants to merge 4 commits intoearthaccess-dev:mainfrom
Conversation
|
I will automatically update this comment whenever this PR is modified
|
|
Nice! Thank you! Totally agree a requests session should always have a timeout. Ideally the user would have some control over this, but the current architecture of earthaccess makes that a bit challenging. A more object oriented approach (client/session interface) for earthaccess (#1250) would make this easier for the user to control. For now, maybe an environment variable would enable that. I worry that a 10 second timeout might negatively impact folks with less reliable internet connections (e.g. working in remote areas, mobile hotspot, satellite, ??), so I'd suggest we implement some mechanism for the user to override before merging this PR. Thoughts? |
|
Yeah agree that configuration currently is a bit difficult as things stand now. I ran a few tests with my network simulating lower performing networks (e.g. 3G) which did work, that being said I think having this be configurable via env is a good option for now, I'll add that in |
|
Thanks Tyler! ❤️ |
There was a problem hiding this comment.
I think we need a bit more work before this is ready! See comments above.
I've also added some other reviewers who have touched this code. The main question in my mind now is should we set the timeout for every request executed under the session? Why just set_requests_session?
Maybe we should modify our requests.Session subclass instead. Then, instead of the try block we can also make the warning a default behavior.
E.g. (pseudocode)
class SessionWithTimeoutAndHeaderRedirection(requests.Session):
"""Requests removes auth headers if the redirect happens outside the
original req domain.
"""
def __init__(
self,
edl_hostname: str, auth: tuple[str, str] | None = None
) -> None:
super().__init__()
self.headers.update({"User-Agent": user_agent})
if auth:
hook = BasicAuthResponseHook(edl_hostname, auth)
self.hooks["response"].append(hook)
@cached_property
def timeout(self) -> tuple[int, int] | tuple | None:
# Get it from the envvar / default value
def request(self, method, url, **kwargs):
kwargs.setdefault('timeout', self.timeout)
try:
return super().request(method, url, **kwargs)
except requests.Timeout as e:
# print warning
raise e
It's been a while since I thought about this, but setting a default timeout on a session was a big point of contention with the Requests community like a decade ago or more. I think the recommended way to do this is actually to override an Adapter class. So I wouldn't consider the above ☝️ to be correct without more research 😁
Another option we have is to just pass a timeout to every single requests call, and Ruff can help us do that: https://docs.astral.sh/ruff/rules/request-without-timeout/
| timeout_str = os.getenv("EARTHACCESS_REQUEST_TIMEOUT") | ||
| timeout = ( | ||
| int(timeout_str) | ||
| if timeout_str and timeout_str.isdigit() | ||
| else DEFAULT_REQUEST_TIMEOUT | ||
| ) | ||
| return timeout |
There was a problem hiding this comment.
A bit nitpicky, hope you don't mind!
| timeout_str = os.getenv("EARTHACCESS_REQUEST_TIMEOUT") | |
| timeout = ( | |
| int(timeout_str) | |
| if timeout_str and timeout_str.isdigit() | |
| else DEFAULT_REQUEST_TIMEOUT | |
| ) | |
| return timeout | |
| timeout_str = os.getenv("EARTHACCESS_REQUEST_TIMEOUT", default="") | |
| if timeout_str.isdigit(): | |
| return int(timeout_str) | |
| else: | |
| return DEFAULT_REQUEST_TIMEOUT_SECONDS |
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| DEFAULT_REQUEST_TIMEOUT = 10 # seconds |
There was a problem hiding this comment.
| DEFAULT_REQUEST_TIMEOUT = 10 # seconds | |
| DEFAULT_REQUEST_TIMEOUT_SECONDS = 10 |
What do you think?
|
|
||
|
|
||
| @pytest.mark.parametrize( | ||
| "env_timeout, first_status, second_status, expected_timeout, expected_calls", |
There was a problem hiding this comment.
| "env_timeout, first_status, second_status, expected_timeout, expected_calls", | |
| ("env_timeout", "first_status", "second_status", "expected_timeout", "expected_calls"), |
Description
Adds timeouts on requests during
set_requests_session. I had encountered an issue where the first request would run much longer than expected before timing out, this will surface that error faster. Set to 10s for now."Ready for review" checklist
Merge checklist
closes #1)CHANGELOG.mdupdatedREADME.mdupdatedpre-commit.ci autofixif pre-commit is failing)📚 Documentation preview 📚: https://earthaccess--1259.org.readthedocs.build/en/1259/