-
Notifications
You must be signed in to change notification settings - Fork 0
Disallow new requests after ≥10 and ≥1% undocumented error responses #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
changelog: cover 0.2.1
* n_results is renamed to n_success; * n_extracted_queries is removed, because it's always the same as n_results (i.e. n_success); * n_input_queries is removed: it wasn't really a number of input queries, (it was a number of processed queries), and it can be computed from other stats: success + fatal errors; * added a short comment which explains each stat value
clean up AggStats
Zyte Data API → Zyte API
add brotli as a dependency
Prepare for 0.4.0 release
Network error retry time: 5 minutes → 15 minutes
It seems aiohttp has troubles with edge cases of Keep-Alive, and disabling it helps with ServerDisconnectedErrors. Using aiohttp sessions is still important, because it allows to reduce the number of ClientConnectorErrors.
Don't reuse the connections.
Add conservative_retrying
Add Python 3.12 and 3.13, drop Python 3.8, update tool versions
| if store_errors and isinstance(e, RequestError): | ||
| write_output(e.parsed.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an error detected while addressing typing issues, and removing the request error mock from tests. write_output does json.dumps, so when passing a string here before, it was printing in the file '{"… instead of the expected {"….
| raise | ||
|
|
||
| logger.error(str(e)) | ||
| logger.exception("Exception raised during response handling") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it useful to see the exception traceback when the exception is not RequestError. Although it might make things too verbose for RequestError. Maybe we should only log the traceback when the exception is unknown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like something that can/should be re-evaluated after using it.
|
Continued at zytedata#82 |
cc @kmike @wRAR
To be merged into zytedata#78, or if that’s merged first, to be moved (the PR) to the upstream repo with
mainas the target branch.It’s a rather hacky approach, to be honest:
The number of total responses and undocumented error responses is tracked in class variables. Tests involve monkey patching to reset or hard-code their values.
If we want to keep the implementation at tenacity level, this is the cleanest path I saw, although I don’t discard there being better ways.
The use of the class to count, instead of an instance, is due to tenacity creating copies of the instance for each function wrapping. The specifics are not completely clear to me, I am not 100% sure there is no way around it, but this line seems problematic for instance-based storage of these counters. The statistics on the next line are also not global, and get reset on every wrapped call.
We could consider implementing this logic outside tenacity, in the client code. It would allow for a cleaner implementation, e.g. based on stats. The main issue I see, and it might be considered minor, is that then we cannot stop new requests as soon as the condition is met, we still need to wait for the retries of 1 of the on-going requests to finish, which means we may sometimes stop new requests only after e.g. 11+ and not only 10 undocumented error responses. There is also the issue of not being able to customize this logic through a custom retry policy class, but we might not want to support that to begin with, and if we did, we could instead provide some client parameter(s) instead.
I made some other decisions I’m not entirely sure about:
Post-merge work:
TooManyUndocumentedErrorsexception.