feat(download): first class support for streaming download #2013#2021
feat(download): first class support for streaming download #2013#2021pdavid-cssopra wants to merge 2 commits intodevelopfrom
Conversation
Test Results 4 files ± 0 4 suites ±0 3m 28s ⏱️ +15s Results for commit 4ea1844. ± Comparison against base commit 798a34a. This pull request removes 6 and adds 11 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
Code Coverage (Ubuntu)DetailsDiff against developResults for commit: 4ea1844 Minimum allowed coverage is ♻️ This comment has been updated with latest results |
Code Coverage (Windows)DetailsDiff against developResults for commit: 4ea1844 Minimum allowed coverage is ♻️ This comment has been updated with latest results |
4a4afa6 to
f70e19a
Compare
47999eb to
b9f0490
Compare
ecb9fba to
775e01e
Compare
tests/units/test_eoproduct.py
Outdated
| # Download | ||
| asset_stream = product.assets["foo"].stream_download() | ||
| self.assertEqual(asset_stream.headers["content-type"], "image/tiff") | ||
| self.assertEqual(asset_stream.headers["content-length"], "1") |
There was a problem hiding this comment.
a content could be downloaded here
There was a problem hiding this comment.
That's the case
# Download to tmp directory
filepath = os.path.join(self.output_dir, asset_stream.filename)
with open(filepath, "wb") as fp:
for chunk in asset_stream.content:
fp.write(chunk)
Global emultation is unconsistent, even if has headers like content length = 1, mock not serve real file
There was a problem hiding this comment.
Several issues here:
content-lengthshould not be 1. Problem fixed by test: headers as requests CaseInsensitiveDict #2054- downloaded file name is not correct (comes from
_dummy_downloadable_product). Please adapt mock for asset in this test
There was a problem hiding this comment.
The method for emulating external requests is inadequate for the required tests, which demand sequencing multiple calls. Lacking the time to properly recreate the emulations, the problem will be solved with an unsightly workaround.
Regarding HTTP headers, the IEEE retains the HTTP 1.0 header, which is the most restrictive of the three current standard versions (1.0, 1.1, 2.0), in a CamelCase-like format, such as "Content-Length".
It's reassuring that the code has a safeguard, but not that it compensates for inconsistencies in code writing. All fields have been standardized to a common naming convention.
There was a problem hiding this comment.
Here is a representative example of what HTTP emulation mechanics for testing could look like.
http_mock.py
There was a problem hiding this comment.
like already discussed, we are using responses in tests to handle multiple calls. Please use it instead of rewriting an alternative
There was a problem hiding this comment.
responses librairie does not expose overloadable classes and not allow a way to adapt protocol behaviour. It's unfortunatly not enough to manage custom case like streaming, in particular when eodag does not go through the standard stream transfer, and does not expose a generic generator object.
There was a problem hiding this comment.
we do not need to overload responses classes, neither to adapt protocol behavior. Please test following this example #2062
There was a problem hiding this comment.
conflict appears between response librairie and native mocking from EodagTestCase.
Heritance from EodagTestCase to TestEoProduct had to be removed to implement requested tests update.
a0d88dc to
42aa3b5
Compare
5bde4cf to
de048c6
Compare
d6a9e54 to
1882636
Compare
| from typing import Union | ||
|
|
||
|
|
||
| class ResponseFile: |
There was a problem hiding this comment.
please use responses library. A file content can be passed as body if needed https://github.com/getsentry/responses?tab=readme-ov-file#response-parameters
There was a problem hiding this comment.
this class replace an existing code in test:
tests/units/test_download_plugins.py:230
def _download_response_archive(self):
class Response:
"""Emulation of a response to requests.get method for a zipped product"""
def __init__(response):
# Using a zipped product file
with open(self.local_product_as_archive_path, "rb") as fh:
response.__zip_buffer = io.BytesIO(fh.read())
cl = response.__zip_buffer.getbuffer().nbytes
response.headers = CaseInsensitiveDict(
{
"content-length": cl,
"content-disposition": "attachment; filename=foobar.zip",
}
)
response.url = "http://foo.bar"
def __enter__(response):
return response
def __exit__(response, *args):
pass
def iter_content(response, **kwargs):
with response.__zip_buffer as fh:
while True:
chunk = fh.read(kwargs["chunk_size"])
if not chunk:
break
yield chunk
def raise_for_status(response):
pass
def close(response):
pass
return Response()
The class has been generalized to allow for extending tests without rewriting existing ones. The file has been isolated into a separate file for cleanliness and dependency isolation.
The response library does not allow overloading the response object of a request, and is therefore not suitable here.
There was a problem hiding this comment.
this manual mechanism was also not needed, I just removed it in #2062.
Please adapt your tests following this example
There was a problem hiding this comment.
conflict appears between response librairie and native mocking from EodagTestCase.
Heritance from EodagTestCase to TestEoProduct had to be removed to implement requested tests update.
1882636 to
69cd665
Compare
009fd5a to
03bbe32
Compare
03bbe32 to
4ea1844
Compare
What would you like to be added ?
The support for streaming downloads that is already available for use in server mode should also be available for users in the eodag library.
Why is this needed ?
Streaming downloads would allow users to send the data they retrieved from EODAG to another tool of their choice without having to store on their filesystem first. This feature would enable more powerful and performant data processing pipelines using the EODAG library.
the new stream_download method will be used in stac-fastapi-eodag in place of _stream_download_dict
this should also be demonstrated in a tutorial to upload products/assets on a local S3 created with minio: will fix #1325
How should it be implemented ?
Currently there is a method plugin.__stream_download_dict() that is implemented in the download plugins. This method should be used in a public method EODataAccessGateway.stream_download(product) similar to EODataAccessGateway.download(product).
Proposed enhancements:
No need to have the stream_download method exposed through EODataAccessGateway.
Technical note
Minio is no longer supported;
rustfsis suggested as a replacement for tutorials.Tests update reveals somes bugs: