Requesting review for "show patches" implementation#33
Conversation
choosehappy
left a comment
There was a problem hiding this comment.
Some thoughts + questions for your consideration
| which allows for immediate progressive loading of patch images. However, this doubles the number of REST requests (1 request returns patch metadata, and 1 request returns the patch image). | ||
|
|
||
| I should also note that the viewport currently waits for all sample_patches_by_point requests to complete (Promise.all) before rendering the patch images. | ||
| This can be improved upon to greedily render patches as they are received, requiring some changes to the frontend code. |
There was a problem hiding this comment.
yes, i think this should be done greedily - especially since its not obvious without a lot of testing at scale what the worst case time response for all patches would be. what is more likely is that the user seems some particular region quite quickly that they'r einterested in and then zooms into that area - which would then mean that the remaining "waiting" would have been wasted compute. is this sufficiently easy to implement?
There was a problem hiding this comment.
looking at how the underlying function is implemented - this shouldn't be done at a bbox level nayway:
this is essentially a "for loop" of embarassingly parallel queries, isn't it?
since it looks like results don't have any interaction with each other - the end point is essentially blocking the 1st response until the 200th.
if the overhead is the concern, an async option can be used:
import asyncio
async def fetch_user(user_id):
return await db.fetch_one(
"SELECT * FROM users WHERE id = :id",
{"id": user_id},
)
results = await asyncio.gather(
*(fetch_user(i) for i in ids)
)or if bounded:
sem = asyncio.Semaphore(20)
async def fetch(user_id):
async with sem:
return await db.fetch_one(...)
results = await asyncio.gather(
*(fetch(i) for i in ids)
)There was a problem hiding this comment.
Yes this is certainly superior to the current for loop implementation.
Do you think the parallelism should happen within the server (using asyncio), or just continue to let the client generate 100-200 rest requests?
There was a problem hiding this comment.
my first gut feeling is to say - it depends on how fast it gets done. there was some other discussion here essentially talk about the intended user experience. they should see patches as quickly as possible and have it be non-blocking so that they can move around, which will "save" the rest of the compute for the patches that are then no longer needed. this justifies the 1-by-1 approach. however, if the parallelism allows this to happen and return a result quickly, like, in 200ms - then we'll likely experience an even faster UI experience because there isn't the overhead of connecting and communicating 200 connections. do you have a sense of how fast this is and how its speed would scale as the dataset gets larger? what does your guy say?
| store = PatchStore(project_id, session) | ||
| x_min, x_max = min(xmin, xmax), max(xmin, xmax) | ||
| y_min, y_max = min(ymin, ymax), max(ymin, ymax) | ||
| x_coords = np.random.uniform(x_min, x_max, size=num_samples) |
There was a problem hiding this comment.
i think we discussed this and decided on random, but in thinking more about it, i think this should be determnistic and a simple cartesian grid. the benifit here is reproducaibility and ease of sharing. e.g., the user can copy paste the URL of their viewer, send it to someone else, and (assuming the system wasn't training) they would see exxactly the same patches and UI.
There was a problem hiding this comment.
it might also be a faster query, since you'll get the first patch in the grid as opposed to having to randomly go through potentially millions of patches to find the correct one
There was a problem hiding this comment.
okay - I'll update this to be a cartesian grid.
| This endpoint was intended for "show patches" functionality, mirroring a similar endpoint in PS v1. | ||
| However, it is not currently used because it's expensive to run a SQL query with a large number (PATCH_NUM_SAMPLES = 200) of predicate BETWEEN conditions. | ||
| Instead, the frontend currently calls the sample_patches_by_point endpoint for each of the 200 points in the grid, | ||
| which allows for immediate progressive loading of patch images. However, this doubles the number of REST requests (1 request returns patch metadata, and 1 request returns the patch image). |
There was a problem hiding this comment.
it looks like the sub command accepts an argument to return the image as well, why not let this trickle down and back up?
There was a problem hiding this comment.
If I understand your question, you're asking why the endpoint doesn't also accept an include_image argument?
The answer is, it probably should. It makes rendering the image a little more complicated - the blob returned for PatchResponse.image would need to be used to create a blob URL in the client:
const imageUrl = URL.createObjectURL(blob);Which would then need to be revoked after rendering to prevent memory leaks:
URL.revokeObjectURL(imageUrl)There was a problem hiding this comment.
i don't really remember how the patches are sent - i would have expected it to just be the blob as the message body and not as e.g., a separate url creation that needs to be managed?
| I should also note that the viewport currently waits for all sample_patches_by_point requests to complete (Promise.all) before rendering the patch images. | ||
| This can be improved upon to greedily render patches as they are received, requiring some changes to the frontend code. | ||
| The implementation currently supports abort signals when the user pans or zooms, which should be preserved in any future changes. | ||
| """ |
There was a problem hiding this comment.
is this comment correct? large limit == 1?
There was a problem hiding this comment.
Sorry the comment is not accurate. limit=1 gets the first returned patch matching the predicate. Will update the comment.
No description provided.