Skip to content

Requesting review for "show patches" implementation#33

Open
jacksonjacobs1 wants to merge 4 commits into
choosehappy:v2.0from
jacksonjacobs1:review-show-patches
Open

Requesting review for "show patches" implementation#33
jacksonjacobs1 wants to merge 4 commits into
choosehappy:v2.0from
jacksonjacobs1:review-show-patches

Conversation

@jacksonjacobs1

Copy link
Copy Markdown
Collaborator

No description provided.

@choosehappy choosehappy left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts + questions for your consideration

which allows for immediate progressive loading of patch images. However, this doubles the number of REST requests (1 request returns patch metadata, and 1 request returns the patch image).

I should also note that the viewport currently waits for all sample_patches_by_point requests to complete (Promise.all) before rendering the patch images.
This can be improved upon to greedily render patches as they are received, requiring some changes to the frontend code.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i think this should be done greedily - especially since its not obvious without a lot of testing at scale what the worst case time response for all patches would be. what is more likely is that the user seems some particular region quite quickly that they'r einterested in and then zooms into that area - which would then mean that the remaining "waiting" would have been wasted compute. is this sufficiently easy to implement?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at how the underlying function is implemented - this shouldn't be done at a bbox level nayway:

this is essentially a "for loop" of embarassingly parallel queries, isn't it?

for i_min, i_max, j_min, j_max in cell_ranges:

since it looks like results don't have any interaction with each other - the end point is essentially blocking the 1st response until the 200th.

if the overhead is the concern, an async option can be used:

import asyncio

async def fetch_user(user_id):
    return await db.fetch_one(
        "SELECT * FROM users WHERE id = :id",
        {"id": user_id},
    )

results = await asyncio.gather(
    *(fetch_user(i) for i in ids)
)

or if bounded:

sem = asyncio.Semaphore(20)

async def fetch(user_id):
    async with sem:
        return await db.fetch_one(...)

results = await asyncio.gather(
    *(fetch(i) for i in ids)
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is certainly superior to the current for loop implementation.

Do you think the parallelism should happen within the server (using asyncio), or just continue to let the client generate 100-200 rest requests?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my first gut feeling is to say - it depends on how fast it gets done. there was some other discussion here essentially talk about the intended user experience. they should see patches as quickly as possible and have it be non-blocking so that they can move around, which will "save" the rest of the compute for the patches that are then no longer needed. this justifies the 1-by-1 approach. however, if the parallelism allows this to happen and return a result quickly, like, in 200ms - then we'll likely experience an even faster UI experience because there isn't the overhead of connecting and communicating 200 connections. do you have a sense of how fast this is and how its speed would scale as the dataset gets larger? what does your guy say?

store = PatchStore(project_id, session)
x_min, x_max = min(xmin, xmax), max(xmin, xmax)
y_min, y_max = min(ymin, ymax), max(ymin, ymax)
x_coords = np.random.uniform(x_min, x_max, size=num_samples)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we discussed this and decided on random, but in thinking more about it, i think this should be determnistic and a simple cartesian grid. the benifit here is reproducaibility and ease of sharing. e.g., the user can copy paste the URL of their viewer, send it to someone else, and (assuming the system wasn't training) they would see exxactly the same patches and UI.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might also be a faster query, since you'll get the first patch in the grid as opposed to having to randomly go through potentially millions of patches to find the correct one

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay - I'll update this to be a cartesian grid.

This endpoint was intended for "show patches" functionality, mirroring a similar endpoint in PS v1.
However, it is not currently used because it's expensive to run a SQL query with a large number (PATCH_NUM_SAMPLES = 200) of predicate BETWEEN conditions.
Instead, the frontend currently calls the sample_patches_by_point endpoint for each of the 200 points in the grid,
which allows for immediate progressive loading of patch images. However, this doubles the number of REST requests (1 request returns patch metadata, and 1 request returns the patch image).

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like the sub command accepts an argument to return the image as well, why not let this trickle down and back up?

include_image=False,

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand your question, you're asking why the endpoint doesn't also accept an include_image argument?

The answer is, it probably should. It makes rendering the image a little more complicated - the blob returned for PatchResponse.image would need to be used to create a blob URL in the client:

const imageUrl = URL.createObjectURL(blob);

Which would then need to be revoked after rendering to prevent memory leaks:

URL.revokeObjectURL(imageUrl)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really remember how the patches are sent - i would have expected it to just be the blob as the message body and not as e.g., a separate url creation that needs to be managed?

I should also note that the viewport currently waits for all sample_patches_by_point requests to complete (Promise.all) before rendering the patch images.
This can be improved upon to greedily render patches as they are received, requiring some changes to the frontend code.
The implementation currently supports abort signals when the user pans or zooms, which should be preserved in any future changes.
"""

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comment correct? large limit == 1?

limit=1, # Large limit to fetch all matches within the cell range

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry the comment is not accurate. limit=1 gets the first returned patch matching the predicate. Will update the comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants