Requesting review for "show patches" implementation by jacksonjacobs1 · Pull Request #33 · choosehappy/PatchSorter

jacksonjacobs1 · 2026-06-30T16:24:30Z

No description provided.

choosehappy

Some thoughts + questions for your consideration

choosehappy · 2026-07-01T11:02:58Z

+which allows for immediate progressive loading of patch images. However, this doubles the number of REST requests (1 request returns patch metadata, and 1 request returns the patch image).
+
+I should also note that the viewport currently waits for all sample_patches_by_point requests to complete (Promise.all) before rendering the patch images. 
+This can be improved upon to greedily render patches as they are received, requiring some changes to the frontend code.


yes, i think this should be done greedily - especially since its not obvious without a lot of testing at scale what the worst case time response for all patches would be. what is more likely is that the user seems some particular region quite quickly that they'r einterested in and then zooms into that area - which would then mean that the remaining "waiting" would have been wasted compute. is this sufficiently easy to implement?

looking at how the underlying function is implemented - this shouldn't be done at a bbox level nayway:

this is essentially a "for loop" of embarassingly parallel queries, isn't it?

PatchSorter/patchsorter/db/head_client/patch.py

Line 785 in 9771eb5

for i_min, i_max, j_min, j_max in cell_ranges:

since it looks like results don't have any interaction with each other - the end point is essentially blocking the 1st response until the 200th.

if the overhead is the concern, an async option can be used:

import asyncio async def fetch_user(user_id): return await db.fetch_one( "SELECT * FROM users WHERE id = :id", {"id": user_id}, ) results = await asyncio.gather( *(fetch_user(i) for i in ids) )

or if bounded:

sem = asyncio.Semaphore(20) async def fetch(user_id): async with sem: return await db.fetch_one(...) results = await asyncio.gather( *(fetch(i) for i in ids) )

Yes this is certainly superior to the current for loop implementation.

Do you think the parallelism should happen within the server (using asyncio), or just continue to let the client generate 100-200 rest requests?

my first gut feeling is to say - it depends on how fast it gets done. there was some other discussion here essentially talk about the intended user experience. they should see patches as quickly as possible and have it be non-blocking so that they can move around, which will "save" the rest of the compute for the patches that are then no longer needed. this justifies the 1-by-1 approach. however, if the parallelism allows this to happen and return a result quickly, like, in 200ms - then we'll likely experience an even faster UI experience because there isn't the overhead of connecting and communicating 200 connections. do you have a sense of how fast this is and how its speed would scale as the dataset gets larger? what does your guy say?

choosehappy · 2026-07-01T11:06:59Z

        store = PatchStore(project_id, session)
        x_min, x_max = min(xmin, xmax), max(xmin, xmax)
        y_min, y_max = min(ymin, ymax), max(ymin, ymax)
        x_coords = np.random.uniform(x_min, x_max, size=num_samples)


i think we discussed this and decided on random, but in thinking more about it, i think this should be determnistic and a simple cartesian grid. the benifit here is reproducaibility and ease of sharing. e.g., the user can copy paste the URL of their viewer, send it to someone else, and (assuming the system wasn't training) they would see exxactly the same patches and UI.

it might also be a faster query, since you'll get the first patch in the grid as opposed to having to randomly go through potentially millions of patches to find the correct one

okay - I'll update this to be a cartesian grid.

choosehappy · 2026-07-01T11:12:34Z

+This endpoint was intended for "show patches" functionality, mirroring a similar endpoint in PS v1. 
+However, it is not currently used because it's expensive to run a SQL query with a large number (PATCH_NUM_SAMPLES = 200) of predicate BETWEEN conditions.
+Instead, the frontend currently calls the sample_patches_by_point endpoint for each of the 200 points in the grid, 
+which allows for immediate progressive loading of patch images. However, this doubles the number of REST requests (1 request returns patch metadata, and 1 request returns the patch image).


it looks like the sub command accepts an argument to return the image as well, why not let this trickle down and back up?

PatchSorter/patchsorter/db/head_client/patch.py

Line 793 in 9771eb5

include_image=False,

If I understand your question, you're asking why the endpoint doesn't also accept an include_image argument?

The answer is, it probably should. It makes rendering the image a little more complicated - the blob returned for PatchResponse.image would need to be used to create a blob URL in the client:

const imageUrl = URL.createObjectURL(blob);

Which would then need to be revoked after rendering to prevent memory leaks:

URL.revokeObjectURL(imageUrl)

i don't really remember how the patches are sent - i would have expected it to just be the blob as the message body and not as e.g., a separate url creation that needs to be managed?

choosehappy · 2026-07-01T11:24:20Z

+I should also note that the viewport currently waits for all sample_patches_by_point requests to complete (Promise.all) before rendering the patch images. 
+This can be improved upon to greedily render patches as they are received, requiring some changes to the frontend code.
+The implementation currently supports abort signals when the user pans or zooms, which should be preserved in any future changes.
+"""


is this comment correct? large limit == 1?

PatchSorter/patchsorter/db/head_client/patch.py

Line 792 in 9771eb5

limit=1, # Large limit to fetch all matches within the cell range

Sorry the comment is not accurate. limit=1 gets the first returned patch matching the predicate. Will update the comment.

jacksonjacobs1 added 4 commits June 30, 2026 08:45

configure vs code to correctly set python env

a25f872

added NEEDS REVIEW

aa04da7

update review text

b833f83

update review text

48e5288

jacksonjacobs1 requested a review from choosehappy June 30, 2026 16:24

choosehappy requested changes Jul 1, 2026

View reviewed changes

Conversation

jacksonjacobs1 commented Jun 30, 2026

Uh oh!

choosehappy left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants