Incremental `GROUP BY _block_num` and `DISTINCT BY _block_num` by leoyvens · Pull Request #1877 · edgeandnode/amp

leoyvens · 2026-02-27T14:49:08Z

This implements two related features:

- Streaming `DISTINCT ON` and `GROUP BY` when the key is `_block_num`.
- Function call syntax, `block_num()`, to more easily refer to the block number anywhere in the query.

Our execution semantics essentially already support the special case where an aggregation is restricted to a single block, thanks to the assumption that data for the same block never spans more than one microbatch. Executing the aggregation in isolation on each microbatch therefore yields correct results. The necessary changes were to the incremental query validation checks and to block number propagation.
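To illustrate the argument above, here is a minimal, std-only sketch. It is not code from this PR: `Row`, `sum_by_block`, and `sum_by_block_streaming` are hypothetical names, and the "microbatch" is modelled as a plain `Vec<Row>`. It shows that when every block lives in exactly one microbatch, aggregating each microbatch in isolation gives the same groups as aggregating everything at once.

```rust
use std::collections::BTreeMap;

/// A row tagged with the block it belongs to (hypothetical shape, for illustration only).
#[derive(Clone, Debug, PartialEq)]
pub struct Row {
    pub block_num: u64,
    pub value: u64,
}

/// Equivalent of `SELECT block_num, SUM(value) ... GROUP BY block_num` over one batch.
pub fn sum_by_block(rows: &[Row]) -> BTreeMap<u64, u64> {
    let mut sums = BTreeMap::new();
    for row in rows {
        *sums.entry(row.block_num).or_insert(0) += row.value;
    }
    sums
}

/// Runs the aggregation on each microbatch in isolation and concatenates the
/// per-batch results in arrival order, without keeping state across batches.
pub fn sum_by_block_streaming(microbatches: &[Vec<Row>]) -> Vec<(u64, u64)> {
    microbatches
        .iter()
        .flat_map(|batch| sum_by_block(batch))
        .collect()
}
```

If a block were split across two microbatches, the streaming variant would emit two partial rows for that block, which is why the single-microbatch assumption matters.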

LNSD

Please, check my comments 🙂

LNSD · 2026-02-27T15:14:57Z

crates/core/common/src/lib.rs

 pub use datafusion::{arrow, parquet};
 pub use datasets_common::{block_num::BlockNum, block_range::BlockRange, end_block::EndBlock};

+pub mod block_num_udf;


Can we rename the module to just block_num? Maybe we can move all the UDFs under common::udfs (e.g., common::udfs::evm::* or common::udfs::block_num).

LNSD · 2026-03-04T12:37:46Z

crates/core/common/src/udfs/block_num.rs

I meant to move common::block_num to common::udfs::block_num (from crates/core/common/src/block_num.rs to crates/core/common/src/udfs/block_num.rs)

LNSD · 2026-03-04T12:58:18Z

The CI failure is related to the changes: https://github.com/edgeandnode/amp/actions/runs/22639732283/job/65612193391?pr=1877

LNSD · 2026-03-05T12:00:06Z

@leoyvens, if this is still WIP, could you mark the PR as a draft?

leoyvens · 2026-03-05T12:03:30Z

@LNSD yes CI is failing for legit reasons, marked as draft

LNSD · 2026-03-05T12:13:05Z

> @LNSD yes CI is failing for legit reasons, marked as draft

Thanks. The issue was that I was flooded with notifications to review the PR, and it wasn't ready. Using the "draft mode" can help better signal when someone wants the PR to be reviewed.

leoyvens · 2026-03-05T13:48:53Z

@Theodus This is now ready for review

Allow GROUP BY queries that include _block_num as a group key to work with incremental processing instead of being rejected.

- Handle Aggregate in BlockNumPropagator by setting next_block_num_expr
- Remove Aggregate from the unsupported-node error arm

Signed-off-by: Leonardo Yvens <leoyvens@gmail.com>

…gation

Adds a `block_num()` sentinel UDF that lets users explicitly request the propagated `_block_num` value in projections and DISTINCT ON expressions, particularly in join contexts where the bare `_block_num` column would be ambiguous.

Key changes:

- Register BlockNumUdf in builtin_udfs() in session_state
- BlockNumPropagator now replaces block_num() UDF with the correct greatest(left._block_num, right._block_num) expression from the join
- forbid_underscore_prefixed_aliases enhanced to check all node types (not just Projection) and to reject bare _block_num in multi-table projections
- incremental_op_kind uses expr_outputs_block_num for Aggregate/Distinct::On first-key checks, accepting post-propagation expressions derived from _block_num
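The sentinel replacement described above can be pictured with a toy expression tree. This is only a sketch under assumptions: the `Expr` enum, `join_block_num`, and `replace_block_num` below are hypothetical stand-ins, not the DataFusion types or the actual `BlockNumPropagator` code. The idea is that `block_num()` carries no value of its own, and a rewrite pass swaps it for the expression that yields the propagated block number at that point in the plan, which for a join is the greatest of the two sides.

```rust
/// Toy expression tree (hypothetical; the real code works on DataFusion expressions).
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    /// A qualified column reference such as `left._block_num`.
    Column { table: String, name: String },
    /// The `block_num()` sentinel call, to be replaced during propagation.
    BlockNumCall,
    /// `greatest(a, b)`.
    Greatest(Box<Expr>, Box<Expr>),
}

/// Builds `greatest(left._block_num, right._block_num)` for a join of two tables.
pub fn join_block_num(left: &str, right: &str) -> Expr {
    let col = |table: &str| Expr::Column {
        table: table.to_string(),
        name: "_block_num".to_string(),
    };
    Expr::Greatest(Box::new(col(left)), Box::new(col(right)))
}

/// Recursively replaces every `block_num()` sentinel with `replacement`.
pub fn replace_block_num(expr: Expr, replacement: &Expr) -> Expr {
    match expr {
        Expr::BlockNumCall => replacement.clone(),
        Expr::Greatest(a, b) => Expr::Greatest(
            Box::new(replace_block_num(*a, replacement)),
            Box::new(replace_block_num(*b, replacement)),
        ),
        other => other,
    }
}
```

Because the user writes `block_num()` rather than a bare `_block_num`, the rewrite has an unambiguous marker to replace even when both join inputs expose a `_block_num` column.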

Theodus

LGTM

leoyvens requested a review from Theodus February 27, 2026 14:49

leoyvens marked this pull request as draft February 27, 2026 14:49

leoyvens marked this pull request as ready for review February 27, 2026 15:00

LNSD reviewed Feb 27, 2026

leoyvens force-pushed the incremental-distinct-on branch 3 times, most recently from f9ae9a5 to ed06c86 Compare March 3, 2026 19:45

LNSD reviewed Mar 4, 2026

leoyvens force-pushed the incremental-distinct-on branch 2 times, most recently from e73514b to 3948177 Compare March 5, 2026 11:44

leoyvens marked this pull request as draft March 5, 2026 12:01

leoyvens marked this pull request as ready for review March 5, 2026 13:34

leoyvens force-pushed the incremental-distinct-on branch from 64f28c7 to a0ab4ad Compare March 5, 2026 13:40

leoyvens and others added 12 commits March 6, 2026 17:48

feat(common): support incremental DISTINCT ON with _block_num

14aa077

Signed-off-by: Leonardo Yvens <leoyvens@gmail.com>

test(tests): add integration tests for streaming DISTINCT ON and GROU…

b6d8beb

…P BY

refactor(common): extract plan_visitors tests to separate file

521055c

chore(common): fmt

bc60ab3

refactor(common): rename block_num_udf module to block_num

c768565

Signed-off-by: Leo <leo@edgeandnode.com>

chore(common): fmt

d57804a

fix(tests): Adjust error message

4d0263e

fix(tests): update expected error for select * cross join with _block…

899a06a

…_num ambiguity

fix(common): tighten multi-table _block_num check to count duplicates…

70fbbe8

… not input qualifiers

fix(common,tests): reject _block_num from any column in multi-table p…

2e44a75

…rojection; update tests

leoyvens added 6 commits March 6, 2026 17:48

fix(common,tests): improve multi-table _block_num error message with …

d1c4c98

…wildcard hint Signed-off-by: Leo <leo@edgeandnode.com>

fix(common,tests): Require block_num() not _block_num in aggs

af497dd

fix(common): fix conflicts

2eae8e8

chore(server): fix clippy

a235f7f

fix(common,tests): adjust incremental check for incrementalizer use

65c959b

chore(common): fmt

a6f016e

leoyvens force-pushed the incremental-distinct-on branch from a0ab4ad to bdca7e3 Compare March 6, 2026 17:58

chore(common): move block_num.rs

85f869c

leoyvens force-pushed the incremental-distinct-on branch from bdca7e3 to 85f869c Compare March 6, 2026 18:23

Theodus approved these changes Mar 6, 2026

leoyvens wants to merge 19 commits into main from incremental-distinct-on
