Excessive Catalog Round-trips in IcebergDocument.hasNext() (#4289)

### Summary
The current implementation of `IcebergDocument.hasNext()` issues a remote catalog lookup (PostgreSQL) and a storage listing (S3) on every call while the reader is processing the final Parquet file in a snapshot. In effect, it makes one round-trip per tuple in that file. This adds significant latency and puts unnecessary load on the metadata database.


### Background
`IcebergDocument` maintains a `usableFileIterator` over the data files known to the reader, to support live streaming (reading results while they are still being written). The current logic is:
1. When the current file's record iterator is exhausted, pull the next file from `usableFileIterator`.
2. When `usableFileIterator` is empty, call `seekToUsableFile()` to query the catalog for newly committed files.

### Problem
The `usableFileIterator.isEmpty()` check runs before the check for remaining records in the current file. As a result, once the reader starts reading the last known file, `usableFileIterator` is empty, and every subsequent call to `hasNext()` triggers `seekToUsableFile()`, even though the current file still has records to return.

#### Example Scenario
A result set consists of `file1` and `file2` (4,096 rows each).
- During `file1`: `usableFileIterator` still contains `[file2]`, so `hasNext()` returns true without contacting the catalog.
- During `file2`: `usableFileIterator` is now empty, and reading the rows takes 4,096 calls to `hasNext()`. Because the iterator is empty on each call, 4,096 requests go to PostgreSQL and S3 looking for new files, even though the reader is still busy with the current file.
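The scenario can be reproduced with a small Python model of the ordering described above. This is a sketch, not the actual `IcebergDocument` code: `CountingCatalog`, `CurrentOrderReader`, `seek_new_files`, and `drain` are hypothetical names, each file is modeled as an in-memory list of rows, and one catalog call at initialization is assumed (matching the count given under Impact).

```python
from collections import deque
from typing import Deque, List


class CountingCatalog:
    """Hypothetical stand-in for the PostgreSQL lookup plus S3 listing.
    Each seek_new_files() call counts as one round-trip and returns the
    files not yet handed to the reader."""

    def __init__(self, committed_files: List[List[int]]) -> None:
        self._unseen: List[List[int]] = list(committed_files)
        self.calls = 0

    def seek_new_files(self) -> List[List[int]]:
        self.calls += 1
        new_files, self._unseen = self._unseen, []
        return new_files


class CurrentOrderReader:
    """Hypothetical model of the current ordering: the file-list check runs
    before the current file's records are consulted."""

    def __init__(self, catalog: CountingCatalog) -> None:
        self.catalog = catalog
        # Assumed: one round-trip at initialization.
        self.usable_files: Deque[List[int]] = deque(catalog.seek_new_files())
        self.records: Deque[int] = deque()

    def has_next(self) -> bool:
        # The problem: once the last known file has been taken from
        # usable_files, this seek runs on every call, even while
        # self.records still holds rows.
        if not self.usable_files:
            self.usable_files.extend(self.catalog.seek_new_files())
        if self.records:
            return True
        while self.usable_files:
            self.records = deque(self.usable_files.popleft())
            if self.records:
                return True
        return False

    def next(self) -> int:
        # Precondition: has_next() returned True.
        return self.records.popleft()


def drain(reader: CurrentOrderReader) -> int:
    """Consume every record the way a caller would: has_next(), then next()."""
    count = 0
    while reader.has_next():
        reader.next()
        count += 1
    return count


# The example scenario: file1 and file2, 4,096 rows each.
catalog = CountingCatalog([list(range(4096)), list(range(4096))])
rows = drain(CurrentOrderReader(catalog))
print(rows, catalog.calls)  # 8192 4097
```

Under these assumptions, 4,096 of the 4,097 round-trips happen while `file2` is being read (4,095 while rows remain, plus the final call that returns false); only the one at initialization is necessary up to that point.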

### Proposed Fix
Add a guard condition to ensure that `seekToUsableFile()` is only invoked when the current record iterator is actually exhausted.

- If the current file has more records, return true immediately.
- Only if the current file is exhausted, check `usableFileIterator`.
- Only if `usableFileIterator` is also empty, call `seekToUsableFile()`.
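The guarded ordering can be sketched with a small Python model. This is not the real `IcebergDocument`: `CountingCatalog`, `GuardedReader`, `commit`, `seek_new_files`, and `drain` are hypothetical names, each file is an in-memory list of rows, and a catalog call at initialization is assumed.

```python
from collections import deque
from typing import Deque, List


class CountingCatalog:
    """Hypothetical stand-in for the PostgreSQL lookup plus S3 listing.
    Each seek_new_files() call counts as one round-trip."""

    def __init__(self, committed_files: List[List[int]]) -> None:
        self._unseen: List[List[int]] = list(committed_files)
        self.calls = 0

    def commit(self, rows: List[int]) -> None:
        # Simulates a writer committing a new file while the reader is live.
        self._unseen.append(rows)

    def seek_new_files(self) -> List[List[int]]:
        self.calls += 1
        new_files, self._unseen = self._unseen, []
        return new_files


class GuardedReader:
    """Hypothetical model of hasNext() with the proposed guard."""

    def __init__(self, catalog: CountingCatalog) -> None:
        self.catalog = catalog
        self.usable_files: Deque[List[int]] = deque(catalog.seek_new_files())
        self.records: Deque[int] = deque()

    def has_next(self) -> bool:
        # 1. The current file still has records: no I/O at all.
        if self.records:
            return True
        while True:
            # 2. Current file exhausted: move to the next already-known file.
            while self.usable_files:
                self.records = deque(self.usable_files.popleft())
                if self.records:
                    return True
            # 3. No known files left: only now ask the catalog for new commits.
            new_files = self.catalog.seek_new_files()
            if not new_files:
                return False
            self.usable_files.extend(new_files)

    def next(self) -> int:
        # Precondition: has_next() returned True.
        return self.records.popleft()


def drain(reader: GuardedReader) -> int:
    count = 0
    while reader.has_next():
        reader.next()
        count += 1
    return count


catalog = CountingCatalog([list(range(4096)), list(range(4096))])
reader = GuardedReader(catalog)
print(drain(reader), catalog.calls)  # 8192 2

# Live streaming still works: a file committed later is found on the next seek.
catalog.commit([1, 2, 3])
print(drain(reader), catalog.calls)  # 3 4
```

In this model the cost scales with the number of times the reader runs out of known files, not with the number of rows: the second drain spends one round-trip discovering the newly committed file and one more confirming the end.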

### Impact
- Without the fix: more than 4,096 catalog/S3 round-trips (number of rows in the last file + 1).
- With the fix: 2 round-trips (one at initialization, one at the very end when all records are truly exhausted).
- Result: a significant reduction in IOPS against the catalog and S3.
