improve row number estimation #30

Merged
severo merged 7 commits into main from 26-improve-guess-by-offset-for-serial-rows on Dec 11, 2025

Conversation

@severo severo commented Dec 11, 2025

fixes #26

@severo severo requested a review from Copilot December 11, 2025 22:36
@severo severo changed the title 26 improve guess by offset for serial rows improve row number estimation Dec 11, 2025

Copilot AI left a comment

Pull request overview

This PR improves the algorithm that guesses byte offsets when fetching rows at arbitrary positions in a CSV file. It addresses issue #26, which asks for serial rows to be handled more efficiently.

Key Changes:

  • Replaced the isStored() and guessByteOffset() methods in the Estimator class with more comprehensive getStatus(), getFirstMissingRow(), and getLastMissingRowNumber() methods
  • Updated the fetch logic in csvDataFrame to use the new methods, avoiding unnecessary fetches when rows are already cached
  • Enhanced the algorithm to better estimate byte offsets, particularly for scenarios where average row size had been overestimated
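
To make the idea behind these changes concrete, here is a minimal sketch of how a byte offset could be guessed from rows that are already cached. It is not the code from src/cache.ts: the `SketchEstimator` class, the `StoredRow` and `RowStatus` types, and the `statusOf()` method are hypothetical names chosen for illustration, and the strategy (start from the end of the nearest stored row before the target, then add the average row size for each row in between) is an assumption about one reasonable approach.

```typescript
// Hypothetical sketch: NOT the Estimator class from src/cache.ts.
interface StoredRow {
  rowNumber: number  // 0-based row index
  byteOffset: number // offset of the row's first byte in the file
  byteCount: number  // row length in bytes, including the line ending
}

type RowStatus =
  | { kind: 'stored'; byteOffset: number }
  | { kind: 'missing'; guessedByteOffset: number }

class SketchEstimator {
  private rows = new Map<number, StoredRow>()
  private totalBytes = 0

  // Record a row whose position in the file is known.
  add(row: StoredRow): void {
    if (!this.rows.has(row.rowNumber)) {
      this.rows.set(row.rowNumber, row)
      this.totalBytes += row.byteCount
    }
  }

  // Average size of the stored rows, or 0 when nothing is stored yet.
  get averageRowBytes(): number {
    return this.rows.size === 0 ? 0 : this.totalBytes / this.rows.size
  }

  // Exact offset for a stored row, otherwise a guess.
  statusOf(rowNumber: number): RowStatus {
    const stored = this.rows.get(rowNumber)
    if (stored) return { kind: 'stored', byteOffset: stored.byteOffset }

    // Find the closest stored row that precedes the target (linear scan).
    let anchor: StoredRow | undefined
    for (const row of Array.from(this.rows.values())) {
      if (row.rowNumber < rowNumber && (!anchor || row.rowNumber > anchor.rowNumber)) {
        anchor = row
      }
    }

    if (!anchor) {
      // No earlier row is known: extrapolate from the start of the file.
      return { kind: 'missing', guessedByteOffset: Math.round(rowNumber * this.averageRowBytes) }
    }

    // Start at the end of the anchor row and skip the unknown rows in between.
    const anchorEnd = anchor.byteOffset + anchor.byteCount
    const rowsBetween = rowNumber - anchor.rowNumber - 1
    return {
      kind: 'missing',
      guessedByteOffset: Math.round(anchorEnd + rowsBetween * this.averageRowBytes),
    }
  }
}
```

In this sketch, a row that directly follows a stored row gets an exact offset (the end of its predecessor), so an overestimated average row size only affects rows separated from the cache by a gap.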

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/cache.ts | Refactored Estimator class to replace isStored() and guessByteOffset() with new methods (getStatus(), getFirstMissingRow(), getLastMissingRowNumber()) that provide more detailed information about row status and byte offsets |
| src/dataframe.ts | Updated fetch logic to use new estimator methods, eliminating the previous loop-based approach for determining which rows need to be fetched and improving the initial state detection |
| test/cache.test.ts | Updated and expanded tests to cover the new estimator methods, including edge cases for complete caches, empty caches, and rows stored at various positions |
| test/dataframe.test.ts | Updated test expectations to reflect improved byte offset estimation behavior for serial rows |
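
As an illustration of the fetch-side change described for src/dataframe.ts, the sketch below trims a requested row range to the span that is actually missing, and reports that nothing needs fetching when every row is already cached. The `missingSpan()` helper and its `Set`-based cache are hypothetical and are not taken from the PR; the real methods may have different signatures and return values.

```typescript
// Hypothetical helper: NOT the API introduced by this PR.
// Returns the first and last missing row numbers in [rowStart, rowEnd),
// or undefined when every row in the range is already cached.
function missingSpan(
  cached: ReadonlySet<number>,
  rowStart: number,
  rowEnd: number, // exclusive
): { first: number; last: number } | undefined {
  // Skip cached rows at the beginning of the range.
  let first = rowStart
  while (first < rowEnd && cached.has(first)) first++
  if (first === rowEnd) return undefined // nothing to fetch

  // Skip cached rows at the end of the range.
  let last = rowEnd - 1
  while (last > first && cached.has(last)) last--
  return { first, last }
}

// Usage: rows 0..2 and 5 are cached, rows 0..5 are requested.
const span = missingSpan(new Set([0, 1, 2, 5]), 0, 6)
console.log(span) // only rows 3 and 4 still need to be fetched
```

A caller could then fetch only the byte range covering `first` through `last`, and skip the network request entirely when the result is `undefined`.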

Comment thread test/cache.test.ts
Comment thread src/dataframe.ts Outdated
Comment thread src/cache.ts Outdated
Comment thread test/cache.test.ts
Comment thread test/cache.test.ts
Comment thread src/dataframe.ts Outdated
Comment thread test/cache.test.ts
Comment thread test/cache.test.ts
Comment thread test/cache.test.ts
severo and others added 4 commits December 11, 2025 18:41
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…erial-rows' into 26-improve-guess-by-offset-for-serial-rows
@severo severo merged commit 5fd395a into main Dec 11, 2025
4 checks passed
@severo severo deleted the 26-improve-guess-by-offset-for-serial-rows branch December 11, 2025 22:44
Development

Successfully merging this pull request may close these issues.

Improve the row number estimation

2 participants