Skip to content

Bug: paginateSearchWithRetry triggers 403 rate limit and discards already-fetched data on retry #33

@jadeproheshan

Description

@jadeproheshan

Problem Description

Running openmeta scout --limit 5 or any command that triggers fetchTrendingIssues() quickly hits GitHub's Search API rate limit (403). The retry mechanism then discards all previously fetched pages and restarts from page 0, creating a fatal retry loop that eventually crashes the process with zero results returned.


Reproduction

openmeta scout --refresh --limit 5

Observed: the command hits 403 within seconds, retries 3 times, then crashes with "GitHub issue discovery failed."Add the "--refresh" argument is to make sure the local cache to be skipped so that the bug can be reproduced.


Root Cause Analysis

Three compounding issues in paginateSearchWithRetry() (github.ts):

A. Unbounded greedy pagination (root cause)

octokit.paginate.iterator walks every available page with no upper bound. Labels like good first issue have hundreds of pages. GitHub's Search API allows only 30 authenticated requests per minute. The iterator fires requests back-to-back with no throttling, exhausting the quota within seconds.

B. All-or-nothing data loss on retry (fatal amplifier)

const items = [] is declared inside the try block, scoped to each retry attempt. When page N triggers a 403:

Attempt 1: fetch pages 1..4 successfully (120 items) → page 5 hits 403
           → catch block: items is out of scope, 120 items GONE
           → wait ~60s for rate limit window reset
Attempt 2: fetch pages 1..4 again → page 5 hits 403 again
           → 120 items GONE again
Attempt 3: same → crash with zero results

The retry always starts from page 0 with a fresh empty array. Since the rate limit window (30 requests) is exactly consumed by the pages before the crash point, every retry hits the same wall.

C. Rate-limit retry targets the wrong problem

The old code reads X-RateLimit-Reset headers and applies exponential backoff (1s → 2s → 4s) specifically for 403/429 errors. But waiting and retrying from scratch is futile when the underlying problem is unbounded fetch depth — the quota will always be exhausted before all pages are collected.


Proposed Solutions

1. Preserve already-fetched data across retries

Move const items = [] outside the while loop so that when a rate-limit error occurs mid-pagination, the method can gracefully return whatever has been collected so far instead of discarding it.

// Before (broken): items dies with the try block
while (attempt < MAX_PAGINATION_RETRIES) {
  try {
    const items = [];  // ← scoped here, lost on error
    for await (...) { items.push(...); }
    return items;
  } catch { /* items is gone */ }
}

// After (fixed): items survives across retry attempts
const items = [];
while (attempt < MAX_PAGINATION_RETRIES) {
  try {
    for await (...) { items.push(...); }
    return items;
  } catch (error) {
    if (isRateLimit && items.length > 0) {
      return items;  // ← graceful degradation
    }
    // ...
  }
}

Getting 120 results with a warning is infinitely better than crashing with 0 results.

2. [For discussion] Add a maximum page limit to avoid triggering rate limits entirely

Introduce MAX_SEARCH_PAGES to cap pagination depth. With the current setup:

Parameter Value
Authenticated Search API quota 30 requests/min
Results per page (SEARCH_RESULTS_PER_PAGE) 30
Label groups (FILTER_LABEL_GROUPS) 2

Math for a safe limit:

  • If we set MAX_SEARCH_PAGES = 13 per label group: 13 × 2 = 26 requests total
  • Results: 13 × 30 = ~390 results per label group, ~780 total
  • Leaves 4 requests (of 30) as safety margin

3. [For discussion] If full pagination is truly desired, the retry must resume from the breakpoint

If the authors genuinely want to fetch all pages (hundreds), the current retry-from-page-0 approach is fundamentally broken regardless of backoff strategy. The fix would require tracking the last successful page and resuming from there on retry:

const items: SearchIssueItem[] = [];
let currentPage = 1;

while (attempt < MAX_PAGINATION_RETRIES) {
  try {
    for await (const response of this.octokit.paginate.iterator(
      this.octokit.rest.search.issuesAndPullRequests,
      {
        q: searchQuery,
        sort: 'updated',
        order: 'desc',
        per_page: SEARCH_RESULTS_PER_PAGE,
        page: currentPage,  // ← resume from breakpoint
      },
    )) {
      const pageItems = Array.isArray((response as any).data) ? (response as any).data : [];
      items.push(...pageItems);
      currentPage++;
    }
    return items;
  } catch (error) {
    // currentPage stays where it was — retry picks up from the breakpoint
    // items survives across attempts — no data loss
  }
}

octokit.paginate.iterator accepts a page parameter that is passed through to the underlying API call, so resuming from a breakpoint does not require abandoning the iterator — just tracking currentPage and passing it on retry.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions