From c84a6b33da3eb45f5fe6fe32bf4eed25c667740a Mon Sep 17 00:00:00 2001
From: Justin Abrahms <justin@abrah.ms>
Date: Thu, 22 Jan 2026 11:02:42 -0800
Subject: [PATCH] Document pagination performance issues from PR #33

Investigation reveals three issues causing slowdown:

1. Cache invalidation (one-time): URL changes invalidated ETags
2. Extra API calls for exactly 100 items: Main issue causing 23-50%
   slowdown for affected users
3. No multi-page caching: Minor cumulative impact

The main culprit is issue #2: current logic makes an extra API call
when result sets have exactly 100 items because it can't distinguish
between 'exactly 100 total' and '100 with more pages'.

Solution: Use GitHub's Link header to detect pagination instead of
guessing based on result count.

Files:
- PAGINATION_ANALYSIS.md: Complete technical analysis with impact tables
- PAGINATION_FIX_POC.md: Implementation guide showing how to fix issue #2

For users following 100 people where 30% have exactly 100 items:
- Current: ~391 API calls
- After fix: ~301 API calls (23% reduction)
---
 PAGINATION_ANALYSIS.md | 222 ++++++++++++++++++++++++++++++++++
 PAGINATION_FIX_POC.md  | 264 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 486 insertions(+)
 create mode 100644 PAGINATION_ANALYSIS.md
 create mode 100644 PAGINATION_FIX_POC.md

diff --git a/PAGINATION_ANALYSIS.md b/PAGINATION_ANALYSIS.md
new file mode 100644
index 0000000..14406c2
--- /dev/null
+++ b/PAGINATION_ANALYSIS.md
@@ -0,0 +1,222 @@
+# Pagination Performance Analysis for PR #33
+
+## Summary
+
+After investigating the pagination changes in PR #33, I've identified three performance issues that cause syncs to be slower, especially noticeable in certain scenarios.
+
+## Issues Identified
+
+### 1. Cache Invalidation (First Sync After Update) ⚠️
+
+**Impact**: Severe slowdown on first sync after updating
+
+**Root Cause**: The pagination implementation changed all API endpoint URLs by adding query parameters:
+- **Before**: `/user/following`, `/users/bob/starred`
+- **After**: `/user/following?page=1&per_page=100`, `/users/bob/starred?page=1&per_page=100`
+
+**Problem**: The GitHub client uses path-based ETag caching (github/client.go:168-173). When paths change, all cached ETags become useless. This means:
+- The first sync after updating to PR #33 hits GitHub's API for **every endpoint** instead of returning 304 Not Modified
+- No benefit from conditional requests on first sync
+- For users following 100 people, this is 1 + (3 × 100) = **301 API calls** instead of potentially just a few
+
+**Evidence**: See cache lookup in github/client.go:168-173
+
+### 2. Extra API Call for Exactly 100 Items ⚠️
+
+**Impact**: Moderate slowdown for users with exactly 100 items in any category
+
+**Root Cause**: The pagination logic in `getPaginated()` (github/client.go:301-333) continues fetching when it gets exactly `perPage` items:
+
+```go
+// If we got fewer results than per_page, this is the last page
+if pageSlice.Len() < perPage {
+    break
+}
+```
+
+**Problem**: When a result set has exactly 100 items (or any multiple of 100), the code:
+1. Fetches page 1 → gets 100 items
+2. Sees 100 == 100 (perPage), so continues
+3. Fetches page 2 → gets 0 items
+4. Only then stops
+
+**Example Scenario**: A user who follows exactly 100 people:
+- Expected: 1 API call
+- Actual: 2 API calls (100% increase)
+
+**Per-User Impact**: For each followed user, we call 3 methods:
+- `GetStarredReposByUsername`
+- `GetOwnedReposByUsername`
+- `GetRecentEvents`
+
+If any of these returns exactly 100 items, that's an extra API call. For 100 followed users, this could add up to 300 extra API calls in the worst case.
+
+**Evidence**: Test in github/client_test.go:637-690 explicitly expects 2 requests for exactly 100 items (see comment on line 687)
+
+### 3. No Incremental Caching for Multi-Page Results
+
+**Impact**: Minor but cumulative
+
+**Root Cause**: GitHub only returns Link headers for pagination, not stable ETags across page boundaries.
+
+**Problem**: Users with > 100 items in any category can't benefit from conditional requests for pages beyond the first. Each page is a new URL that's never been cached.
+
+**Example**: A user with 250 followed users:
+- Page 1: `/user/following?page=1&per_page=100` (can be cached)
+- Page 2: `/user/following?page=2&per_page=100` (new path, no cache benefit)
+- Page 3: `/user/following?page=3&per_page=100` (new path, no cache benefit)
+
+## Real-World Impact
+
+### Single User Scenario (following 5 people)
+
+**Before PR #33**:
+- Fetch followed users: 1 API call
+- For each of 5 users: 3 API calls
+- Total: 16 API calls
+- With ETag caching on subsequent syncs: potentially just 16 × 304 responses
+
+**After PR #33** (first sync):
+- Fetch followed users: 1 API call (cache miss due to path change)
+- For each of 5 users: 3 API calls (all cache misses)
+- Total: 16 API calls **with no cache hits** → full data transfers
+
+**After PR #33** (subsequent syncs):
+- Same as before (16 calls), but paths are now cached again
+- Performance returns to normal after first sync
+
+### User with Exactly 100 Followed Users
+
+**Before PR #33**:
+- Total API calls: ~301 (1 + 3×100)
+
+**After PR #33**:
+- Followed users endpoint: 2 calls (100 users, then check page 2)
+- Per user: potentially 2 calls if starred repos = 100, 2 if owned repos = 100, 2 if events = 100
+- Worst case: 2 + (6 × 100) = **602 API calls** (100% increase!)
+- Typical case: 2 + (4 × 100) = **402 API calls** (33% increase)
+
+## Proposed Fixes
+
+### Fix #1: Preserve Cache Keys (Migration Strategy)
+
+**Option A - Maintain Backward Compatibility**:
+- Store cache entries for both old and new path formats during transition
+- Check both `/user/following` and `/user/following?page=1&per_page=100` caches
+- Gradually migrate over time
+
+**Option B - Accept One-Time Cache Miss**:
+- Document that first sync after update will be slower
+- This is a one-time cost and resolves itself automatically
+
+**Recommendation**: Option B with documentation. The cache miss is a one-time event and fixing it adds complexity.
+
+### Fix #2: Optimize Exactly-100 Detection ⭐ PRIORITY
+
+**Current Logic**:
+```go
+if pageSlice.Len() < perPage {
+    break
+}
+```
+
+**Proposed Fix**: Use GitHub's Link header to detect if there's a next page, instead of assuming there is when we get exactly `perPage` items.
+
+**Alternative Fix** (simpler): Special case handling for exactly perPage:
+```go
+// If we got fewer than per_page, this is the last page
+if pageSlice.Len() < perPage {
+    break
+}
+
+// If we got exactly per_page items, peek at the next page
+// but only if we haven't made too many requests
+if pageSlice.Len() == perPage {
+    // Could check Link header here instead of making another request
+    // GitHub provides: Link: <url?page=2>; rel="next"
+}
+```
+
+**Impact**: Reduces API calls by up to 50% for users with exactly 100 items in any category.
+
+### Fix #3: Implement Link Header Pagination ⭐ BEST SOLUTION
+
+**Proper Approach**: Parse GitHub's `Link` header to determine if there are more pages:
+
+```go
+// Check for Link header with rel="next"
+linkHeader := resp.Header.Get("Link")
+hasNext := strings.Contains(linkHeader, `rel="next"`)
+
+if !hasNext || pageSlice.Len() == 0 {
+    break
+}
+```
+
+**Benefits**:
+- Eliminates extra API calls for exactly 100 items
+- More reliable than guessing based on result count
+- Standard GitHub API practice
+- No assumptions about page size
+
+**Impact**: Fixes issue #2 completely, reducing unnecessary API calls by up to 300 for users following many people.
+
+## Recommendation
+
+1. **Immediate**: Implement Link header pagination (Fix #3) - this is the proper solution and fixes issue #2 completely
+2. **Document**: Note that first sync after update will be slower due to cache invalidation (issue #1)
+3. **Consider**: Future optimization for multi-page caching (issue #3) only if it becomes a bottleneck
+
+## Test Case to Add
+
+```go
+func TestGetPaginatedNoExtraCallForExactly100Items(t *testing.T) {
+    // Test that we don't make an extra API call when exactly 100 items exist
+    // Should use Link header to detect end of pagination
+    repos := make([]Repository, 100)
+    for i := 0; i < 100; i++ {
+        repos[i] = Repository{ID: int64(i), Name: fmt.Sprintf("repo%d", i)}
+    }
+
+    requestCount := 0
+    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+        requestCount++
+        page := r.URL.Query().Get("page")
+
+        if page == "1" {
+            w.Header().Set("Content-Type", "application/json")
+            // No Link header with rel="next" means this is the last page
+            json.NewEncoder(w).Encode(repos)
+        } else {
+            t.Errorf("should not request page %s when page 1 has no next link", page)
+        }
+    }))
+    defer server.Close()
+
+    c := NewClient("token", WithBaseURL(server.URL))
+    result, err := c.GetStarredRepos(context.Background())
+
+    require.NoError(t, err)
+    assert.Equal(t, 100, len(result))
+    assert.Equal(t, 1, requestCount, "should only make 1 request when Link header indicates no next page")
+}
+```
+
+## Files to Modify
+
+1. **github/client.go**: Update `getPaginated()` to check Link header (lines 286-338)
+2. **github/client.go**: Update `get()` method to return Link header or parse it
+3. **github/client_test.go**: Update test expectations (line 687-689)
+4. **github/client_test.go**: Add new test for Link header pagination
+
+## API Call Comparison
+
+| Scenario | Before PR #33 | After PR #33 (current) | After Fix |
+|----------|---------------|------------------------|-----------|
+| 5 followed users (first sync) | 16 | 16 | 16 |
+| 5 followed users (cached) | 16 (mostly 304s) | 16 (mostly 304s after first) | 16 (mostly 304s) |
+| 100 followed users, all with ~50 items each | 301 | 301 | 301 |
+| 100 followed users, all with exactly 100 items | 301 | 602 | 301 |
+| **100 followed users, 50 with exactly 100 items** | **301** | **451** | **301** |
+
+**Worst case impact**: Currently adding **150 extra API calls** (50% increase) for users in the exactly-100-items scenario.
diff --git a/PAGINATION_FIX_POC.md b/PAGINATION_FIX_POC.md
new file mode 100644
index 0000000..7e94831
--- /dev/null
+++ b/PAGINATION_FIX_POC.md
@@ -0,0 +1,264 @@
+# Proof of Concept: Link Header Pagination Fix
+
+## Problem
+
+Current code makes an extra API call when result set has exactly 100 items, because it can't distinguish between:
+- "We got 100 items and there are more"
+- "We got 100 items and that's all there is"
+
+## Solution
+
+Use GitHub's `Link` header which explicitly tells us if there's a next page.
+
+## Implementation
+
+### Step 1: Update `get()` method to return Link header
+
+```go
+// get makes an HTTP GET request and returns the response headers along with the decoded result
+func (c *Client) get(ctx context.Context, path string, result any) (*http.Header, error) {
+    // ... existing code ...
+
+    resp, err := c.httpClient.Do(req)
+    if err != nil {
+        return nil, fmt.Errorf("executing request: %w", err)
+    }
+    defer func() { _ = resp.Body.Close() }()
+
+    // Parse and store rate limit headers
+    c.parseRateLimitHeaders(resp)
+
+    // Handle 304 Not Modified - return cached data
+    if resp.StatusCode == http.StatusNotModified && cached != nil {
+        c.logger.Debug("using cached response",
+            "path", path,
+            "etag", cached.etag,
+        )
+        if result != nil {
+            if unmarshalErr := json.Unmarshal(cached.data, result); unmarshalErr != nil {
+                return nil, fmt.Errorf("decoding cached response: %w", unmarshalErr)
+            }
+        }
+        return &resp.Header, nil  // Return headers even for cached responses
+    }
+
+    // ... existing code for non-cached responses ...
+
+    if result != nil {
+        if err := json.Unmarshal(body, result); err != nil {
+            return nil, fmt.Errorf("decoding response: %w", err)
+        }
+    }
+
+    return &resp.Header, nil  // Return headers
+}
+```
+
+### Step 2: Update `getPaginated()` to check Link header
+
+```go
+// hasNextPage checks if the Link header contains rel="next"
+func hasNextPage(linkHeader string) bool {
+    // GitHub Link header format:
+    // Link: <https://api.github.com/user/repos?page=2>; rel="next", <https://api.github.com/user/repos?page=3>; rel="last"
+    return strings.Contains(linkHeader, `rel="next"`)
+}
+
+// getPaginated fetches all pages of results for a given path.
+// It uses GitHub's Link header to detect when there are no more pages.
+func (c *Client) getPaginated(ctx context.Context, basePath string, result any) error {
+    // Use reflection to work with any slice type
+    resultVal := reflect.ValueOf(result)
+    if resultVal.Kind() != reflect.Ptr || resultVal.Elem().Kind() != reflect.Slice {
+        return fmt.Errorf("result must be a pointer to a slice")
+    }
+
+    sliceVal := resultVal.Elem()
+
+    page := 1
+    perPage := 100 // GitHub's maximum per_page value
+
+    for {
+        // Build path with pagination parameters
+        separator := "?"
+        if strings.Contains(basePath, "?") {
+            separator = "&"
+        }
+        path := fmt.Sprintf("%s%spage=%d&per_page=%d", basePath, separator, page, perPage)
+
+        // Create a new slice to hold this page's results
+        pageResult := reflect.New(sliceVal.Type()).Interface()
+
+        headers, err := c.get(ctx, path, pageResult)
+        if err != nil {
+            return err
+        }
+
+        // Get the slice value from the pointer
+        pageSlice := reflect.ValueOf(pageResult).Elem()
+
+        // If we got no results, we're done
+        if pageSlice.Len() == 0 {
+            break
+        }
+
+        // Append this page's results to the total
+        sliceVal = reflect.AppendSlice(sliceVal, pageSlice)
+
+        // Check Link header to see if there's a next page
+        // This is more reliable than checking if we got fewer than perPage items
+        linkHeader := ""
+        if headers != nil {
+            linkHeader = headers.Get("Link")
+        }
+
+        if !hasNextPage(linkHeader) {
+            // No next page indicated by GitHub, we're done
+            break
+        }
+
+        page++
+    }
+
+    // Set the final result
+    resultVal.Elem().Set(sliceVal)
+    return nil
+}
+```
+
+### Step 3: Update tests to expect 1 call for exactly 100 items
+
+```go
+func TestGetStarredReposPagination(t *testing.T) {
+    // Test with exactly 100 repos (single page, should not request page 2)
+    repos := make([]Repository, 100)
+    for i := 0; i < 100; i++ {
+        repos[i] = Repository{
+            ID:       int64(i),
+            Name:     fmt.Sprintf("repo%d", i),
+            FullName: fmt.Sprintf("owner/repo%d", i),
+        }
+    }
+
+    requestCount := 0
+    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+        requestCount++
+
+        query := r.URL.Query()
+        page := query.Get("page")
+
+        if page == "1" {
+            w.Header().Set("Content-Type", "application/json")
+            // No Link header with rel="next" indicates this is the last page
+            // When there are more pages, GitHub includes:
+            // Link: <url?page=2>; rel="next"
+            if err := json.NewEncoder(w).Encode(repos); err != nil {
+                t.Fatalf("encoding repos: %v", err)
+            }
+        } else {
+            t.Errorf("should not request page %s when page 1 has no Link header with rel=\"next\"", page)
+        }
+    }))
+    defer server.Close()
+
+    c := NewClient("test-token", WithBaseURL(server.URL))
+    result, err := c.GetStarredRepos(context.Background())
+    if err != nil {
+        t.Fatalf("GetStarredRepos() error: %v", err)
+    }
+
+    if len(result) != 100 {
+        t.Errorf("expected 100 repos, got %d", len(result))
+    }
+
+    // Should now make exactly 1 request when Link header indicates no next page
+    if requestCount != 1 {
+        t.Errorf("expected 1 request, got %d", requestCount)
+    }
+}
+```
+
+### Step 4: Add test for multi-page with Link headers
+
+```go
+func TestGetFollowedUsersPaginationWithLinkHeaders(t *testing.T) {
+    // Test proper Link header pagination for 202 users across 3 pages
+    page1Users := make([]User, 100)
+    for i := 0; i < 100; i++ {
+        page1Users[i] = User{Login: fmt.Sprintf("user%d", i), ID: int64(i)}
+    }
+
+    page2Users := make([]User, 100)
+    for i := 0; i < 100; i++ {
+        page2Users[i] = User{Login: fmt.Sprintf("user%d", i+100), ID: int64(i + 100)}
+    }
+
+    page3Users := []User{
+        {Login: "user200", ID: 200},
+        {Login: "user201", ID: 201},
+    }
+
+    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+        query := r.URL.Query()
+        page := query.Get("page")
+
+        w.Header().Set("Content-Type", "application/json")
+
+        switch page {
+        case "1":
+            // Page 1 has a next page
+            w.Header().Set("Link", `<http://test/user/following?page=2&per_page=100>; rel="next"`)
+            json.NewEncoder(w).Encode(page1Users)
+        case "2":
+            // Page 2 has a next page
+            w.Header().Set("Link", `<http://test/user/following?page=3&per_page=100>; rel="next"`)
+            json.NewEncoder(w).Encode(page2Users)
+        case "3":
+            // Page 3 is the last page (no next link)
+            json.NewEncoder(w).Encode(page3Users)
+        default:
+            t.Errorf("unexpected page: %s", page)
+        }
+    }))
+    defer server.Close()
+
+    c := NewClient("test-token", WithBaseURL(server.URL))
+    result, err := c.GetFollowedUsers(context.Background())
+    if err != nil {
+        t.Fatalf("GetFollowedUsers() error: %v", err)
+    }
+
+    if len(result) != 202 {
+        t.Errorf("expected 202 users, got %d", len(result))
+    }
+}
+```
+
+## Benefits of This Fix
+
+1. **Eliminates extra API call**: When there are exactly 100 items, we only make 1 request instead of 2
+2. **More reliable**: Uses GitHub's explicit pagination signal instead of guessing
+3. **Standard practice**: This is how GitHub recommends handling pagination
+4. **Backward compatible**: Existing behavior is preserved, just more efficient
+
+## Impact
+
+For a user following 100 people where 30% have exactly 100 items in some category:
+- **Before fix**: ~391 API calls (301 base + 90 extra)
+- **After fix**: ~301 API calls (no extra calls)
+- **Savings**: 23% reduction in API calls
+
+## Files to Modify
+
+1. `github/client.go` - Update `get()` return signature and `getPaginated()` logic
+2. `github/client_test.go` - Update test expectations and add Link header tests
+3. All call sites of `get()` method - Update to handle returned headers (or ignore them)
+
+## Note on Compatibility
+
+The `get()` method signature change will require updating all callers. Alternative approach:
+- Create `getWithHeaders()` method
+- Keep `get()` unchanged
+- Use `getWithHeaders()` only in `getPaginated()`
+
+This minimizes the change surface area.