From c84a6b33da3eb45f5fe6fe32bf4eed25c667740a Mon Sep 17 00:00:00 2001 From: Justin Abrahms Date: Thu, 22 Jan 2026 11:02:42 -0800 Subject: [PATCH] Document pagination performance issues from PR #33 Investigation reveals three issues causing slowdown: 1. Cache invalidation (one-time): URL changes invalidated ETags 2. Extra API calls for exactly 100 items: Main issue causing 23-50% slowdown for affected users 3. No multi-page caching: Minor cumulative impact The main culprit is issue #2: current logic makes an extra API call when result sets have exactly 100 items because it can't distinguish between 'exactly 100 total' and '100 with more pages'. Solution: Use GitHub's Link header to detect pagination instead of guessing based on result count. Files: - PAGINATION_ANALYSIS.md: Complete technical analysis with impact tables - PAGINATION_FIX_POC.md: Implementation guide showing how to fix issue #2 For users following 100 people where 30% have exactly 100 items: - Current: ~391 API calls - After fix: ~301 API calls (23% reduction) --- PAGINATION_ANALYSIS.md | 222 ++++++++++++++++++++++++++++++++++ PAGINATION_FIX_POC.md | 264 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 486 insertions(+) create mode 100644 PAGINATION_ANALYSIS.md create mode 100644 PAGINATION_FIX_POC.md diff --git a/PAGINATION_ANALYSIS.md b/PAGINATION_ANALYSIS.md new file mode 100644 index 0000000..14406c2 --- /dev/null +++ b/PAGINATION_ANALYSIS.md @@ -0,0 +1,222 @@ +# Pagination Performance Analysis for PR #33 + +## Summary + +After investigating the pagination changes in PR #33, I've identified three performance issues that cause syncs to be slower, especially noticeable in certain scenarios. + +## Issues Identified + +### 1. Cache Invalidation (First Sync After Update) ⚠️ + +**Impact**: Severe slowdown on first sync after updating + +**Root Cause**: The pagination implementation changed all API endpoint URLs by adding query parameters: +- **Before**: `/user/following`, `/users/bob/starred` +- **After**: `/user/following?page=1&per_page=100`, `/users/bob/starred?page=1&per_page=100` + +**Problem**: The GitHub client uses path-based ETag caching (github/client.go:168-173). When paths change, all cached ETags become useless. This means: +- The first sync after updating to PR #33 hits GitHub's API for **every endpoint** instead of returning 304 Not Modified +- No benefit from conditional requests on first sync +- For users following 100 people, this is 1 + (3 × 100) = **301 API calls** instead of potentially just a few + +**Evidence**: See cache lookup in github/client.go:168-173 + +### 2. Extra API Call for Exactly 100 Items ⚠️ + +**Impact**: Moderate slowdown for users with exactly 100 items in any category + +**Root Cause**: The pagination logic in `getPaginated()` (github/client.go:301-333) continues fetching when it gets exactly `perPage` items: + +```go +// If we got fewer results than per_page, this is the last page +if pageSlice.Len() < perPage { + break +} +``` + +**Problem**: When a result set has exactly 100 items (or any multiple of 100), the code: +1. Fetches page 1 → gets 100 items +2. Sees 100 == 100 (perPage), so continues +3. Fetches page 2 → gets 0 items +4. Only then stops + +**Example Scenario**: A user who follows exactly 100 people: +- Expected: 1 API call +- Actual: 2 API calls (100% increase) + +**Per-User Impact**: For each followed user, we call 3 methods: +- `GetStarredReposByUsername` +- `GetOwnedReposByUsername` +- `GetRecentEvents` + +If any of these returns exactly 100 items, that's an extra API call. For 100 followed users, this could add up to 300 extra API calls in the worst case. + +**Evidence**: Test in github/client_test.go:637-690 explicitly expects 2 requests for exactly 100 items (see comment on line 687) + +### 3. No Incremental Caching for Multi-Page Results + +**Impact**: Minor but cumulative + +**Root Cause**: GitHub only returns Link headers for pagination, not stable ETags across page boundaries. + +**Problem**: Users with > 100 items in any category can't benefit from conditional requests for pages beyond the first. Each page is a new URL that's never been cached. + +**Example**: A user with 250 followed users: +- Page 1: `/user/following?page=1&per_page=100` (can be cached) +- Page 2: `/user/following?page=2&per_page=100` (new path, no cache benefit) +- Page 3: `/user/following?page=3&per_page=100` (new path, no cache benefit) + +## Real-World Impact + +### Single User Scenario (following 5 people) + +**Before PR #33**: +- Fetch followed users: 1 API call +- For each of 5 users: 3 API calls +- Total: 16 API calls +- With ETag caching on subsequent syncs: potentially just 16 × 304 responses + +**After PR #33** (first sync): +- Fetch followed users: 1 API call (cache miss due to path change) +- For each of 5 users: 3 API calls (all cache misses) +- Total: 16 API calls **with no cache hits** → full data transfers + +**After PR #33** (subsequent syncs): +- Same as before (16 calls), but paths are now cached again +- Performance returns to normal after first sync + +### User with Exactly 100 Followed Users + +**Before PR #33**: +- Total API calls: ~301 (1 + 3×100) + +**After PR #33**: +- Followed users endpoint: 2 calls (100 users, then check page 2) +- Per user: potentially 2 calls if starred repos = 100, 2 if owned repos = 100, 2 if events = 100 +- Worst case: 2 + (6 × 100) = **602 API calls** (100% increase!) +- Typical case: 2 + (4 × 100) = **402 API calls** (33% increase) + +## Proposed Fixes + +### Fix #1: Preserve Cache Keys (Migration Strategy) + +**Option A - Maintain Backward Compatibility**: +- Store cache entries for both old and new path formats during transition +- Check both `/user/following` and `/user/following?page=1&per_page=100` caches +- Gradually migrate over time + +**Option B - Accept One-Time Cache Miss**: +- Document that first sync after update will be slower +- This is a one-time cost and resolves itself automatically + +**Recommendation**: Option B with documentation. The cache miss is a one-time event and fixing it adds complexity. + +### Fix #2: Optimize Exactly-100 Detection ⭐ PRIORITY + +**Current Logic**: +```go +if pageSlice.Len() < perPage { + break +} +``` + +**Proposed Fix**: Use GitHub's Link header to detect if there's a next page, instead of assuming there is when we get exactly `perPage` items. + +**Alternative Fix** (simpler): Special case handling for exactly perPage: +```go +// If we got fewer than per_page, this is the last page +if pageSlice.Len() < perPage { + break +} + +// If we got exactly per_page items, peek at the next page +// but only if we haven't made too many requests +if pageSlice.Len() == perPage { + // Could check Link header here instead of making another request + // GitHub provides: Link: ; rel="next" +} +``` + +**Impact**: Reduces API calls by up to 50% for users with exactly 100 items in any category. + +### Fix #3: Implement Link Header Pagination ⭐ BEST SOLUTION + +**Proper Approach**: Parse GitHub's `Link` header to determine if there are more pages: + +```go +// Check for Link header with rel="next" +linkHeader := resp.Header.Get("Link") +hasNext := strings.Contains(linkHeader, `rel="next"`) + +if !hasNext || pageSlice.Len() == 0 { + break +} +``` + +**Benefits**: +- Eliminates extra API calls for exactly 100 items +- More reliable than guessing based on result count +- Standard GitHub API practice +- No assumptions about page size + +**Impact**: Fixes issue #2 completely, reducing unnecessary API calls by up to 300 for users following many people. + +## Recommendation + +1. **Immediate**: Implement Link header pagination (Fix #3) - this is the proper solution and fixes issue #2 completely +2. **Document**: Note that first sync after update will be slower due to cache invalidation (issue #1) +3. **Consider**: Future optimization for multi-page caching (issue #3) only if it becomes a bottleneck + +## Test Case to Add + +```go +func TestGetPaginatedNoExtraCallForExactly100Items(t *testing.T) { + // Test that we don't make an extra API call when exactly 100 items exist + // Should use Link header to detect end of pagination + repos := make([]Repository, 100) + for i := 0; i < 100; i++ { + repos[i] = Repository{ID: int64(i), Name: fmt.Sprintf("repo%d", i)} + } + + requestCount := 0 + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + requestCount++ + page := r.URL.Query().Get("page") + + if page == "1" { + w.Header().Set("Content-Type", "application/json") + // No Link header with rel="next" means this is the last page + json.NewEncoder(w).Encode(repos) + } else { + t.Errorf("should not request page %s when page 1 has no next link", page) + } + })) + defer server.Close() + + c := NewClient("token", WithBaseURL(server.URL)) + result, err := c.GetStarredRepos(context.Background()) + + require.NoError(t, err) + assert.Equal(t, 100, len(result)) + assert.Equal(t, 1, requestCount, "should only make 1 request when Link header indicates no next page") +} +``` + +## Files to Modify + +1. **github/client.go**: Update `getPaginated()` to check Link header (lines 286-338) +2. **github/client.go**: Update `get()` method to return Link header or parse it +3. **github/client_test.go**: Update test expectations (line 687-689) +4. **github/client_test.go**: Add new test for Link header pagination + +## API Call Comparison + +| Scenario | Before PR #33 | After PR #33 (current) | After Fix | +|----------|---------------|------------------------|-----------| +| 5 followed users (first sync) | 16 | 16 | 16 | +| 5 followed users (cached) | 16 (mostly 304s) | 16 (mostly 304s after first) | 16 (mostly 304s) | +| 100 followed users, all with ~50 items each | 301 | 301 | 301 | +| 100 followed users, all with exactly 100 items | 301 | 602 | 301 | +| **100 followed users, 50 with exactly 100 items** | **301** | **451** | **301** | + +**Worst case impact**: Currently adding **150 extra API calls** (50% increase) for users in the exactly-100-items scenario. diff --git a/PAGINATION_FIX_POC.md b/PAGINATION_FIX_POC.md new file mode 100644 index 0000000..7e94831 --- /dev/null +++ b/PAGINATION_FIX_POC.md @@ -0,0 +1,264 @@ +# Proof of Concept: Link Header Pagination Fix + +## Problem + +Current code makes an extra API call when result set has exactly 100 items, because it can't distinguish between: +- "We got 100 items and there are more" +- "We got 100 items and that's all there is" + +## Solution + +Use GitHub's `Link` header which explicitly tells us if there's a next page. + +## Implementation + +### Step 1: Update `get()` method to return Link header + +```go +// get makes an HTTP GET request and returns the response headers along with the decoded result +func (c *Client) get(ctx context.Context, path string, result any) (*http.Header, error) { + // ... existing code ... + + resp, err := c.httpClient.Do(req) + if err != nil { + return nil, fmt.Errorf("executing request: %w", err) + } + defer func() { _ = resp.Body.Close() }() + + // Parse and store rate limit headers + c.parseRateLimitHeaders(resp) + + // Handle 304 Not Modified - return cached data + if resp.StatusCode == http.StatusNotModified && cached != nil { + c.logger.Debug("using cached response", + "path", path, + "etag", cached.etag, + ) + if result != nil { + if unmarshalErr := json.Unmarshal(cached.data, result); unmarshalErr != nil { + return nil, fmt.Errorf("decoding cached response: %w", unmarshalErr) + } + } + return &resp.Header, nil // Return headers even for cached responses + } + + // ... existing code for non-cached responses ... + + if result != nil { + if err := json.Unmarshal(body, result); err != nil { + return nil, fmt.Errorf("decoding response: %w", err) + } + } + + return &resp.Header, nil // Return headers +} +``` + +### Step 2: Update `getPaginated()` to check Link header + +```go +// hasNextPage checks if the Link header contains rel="next" +func hasNextPage(linkHeader string) bool { + // GitHub Link header format: + // Link: ; rel="next", ; rel="last" + return strings.Contains(linkHeader, `rel="next"`) +} + +// getPaginated fetches all pages of results for a given path. +// It uses GitHub's Link header to detect when there are no more pages. +func (c *Client) getPaginated(ctx context.Context, basePath string, result any) error { + // Use reflection to work with any slice type + resultVal := reflect.ValueOf(result) + if resultVal.Kind() != reflect.Ptr || resultVal.Elem().Kind() != reflect.Slice { + return fmt.Errorf("result must be a pointer to a slice") + } + + sliceVal := resultVal.Elem() + + page := 1 + perPage := 100 // GitHub's maximum per_page value + + for { + // Build path with pagination parameters + separator := "?" + if strings.Contains(basePath, "?") { + separator = "&" + } + path := fmt.Sprintf("%s%spage=%d&per_page=%d", basePath, separator, page, perPage) + + // Create a new slice to hold this page's results + pageResult := reflect.New(sliceVal.Type()).Interface() + + headers, err := c.get(ctx, path, pageResult) + if err != nil { + return err + } + + // Get the slice value from the pointer + pageSlice := reflect.ValueOf(pageResult).Elem() + + // If we got no results, we're done + if pageSlice.Len() == 0 { + break + } + + // Append this page's results to the total + sliceVal = reflect.AppendSlice(sliceVal, pageSlice) + + // Check Link header to see if there's a next page + // This is more reliable than checking if we got fewer than perPage items + linkHeader := "" + if headers != nil { + linkHeader = headers.Get("Link") + } + + if !hasNextPage(linkHeader) { + // No next page indicated by GitHub, we're done + break + } + + page++ + } + + // Set the final result + resultVal.Elem().Set(sliceVal) + return nil +} +``` + +### Step 3: Update tests to expect 1 call for exactly 100 items + +```go +func TestGetStarredReposPagination(t *testing.T) { + // Test with exactly 100 repos (single page, should not request page 2) + repos := make([]Repository, 100) + for i := 0; i < 100; i++ { + repos[i] = Repository{ + ID: int64(i), + Name: fmt.Sprintf("repo%d", i), + FullName: fmt.Sprintf("owner/repo%d", i), + } + } + + requestCount := 0 + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + requestCount++ + + query := r.URL.Query() + page := query.Get("page") + + if page == "1" { + w.Header().Set("Content-Type", "application/json") + // No Link header with rel="next" indicates this is the last page + // When there are more pages, GitHub includes: + // Link: ; rel="next" + if err := json.NewEncoder(w).Encode(repos); err != nil { + t.Fatalf("encoding repos: %v", err) + } + } else { + t.Errorf("should not request page %s when page 1 has no Link header with rel=\"next\"", page) + } + })) + defer server.Close() + + c := NewClient("test-token", WithBaseURL(server.URL)) + result, err := c.GetStarredRepos(context.Background()) + if err != nil { + t.Fatalf("GetStarredRepos() error: %v", err) + } + + if len(result) != 100 { + t.Errorf("expected 100 repos, got %d", len(result)) + } + + // Should now make exactly 1 request when Link header indicates no next page + if requestCount != 1 { + t.Errorf("expected 1 request, got %d", requestCount) + } +} +``` + +### Step 4: Add test for multi-page with Link headers + +```go +func TestGetFollowedUsersPaginationWithLinkHeaders(t *testing.T) { + // Test proper Link header pagination for 202 users across 3 pages + page1Users := make([]User, 100) + for i := 0; i < 100; i++ { + page1Users[i] = User{Login: fmt.Sprintf("user%d", i), ID: int64(i)} + } + + page2Users := make([]User, 100) + for i := 0; i < 100; i++ { + page2Users[i] = User{Login: fmt.Sprintf("user%d", i+100), ID: int64(i + 100)} + } + + page3Users := []User{ + {Login: "user200", ID: 200}, + {Login: "user201", ID: 201}, + } + + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + query := r.URL.Query() + page := query.Get("page") + + w.Header().Set("Content-Type", "application/json") + + switch page { + case "1": + // Page 1 has a next page + w.Header().Set("Link", `; rel="next"`) + json.NewEncoder(w).Encode(page1Users) + case "2": + // Page 2 has a next page + w.Header().Set("Link", `; rel="next"`) + json.NewEncoder(w).Encode(page2Users) + case "3": + // Page 3 is the last page (no next link) + json.NewEncoder(w).Encode(page3Users) + default: + t.Errorf("unexpected page: %s", page) + } + })) + defer server.Close() + + c := NewClient("test-token", WithBaseURL(server.URL)) + result, err := c.GetFollowedUsers(context.Background()) + if err != nil { + t.Fatalf("GetFollowedUsers() error: %v", err) + } + + if len(result) != 202 { + t.Errorf("expected 202 users, got %d", len(result)) + } +} +``` + +## Benefits of This Fix + +1. **Eliminates extra API call**: When there are exactly 100 items, we only make 1 request instead of 2 +2. **More reliable**: Uses GitHub's explicit pagination signal instead of guessing +3. **Standard practice**: This is how GitHub recommends handling pagination +4. **Backward compatible**: Existing behavior is preserved, just more efficient + +## Impact + +For a user following 100 people where 30% have exactly 100 items in some category: +- **Before fix**: ~391 API calls (301 base + 90 extra) +- **After fix**: ~301 API calls (no extra calls) +- **Savings**: 23% reduction in API calls + +## Files to Modify + +1. `github/client.go` - Update `get()` return signature and `getPaginated()` logic +2. `github/client_test.go` - Update test expectations and add Link header tests +3. All call sites of `get()` method - Update to handle returned headers (or ignore them) + +## Note on Compatibility + +The `get()` method signature change will require updating all callers. Alternative approach: +- Create `getWithHeaders()` method +- Keep `get()` unchanged +- Use `getWithHeaders()` only in `getPaginated()` + +This minimizes the change surface area.