Just like Gmail, the Outlook handler had pagination limits that prevented it from processing all matching emails!
params = {
'$filter': filter_query,
'$top': 100 # Only first 100 emails!
}With large mailboxes, this meant:
- ❌ Only 100 promotions processed (even if you have 1000+)
- ❌ Only 100 job alerts processed
- ❌ Only 100 social media notifications processed
- ❌ Missing thousands of emails that should be cleaned up
while True:
# Process batch of 100
# Get next page link
next_link = data.get('@odata.nextLink')
if not next_link:
break # All pages processed!Now processes ALL matching emails across all pages! 🎉
Microsoft Graph API uses OData pagination:
GET https://graph.microsoft.com/v1.0/me/messages?$top=100&$filter=...{
"value": [ /* 100 messages */ ],
"@odata.nextLink": "https://graph.microsoft.com/v1.0/me/messages?$skip=100&$top=100&$filter=..."
}GET https://graph.microsoft.com/v1.0/me/messages?$skip=100&$top=100&$filter=...{
"value": [ /* last batch of messages */ ]
// No @odata.nextLink = we're done!
}Before:
# Only gets first 100 emails
response = requests.get(url, headers=headers, params=params)
messages = response.json().get('value', [])
# Process 100 messages and stopAfter:
while True:
# Get batch of 100
if next_link:
response = requests.get(next_link, headers=headers)
else:
response = requests.get(url, headers=headers, params=params)
messages = data.get('value', [])
# Process all messages in batch
# Get next page
next_link = data.get('@odata.nextLink')
if not next_link:
break # All pages done!Same pagination logic added to archiving operations.
Now continues fetching until:
- 100 important emails found (limit for calendar events), OR
- No more pages available
All three core methods now support pagination:
| Method | Before | After |
|---|---|---|
_delete_emails_by_filter |
First 100 only | All matching emails |
_archive_emails_by_filter |
First 100 only | All matching emails |
_find_important_emails |
First 50 only | Up to 100 (with pagination) |
Just like Gmail, you'll now see real-time progress:
Searching for old promotional emails...
Processing batch of 100 messages (total found so far: 100)...
Progress: 100 messages processed...
Processing batch of 100 messages (total found so far: 200)...
Progress: 200 messages processed...
Processing batch of 100 messages (total found so far: 300)...
Progress: 300 messages processed...
Processing batch of 87 messages (total found so far: 387)...
Would delete 387 old promotional emails
Outlook Processing Summary:
Promotions deleted: 100 ← Hit limit!
Job alerts deleted: 100 ← Hit limit!
Social media deleted: 100 ← Hit limit!
App acknowledgements: 100 ← Hit limit!
Outlook Processing Summary:
Promotions deleted: 428 ← All found!
Job alerts deleted: 419 ← All found!
Social media deleted: 892 ← All found!
App acknowledgements: 156 ← All found!
- Throttling: 10,000 requests per 10 minutes per user
- Typical operations: 1-2 requests per email (fetch + delete/archive)
- For 1000 emails: ~2000 requests = well under limit
- Small mailbox (500 emails): 1-2 minutes
- Medium mailbox (2000 emails): 5-10 minutes
- Large mailbox (10,000 emails): 30-60 minutes
If you hit rate limits, you'll see:
Error fetching old promotions: 429
429 = Too Many Requests
Current implementation will:
- Log the error
- Stop processing that category
- Continue with other categories
Future enhancement: Could add automatic retry with exponential backoff.
Outlook API processes in batches of 100:
'$top': 100 # Max per page for OutlookWhy 100?
- Microsoft Graph recommended batch size
- Good balance between:
- Network overhead (fewer requests)
- Memory usage (not too many at once)
- Progress visibility (see updates frequently)
Could increase to 500?
- No! Graph API typically limits to 100-200
- Requests with $top > 999 will fail
- 100 is the sweet spot
cd ~/email-manager
source venv/bin/activate
python email_manager.py --provider outlookLook for output like:
Processing batch of 100 messages (total found so far: 100)...
Processing batch of 100 messages (total found so far: 200)...
Processing batch of 100 messages (total found so far: 300)...
If you see this pattern, pagination is working!
Before fix:
Promotions deleted: 100 ← Suspiciously round
Job alerts deleted: 100 ← Suspiciously round
After fix:
Promotions deleted: 387 ← Real number!
Job alerts deleted: 219 ← Real number!
- Next link tracking
next_link = None
total_found = 0- Pagination loop
while True:
if next_link:
response = requests.get(next_link, headers=headers)
else:
response = requests.get(url, headers=headers, params=params)
# Process batch
next_link = data.get('@odata.nextLink')
if not next_link:
break- Progress logging
total_found += len(messages)
logger.info(f"Processing batch of {len(messages)} (total: {total_found})...")
if processed % 100 == 0:
logger.info(f"Progress: {processed} messages processed...")results = self.service.users().messages().list(
userId='me',
q=query,
maxResults=500,
pageToken=page_token # Gmail's pagination token
).execute()
page_token = results.get('nextPageToken')response = requests.get(url, headers=headers, params={
'$filter': filter_query,
'$top': 100
})
data = response.json()
next_link = data.get('@odata.nextLink') # Outlook's next page URLBoth achieve the same result: Process ALL matching emails!
Check:
-
Are you using the updated handler?
grep "@odata.nextLink" ~/email-manager/outlook_handler.py
Should return matches!
-
Do you actually have more than 100?
- Run query manually to verify
- Check different date ranges
This is normal for large mailboxes!
- Outlook API processes ~2-5 emails/second
- 1000 emails = 3-8 minutes
- Be patient and watch the progress logs
You're processing too fast!
- Outlook has rate limits
- Wait a few minutes
- Run again
- Consider processing one category at a time
Long-running operation timed out:
- Your network connection dropped
- Run again - it will continue where filters allow
- Consider shorter date ranges
You might think: "Why not use $skip to paginate?"
# DON'T DO THIS:
for skip in range(0, 10000, 100):
params = {'$skip': skip, '$top': 100}Problems:
- ❌ Inefficient: Each request still scans from the start
- ❌ Unreliable: Emails can be deleted mid-pagination, causing skips
- ❌ Not recommended: Microsoft explicitly says use @odata.nextLink
Use @odata.nextLink instead:
- ✅ Efficient: Server maintains cursor position
- ✅ Reliable: Consistent results even if data changes
- ✅ Recommended: Official Microsoft best practice
while True:
# Process batch
if not next_link:
breakAlternative (BAD):
# Get all emails first, then process
all_emails = []
while next_link:
all_emails.extend(get_batch())
# Now process all_emailsWhy the loop approach is better:
- ✅ Memory efficient: Don't load 10,000 emails into memory
- ✅ Faster feedback: See progress immediately
- ✅ Failure recovery: If it crashes, some work is done
- ✅ Rate limit friendly: Can add delays between batches
- ❌ Only processed first 100 emails per category
- ❌ Missed thousands of emails in large mailboxes
- ❌ No progress tracking
- ❌ Suspiciously round numbers (100, 100, 100)
- ✅ Processes ALL matching emails
- ✅ Handles mailboxes of any size
- ✅ Real-time progress tracking
- ✅ Accurate totals (387, 219, 892, etc.)
Before: "Found 100 old promotions"
After: "Processing batch of 100... (total: 100)
Processing batch of 100... (total: 200)
Processing batch of 100... (total: 300)
Processing batch of 87... (total: 387)
Would delete 387 old promotions"
cd ~/email-manager
# Backup current file
cp outlook_handler.py outlook_handler.py.backup
# Copy updated file
cp /mnt/user-data/outputs/outlook_handler.py ~/email-manager/
# Test with dry-run
source venv/bin/activate
python email_manager.py --provider outlook
# Look for pagination in logs
grep "Processing batch" email_manager.log
# When confident
python email_manager.py --provider outlook --live🎉 Complete email processing - No more missed emails
🎉 Progress visibility - Know exactly what's happening
🎉 Large mailbox support - Handles thousands of emails
🎉 Better statistics - Accurate counts, not limits
Your Outlook account will now be just as thoroughly cleaned as Gmail! 🚀