feat: add Playwright connector for Amazon profile and order history #18
Open
letonchanh wants to merge 4 commits into main
Add Amazon connector that exports profile info (name, email, Prime status) and full order history with per-item prices via DOM scraping.

Two-phase architecture:
- Phase 1 (visible browser): Manual login with CAPTCHA/2FA support
- Phase 2 (headless): Scrape account settings + paginated order history

Key design decisions:
- Uses two-step session verification (nav bar check + orders page redirect) to handle stale cookies that make the nav bar show "Hello, Name" even with expired sessions
- Uses innerText instead of textContent for delivery status to avoid matching JS keywords from embedded <script> tags
- Fetches each order's detail page to get per-item prices, since the order list page only shows order totals
- Year-by-year extraction via the time filter dropdown with pagination

Also adds data-connect as a playwright-runner search path in test-connector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
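To make the innerText vs. textContent decision concrete, here is a minimal sketch that reproduces the false match with plain strings. The pattern list and the sample card text are hypothetical (the connector's real patterns are not shown in this PR); the only fact relied on is that `textContent` concatenates the text of all descendant nodes, including `<script>` bodies, while `innerText` returns rendered text only.

```javascript
// Hypothetical status patterns, for illustration only.
const STATUS_PATTERNS = [
  { status: "Delivered", pattern: /\bdelivered\b/i },
  { status: "Return", pattern: /\breturn(ed|s)?\b/i },
];

// Return the first status whose pattern matches the text, or null.
function matchDeliveryStatus(text) {
  for (const { status, pattern } of STATUS_PATTERNS) {
    if (pattern.test(text)) return status;
  }
  return null;
}

// Simulated values for a single order card (assumed markup).
// textContent would also carry the body of an embedded <script>.
const cardTextContent =
  "Arriving tomorrow function track(){ if (!ready) return; }";
// innerText would carry only what the user can see.
const cardInnerText = "Arriving tomorrow";

console.log(matchDeliveryStatus(cardTextContent)); // "Return" (false match)
console.log(matchDeliveryStatus(cardInnerText)); // null
```

Tightening the regex alone would not be enough here, since `return` is a whole word in the script body; reading rendered text removes the source of the false match.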
Email requires re-authentication on Amazon's account pages, which cannot be done in headless mode. Remove email field from profile scope entirely rather than returning empty values.

Fix isPrime detection: old selectors matched promotional "Try Prime" elements present for non-members. Now checks for actual membership indicators like "Your Prime" or "Prime Benefits" text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
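The isPrime fix can be illustrated with a small text-based check. This is a sketch under assumptions, not the connector's code: the function name `detectPrimeMembership` is hypothetical, and the indicator list contains only the two phrases named in the commit message.

```javascript
// Phrases the commit message names as membership indicators.
// Any additional phrases would be assumptions and are left out.
const MEMBER_INDICATORS = [/\byour prime\b/i, /\bprime benefits\b/i];

// True only when the visible page text contains a membership indicator.
// Promotional copy such as "Try Prime" matches neither pattern.
function detectPrimeMembership(pageText) {
  return MEMBER_INDICATORS.some((pattern) => pattern.test(pageText));
}

console.log(detectPrimeMembership("Try Prime free for 30 days")); // false
console.log(detectPrimeMembership("Your Prime Membership")); // true
```

Matching on positive indicators keeps the default at `false`, so an unrecognized page layout is reported as non-member rather than as a member.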
After goHeadless() the browser starts on a blank page with no DOM, so the nav bar greeting selector found nothing. Add page.goto to load amazon.com first so the nav bar is available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
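The blank-page case fixed here and the stale-cookie case from the first commit can both be expressed as one decision over two observations: the nav bar greeting text and the URL the browser ends up on after navigating to the orders page. The sketch below is a pure function so it can be reasoned about without a browser. The names `classifySession` and `SIGN_IN_MARKER`, the `/ap/signin` path, and the "sign in" greeting text are assumptions for illustration, not values taken from the connector.

```javascript
// Assumed URL fragment that identifies a sign-in redirect.
const SIGN_IN_MARKER = "/ap/signin";

// greetingText: text read from the nav bar after loading the home page.
// ordersFinalUrl: URL after navigating to the orders page.
function classifySession({ greetingText, ordersFinalUrl }) {
  const greeting = (greetingText || "").trim();

  // Step 1 (quick check): no greeting at all, as on a blank page,
  // or a greeting that asks the user to sign in.
  if (!greeting || /sign in/i.test(greeting)) return "signed-out";

  // Step 2 (deep check): the greeting looked fine, but the orders page
  // bounced to sign-in, so the cookies are stale.
  if ((ordersFinalUrl || "").includes(SIGN_IN_MARKER)) return "stale";

  return "signed-in";
}

console.log(
  classifySession({ greetingText: "", ordersFinalUrl: "about:blank" })
); // "signed-out"
console.log(
  classifySession({
    greetingText: "Hello, Alex",
    ordersFinalUrl: "https://www.amazon.com/ap/signin?openid=1",
  })
); // "stale"
```

In the connector the two inputs would come from the page itself, for example a locator's `innerText()` and `page.url()` after `page.goto(...)`.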
@copilot Review the PR
@letonchanh I've opened a new pull request, #19, to work on those changes. Once the pull request is ready, I'll request review from you.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Adds `data-connect` as a playwright-runner search path in `test-connector.cjs`

Design Decisions
Session Verification
Amazon's nav bar can show "Hello, Name" even when the session has expired, because stale cookies are enough to render the greeting. The connector therefore uses a two-step verification: a quick nav bar check, then a deep check that navigates to `/your-orders/orders` and detects sign-in redirects.

Per-Item Price Fetching
Amazon's order list page (`/your-orders/orders`) only shows the order total, not individual item prices. To get per-item prices, the connector fetches each order's detail page (`/gp/your-account/order-details?orderID=xxx`) after collecting the order list. This adds ~1.5s per order but provides complete price data.

Delivery Status Extraction
Uses `innerText` instead of `textContent` to match delivery status patterns, because `textContent` includes the content of embedded `<script>` tags (e.g., a JS `return;` statement), which falsely matches the "Return" status pattern.

DOM Scraping Approach
Amazon uses server-rendered HTML with no clean JSON APIs and aggressive A/B testing of DOM structure. The connector uses text-based regex matching on card content (order ID pattern `###-#######-#######`, date pattern, price pattern) rather than brittle CSS selectors for metadata extraction.

Files Changed
- `amazon/amazon-playwright.js`
- `amazon/amazon-playwright.json`
- `schemas/amazon.profile.json`
- `schemas/amazon.orders.json`
- `registry.json`
- `test-connector.cjs`: `data-connect` runner path

Test plan
- `node test-connector.cjs ./amazon/amazon-playwright.js --headed` and log in manually
- `amazon.profile` and `amazon.orders` schemas

🤖 Generated with Claude Code
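For reviewers, the text-based matching described under DOM Scraping Approach might look roughly like the sketch below. Only the order ID shape (`###-#######-#######`) comes from the PR description. The date and price patterns assume a US-English page with dates like "January 5, 2024" and prices like "$1,234.56", and the function name and sample card text are hypothetical.

```javascript
// Order ID shape from the PR description: 3 digits, 7 digits, 7 digits.
const ORDER_ID = /\b\d{3}-\d{7}-\d{7}\b/;
// Assumed US-English date format.
const ORDER_DATE =
  /\b(?:January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}, \d{4}\b/;
// Assumed USD price format with optional thousands separators.
const PRICE = /\$\d{1,3}(?:,\d{3})*\.\d{2}/;

// Extract metadata from the visible text of one order card.
// Returns null when no order ID is present, so non-order cards are skipped.
function parseOrderCard(cardText) {
  const id = cardText.match(ORDER_ID);
  if (!id) return null;
  const date = cardText.match(ORDER_DATE);
  const total = cardText.match(PRICE);
  return {
    orderId: id[0],
    orderDate: date ? date[0] : null,
    total: total ? total[0] : null,
  };
}

// Hypothetical card text; the order number is made up.
const sampleCard =
  "ORDER PLACED January 5, 2024 TOTAL $1,234.56 ORDER # 112-1234567-7654321";
console.log(parseOrderCard(sampleCard));
```

Because the function only reads text, it keeps working when the card's wrapper elements or class names change, as long as the visible labels stay in a recognizable format.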