- Introduction
- Technology Stack
- Architecture Overview
- How It Works
- Setup Guide
- Testing the Solution
- Results
- Summary
- References
## Introduction

mbx-getstock-aws-puppeteer is a fully automated, serverless solution that gathers real-time financial data from the web. Hosted entirely on AWS, it accepts a stock symbol via a REST endpoint, navigates Yahoo Finance using a headless browser, extracts the current stock price, and captures a high-resolution screenshot — all without managing a single server.
## Technology Stack

| Layer | Service | Purpose |
|---|---|---|
| Compute | AWS Lambda (Node.js) | Executes scraping logic on-demand |
| Orchestration | Amazon EventBridge | Schedules periodic invocations |
| API | Amazon API Gateway | Exposes public REST endpoint |
| Storage | Amazon S3 | Stores webpage screenshot captures |
| Database | Amazon DynamoDB | Stores stock price records |
| Automation | Puppeteer + chrome-aws-lambda | Headless browser for data extraction |
## Architecture Overview

Data analytics solutions rely on rich data sources to hydrate data lakes and warehouses. When data isn't available through structured APIs, web scraping becomes a practical alternative. Puppeteer — originally built for automated browser testing — is a powerful tool for web data capture.
By deploying Puppeteer on AWS Lambda, this solution:
- Scales automatically — no idle servers, pure on-demand compute
- Stores durably — screenshots in S3, price records in DynamoDB
- Runs periodically — EventBridge cron rules trigger the pipeline on any schedule
A single API call kicks off the entire pipeline:
```
GET https://<api-id>.execute-api.<region>.amazonaws.com/dev/stock/{SYMBOL}
```
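For example, the request URL for a symbol can be assembled with a small helper like this (the `apiId` and `region` values shown are placeholders for your own deployment):

```javascript
// Build the REST endpoint URL for a given stock symbol.
// apiId and region are placeholders for your own deployment values.
function stockUrl(apiId, region, symbol) {
  return `https://${apiId}.execute-api.${region}.amazonaws.com/dev/stock/${encodeURIComponent(symbol)}`;
}

console.log(stockUrl('abc123', 'us-east-1', 'IBM'));
// → https://abc123.execute-api.us-east-1.amazonaws.com/dev/stock/IBM
```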
```
[EventBridge / Browser]
          │
          ▼
[API Gateway REST Endpoint]
          │
          ▼
[Lambda Function (Node.js + Puppeteer)]
          │
          ├──▶ [Yahoo Finance]    ──scrape──▶ stock price + screenshot
          │
          ├──▶ [Amazon S3]        ──save──▶  webpage screenshot (.jpg)
          │
          └──▶ [Amazon DynamoDB]  ──save──▶  { timestamp, symbol, price, s3_link }
```
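The record written by the last step can be sketched as a plain object builder. This is an illustration only — the field names are taken from the diagram, not verbatim from `index.js`:

```javascript
// Build the item saved to DynamoDB after a successful scrape.
// Field names mirror the diagram: { timestamp, symbol, price, s3_link }.
function buildRecord(symbol, price, s3Link, now = new Date()) {
  return {
    timestamp: now.toISOString(), // partition key (String)
    symbol,
    price,
    s3_link: s3Link,
  };
}

const rec = buildRecord(
  'AMZN',
  '178.15',
  's3://my-bucket/2024-01-01T00:00:00.000Z-AMZN.jpg',
  new Date('2024-01-01T00:00:00Z')
);
console.log(rec.timestamp); // → 2024-01-01T00:00:00.000Z
```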
## How It Works

The key extraction is a single `page.evaluate()` call:

```javascript
price = await page.evaluate(() =>
  document.querySelector(
    "#quote-header-info > div.Pos\\(r\\) > div > div > span"
  ).textContent
);
```

## Setup Guide

Before deploying, open `index.js` and update these two constants with your own resource names:
```javascript
const dbname = 'your-dynamodb-table-name';
const dstBucket = 'your-s3-bucket-name';
```

Log in to your AWS account and provision the following:
- S3 Bucket — any unique name; this stores screenshot captures
- DynamoDB Table — partition key: `timestamp` (type: String); all other settings default
Build and upload the chrome-aws-lambda binary as a Lambda Layer. Follow the official guide:
github.com/alixaxel/chrome-aws-lambda
Create a Lambda function with the following settings:
| Setting | Value |
|---|---|
| Runtime | Node.js 12.x |
| Timeout | 3 minutes |
| Memory | 2048 MB |
Copy and paste the contents of index.js into the Lambda code editor.
Then update the Lambda IAM execution role to grant access to DynamoDB and S3.
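A minimal execution-role policy might look like the following (the bucket and table names are the placeholders from earlier; a broader managed policy also works, but is not least-privilege):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::your-s3-bucket-name/*"
    },
    {
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem"],
      "Resource": "arn:aws:dynamodb:*:*:table/your-dynamodb-table-name"
    }
  ]
}
```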
Create a REST API with the following resource path structure:
```
/stock/{symbol} → GET → Lambda Integration
```
Set up Lambda proxy integration for the endpoint.
Create one EventBridge Rule per stock symbol. Set the API Gateway resource as the target and configure a cron expression or rate expression.
Example: `rate(15 minutes)` invokes the API every 15 minutes for a given symbol.
Follow AWS best practices: apply least-privilege IAM permissions and enable encryption at rest and in transit.
## Testing the Solution

### Via Browser
Navigate to your API Gateway endpoint with any stock symbol:
```
https://<api-id>.execute-api.us-east-1.amazonaws.com/dev/stock/IBM
```
### Via API Gateway Test Panel
Use the built-in test console to pass a stock symbol directly. Note: requests may occasionally time out — consider adding retry logic.
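The suggested retry logic could be a small wrapper with exponential backoff on the caller's side — a sketch only; attempt counts and delays here are arbitrary:

```javascript
// Retry an async operation up to `attempts` times with exponential backoff.
async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries: surface the error
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
}

// Example: a call that times out twice, then succeeds on the third attempt.
let calls = 0;
withRetry(async () => {
  calls++;
  if (calls < 3) throw new Error('timeout');
  return 'ok';
}, 5, 1).then((result) => console.log(result)); // → ok
```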
## Results

After a successful invocation, you'll find:

- **DynamoDB** — a new record with `timestamp`, `stock_label`, `stock_value`, and an `s3_link`
- **S3** — a screenshot file named `<ISO-timestamp>-<SYMBOL>.jpg`

Captured webpage screenshot (e.g. AMZN).
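The screenshot key naming can be reproduced with a one-liner — a hypothetical helper matching the `<ISO-timestamp>-<SYMBOL>.jpg` pattern above:

```javascript
// Build the S3 object key for a screenshot: <ISO-timestamp>-<SYMBOL>.jpg
function screenshotKey(symbol, now = new Date()) {
  return `${now.toISOString()}-${symbol.toUpperCase()}.jpg`;
}

console.log(screenshotKey('amzn', new Date('2024-01-01T00:00:00Z')));
// → 2024-01-01T00:00:00.000Z-AMZN.jpg
```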
## Summary

mbx-getstock-aws-puppeteer is a blueprint for serverless web scraping on AWS. By pairing Lambda with Puppeteer, it eliminates the overhead of managing persistent scraping servers while offering seamless scaling through AWS's on-demand model. DynamoDB and S3 provide durable, queryable storage for both structured price data and visual page captures — making this pattern well-suited for feeding analytical pipelines and building long-running financial datasets.