mbx-getstock-aws-puppeteer

Serverless Stock Data Collector — Powered by AWS & Puppeteer






Introduction

mbx-getstock-aws-puppeteer is a fully automated, serverless solution that gathers real-time financial data from the web. Hosted entirely on AWS, it accepts a stock symbol via a REST endpoint, navigates Yahoo Finance using a headless browser, extracts the current stock price, and captures a high-resolution screenshot — all without managing a single server.


Technology Stack

| Layer | Service | Purpose |
| --- | --- | --- |
| Compute | AWS Lambda (Node.js) | Executes scraping logic on demand |
| Orchestration | Amazon EventBridge | Schedules periodic invocations |
| API | Amazon API Gateway | Exposes the public REST endpoint |
| Storage | Amazon S3 | Stores webpage screenshot captures |
| Database | Amazon DynamoDB | Stores stock price records |
| Automation | Puppeteer + chrome-aws-lambda | Headless browser for data extraction |

Architecture Overview

Data analytics solutions rely on rich data sources to hydrate data lakes and warehouses. When data isn't available through structured APIs, web scraping becomes a practical alternative. Puppeteer — originally built for automated browser testing — is a powerful tool for web data capture.

By deploying Puppeteer on AWS Lambda, this solution:

  • Scales automatically — no idle servers, pure on-demand compute
  • Stores durably — screenshots in S3, price records in DynamoDB
  • Runs periodically — EventBridge cron rules trigger the pipeline on any schedule

How It Works

A single API call kicks off the entire pipeline:

GET https://<api-id>.execute-api.<region>.amazonaws.com/dev/stock/{SYMBOL}
[EventBridge / Browser]
        │
        ▼
[API Gateway REST Endpoint]
        │
        ▼
[Lambda Function (Node.js + Puppeteer)]
        │
        ├──▶ [Yahoo Finance] ──scrape──▶ stock price + screenshot
        │
        ├──▶ [Amazon S3] ──save──▶ webpage screenshot (.jpg)
        │
        └──▶ [Amazon DynamoDB] ──save──▶ { timestamp, symbol, price, s3_link }

The key extraction is a single page.evaluate() call. Declaring the result with const and using optional chaining (?.) avoids a TypeError if Yahoo changes its markup and the selector matches nothing:

const price = await page.evaluate(() =>
  document.querySelector(
    "#quote-header-info > div.Pos\\(r\\) > div > div > span"
  )?.textContent
);
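For context, a minimal sketch of how index.js might wire this extraction into a complete Lambda handler is shown below, assuming the chrome-aws-lambda layer from step 3. The Yahoo Finance URL pattern and the proxy-event field names are standard; the exact control flow and error handling are illustrative, not the repository's verbatim code.

```javascript
// Minimal handler sketch. chrome-aws-lambda is provided by the Lambda
// Layer; it is required inside the handler so this file can be loaded
// and linted locally without the layer installed.
const handler = async (event) => {
  const chromium = require('chrome-aws-lambda');
  const symbol = event.pathParameters.symbol.toUpperCase();

  const browser = await chromium.puppeteer.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath,
    headless: chromium.headless,
  });

  try {
    const page = await browser.newPage();
    await page.goto(`https://finance.yahoo.com/quote/${symbol}`, {
      waitUntil: 'networkidle2',
    });

    const price = await page.evaluate(() =>
      document.querySelector(
        '#quote-header-info > div.Pos\\(r\\) > div > div > span'
      )?.textContent
    );
    // Uploaded to S3 in the real handler; persistence is elided here.
    const screenshot = await page.screenshot({ type: 'jpeg', quality: 80 });

    return { statusCode: 200, body: JSON.stringify({ symbol, price }) };
  } finally {
    await browser.close();
  }
};

module.exports = { handler };
```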

Setup Guide

1. Configure the Lambda Function

Before deploying, open index.js and update these two constants with your own resource names:

const dbname    = 'your-dynamodb-table-name';
const dstBucket = 'your-s3-bucket-name';

2. Create AWS Resources

Log in to your AWS account and provision the following:

  • S3 Bucket — any unique name; this stores screenshot captures
  • DynamoDB Table — partition key: timestamp (type: String), all other settings default
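If you prefer scripting the table over clicking through the console, the settings above map onto these CreateTable parameters. This is a sketch for the AWS SDK for JavaScript v2; the table name is a placeholder, and PAY_PER_REQUEST billing is an assumption chosen here to avoid capacity planning for a low-volume scraper.

```javascript
// CreateTable parameters matching the settings above:
// partition key "timestamp" of type String, everything else default.
const createTableParams = {
  TableName: 'your-dynamodb-table-name', // placeholder; match dbname in index.js
  AttributeDefinitions: [
    { AttributeName: 'timestamp', AttributeType: 'S' },
  ],
  KeySchema: [
    { AttributeName: 'timestamp', KeyType: 'HASH' }, // partition key
  ],
  BillingMode: 'PAY_PER_REQUEST', // assumption: on-demand billing
};

// Usage (requires aws-sdk and credentials):
//   new AWS.DynamoDB().createTable(createTableParams, console.log);
module.exports = { createTableParams };
```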

3. Create Lambda Layer

Build and upload the chrome-aws-lambda binary as a Lambda Layer. Follow the official guide: github.com/alixaxel/chrome-aws-lambda


4. Create the Lambda Function

Create a Lambda function with the following settings:

| Setting | Value |
| --- | --- |
| Runtime | Node.js 12.x |
| Timeout | 3 minutes |
| Memory | 2048 MB |

Copy and paste the contents of index.js into the Lambda code editor.

Then update the Lambda IAM execution role to grant access to DynamoDB and S3.


5. Configure API Gateway

Create a REST API with the following resource path structure:

/stock/{symbol}   →  GET  →  Lambda Integration

Set up Lambda proxy integration for the endpoint, so the full request (including the {symbol} path parameter) is passed through to the function.
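With proxy integration, API Gateway delivers the symbol in the event's pathParameters field and expects a response with statusCode and a string body. Those field names are the standard proxy-integration contract; the handler logic below is an illustrative sketch, not the repository's code.

```javascript
// Reads the {symbol} path parameter from a Lambda proxy event and
// returns a response in the shape API Gateway proxy integration expects.
function buildResponse(event) {
  const symbol = event.pathParameters && event.pathParameters.symbol;
  if (!symbol) {
    return { statusCode: 400, body: JSON.stringify({ error: 'missing symbol' }) };
  }
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ symbol: symbol.toUpperCase() }),
  };
}

module.exports = { buildResponse };
```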


6. Schedule with EventBridge

Create one EventBridge Rule per stock symbol. Set the API Gateway resource as the target and configure a cron expression or rate expression.

Example: rate(15 minutes) — invokes the API every 15 minutes for a given symbol.
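Scripted, one such per-symbol rule corresponds to PutRule/PutTargets parameters like the following sketch. The rule name, target Id, region, and the `<account-id>`/`<api-id>` segments of the execute-api ARN are placeholders you must fill in for your deployment.

```javascript
// Parameters for one per-symbol schedule, as passed to EventBridge's
// PutRule and PutTargets APIs (names and ARN segments are placeholders).
const ruleParams = {
  Name: 'getstock-IBM-every-15m',
  ScheduleExpression: 'rate(15 minutes)', // or a cron expression
  State: 'ENABLED',
};

const targetParams = {
  Rule: ruleParams.Name,
  Targets: [
    {
      Id: 'getstock-api',
      // API Gateway target: execute-api ARN for GET /stock/{symbol}
      Arn: 'arn:aws:execute-api:us-east-1:<account-id>:<api-id>/dev/GET/stock/IBM',
    },
  ],
};

module.exports = { ruleParams, targetParams };
```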

Follow AWS best practices: apply least-privilege IAM permissions and enable encryption at rest and in transit.


Testing the Solution

Via Browser

Navigate to your API Gateway endpoint with any stock symbol:

https://<api-id>.execute-api.us-east-1.amazonaws.com/dev/stock/IBM

Via API Gateway Test Panel

Use the built-in test console to pass a stock symbol directly. Note: requests may occasionally time out — consider adding retry logic.


Results

After a successful invocation, you'll find:

DynamoDB — a new record with timestamp, stock_label, stock_value, and an s3_link pointing at the capture.

S3 — a screenshot file named <ISO-timestamp>-<SYMBOL>.jpg.

(The original README shows the DynamoDB record and a captured Yahoo Finance page for AMZN at this point.)
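The naming scheme and record shape above can be reproduced in a few lines. The sketch below assembles the S3 key and a matching DynamoDB item; the field names follow the record described above, but the helper itself (buildRecord) and the s3_link URL form are illustrative assumptions, not the repository's exact code.

```javascript
// Builds the S3 object key <ISO-timestamp>-<SYMBOL>.jpg and the
// matching DynamoDB record for one scrape result.
function buildRecord(symbol, price, bucket, now = new Date()) {
  const timestamp = now.toISOString();
  const key = `${timestamp}-${symbol.toUpperCase()}.jpg`;
  return {
    key,
    item: {
      timestamp,                        // DynamoDB partition key
      stock_label: symbol.toUpperCase(),
      stock_value: price,
      s3_link: `https://${bucket}.s3.amazonaws.com/${key}`,
    },
  };
}

module.exports = { buildRecord };
```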


Summary

mbx-getstock-aws-puppeteer is a blueprint for serverless web scraping on AWS. By pairing Lambda with Puppeteer, it eliminates the overhead of managing persistent scraping servers while offering seamless scaling through AWS's on-demand model. DynamoDB and S3 provide durable, queryable storage for both structured price data and visual page captures — making this pattern well-suited for feeding analytical pipelines and building long-running financial datasets.



Built with Node.js · AWS Lambda · Puppeteer · DynamoDB · S3 · EventBridge
