This project implements a complete data pipeline for fetching, processing, and visualizing stock market data. It demonstrates modern data engineering practices using containerization and workflow orchestration.
The pipeline automates the ETL (Extract, Transform, Load) process for multiple stock tickers, storing data in CSV files and providing an interactive dashboard for analysis.
Key features:
- Automated data fetching from a public API
- Data transformation and validation
- Scheduled pipeline execution with Apache Airflow
- Interactive dashboard with Streamlit and Plotly
- Containerized deployment with Docker
- Python 3.9+
- Apache Airflow - Workflow orchestration
- Docker & Docker Compose - Containerization
- PostgreSQL - Database for Airflow metadata
- Streamlit - Web dashboard framework
- Plotly - Interactive charts
- Pandas - Data manipulation
- Requests - HTTP API calls
The following Python packages are required:
certifi==2026.1.4
charset-normalizer==3.4.4
idna==3.11
python-dotenv==1.2.1
requests==2.32.5
urllib3==2.6.3
streamlit
plotly
pandas
Additionally, the project uses:
- Apache Airflow (containerized)
- PostgreSQL (containerized)
api_sales_data_pipeline/
│
├── README.md
├── requirements.txt
├── docker-compose.yml
├── Dockerfile
├── Dockerfile.airflow
│
├── app.py # Main ETL pipeline script
├── fetch_api_data.py # Data fetching utility
├── dashboard.py # Streamlit dashboard
├── test.py # Unit tests
├── api_data.csv # Sample output CSV
│
├── airflow/
│ ├── dags/
│ │ └── api_pipeline_dag.py # Airflow DAG definition
│ ├── config/
│ ├── logs/
│ └── plugins/
│
├── data/ # Processed stock data CSVs
│ ├── AAPL_data.csv
│ ├── AMZN_data.csv
│ ├── FUN_data.csv
│ ├── GOOGL_data.csv
│ ├── MSFT_data.csv
│ ├── NVDA_data.csv
│ └── TSLA_data.csv
│
└── logs/ # Application logs
- Docker and Docker Compose installed
- Git
- Clone the repository:

  git clone https://github.com/yourusername/api-sales-data-pipeline.git
  cd api-sales-data-pipeline

- Start the services:

  docker-compose up --build
This will:
- Build the Airflow containers
- Start PostgreSQL database
- Initialize Airflow database
- Start Airflow scheduler and webserver
- Run the data pipeline automatically
- Access the services:
  - Airflow Web UI: http://localhost:8080 (username: airflow, password: airflow)
  - Streamlit Dashboard: run locally after the data has been processed (see below)
After the pipeline has run and generated data:
- Install Python dependencies:

  pip install -r requirements.txt

- Run the dashboard:

  streamlit run dashboard.py

- Access the dashboard at http://localhost:8501
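For orientation, here is a minimal sketch of what a Streamlit/Plotly dashboard over the generated CSVs can look like. It assumes the data/<TICKER>_data.csv files and the column layout shown in the output table further below, and is not necessarily identical to the project's dashboard.py.

```python
# Illustrative sketch only; the real dashboard.py may differ.
import glob
import os

import pandas as pd
import plotly.express as px
import streamlit as st

st.title("Stock Data Dashboard")

# Discover available tickers from the CSV files produced by the pipeline.
csv_files = glob.glob("data/*_data.csv")
tickers = sorted(os.path.basename(f).replace("_data.csv", "") for f in csv_files)

if not tickers:
    st.warning("No CSV files found in data/. Run the pipeline first.")
    st.stop()

ticker = st.selectbox("Select a ticker", tickers)

# Columns assumed: symbol, date, open, high, low, close, volume.
df = pd.read_csv(f"data/{ticker}_data.csv", parse_dates=["date"]).sort_values("date")

# Interactive closing-price chart.
fig = px.line(df, x="date", y="close", title=f"{ticker} closing price")
st.plotly_chart(fig, use_container_width=True)

st.dataframe(df.tail(10))
```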
The pipeline is orchestrated by Apache Airflow and runs the following steps:
- Extract: Fetch stock data from the PocketPortfolio API for configured tickers
- Transform: Clean and validate the JSON data, convert to CSV format
- Load: Save processed data to individual CSV files in the data/ directory
The DAG (api_pipeline_dag.py) is scheduled to run daily but can also be triggered manually through the Airflow UI.
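The actual task layout lives in airflow/dags/api_pipeline_dag.py; the sketch below only illustrates what a daily extract, transform, load DAG of this shape typically looks like. The task names and callables here are hypothetical stand-ins, not the project's real ones.

```python
# Illustrative sketch only; see airflow/dags/api_pipeline_dag.py for the real DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical stand-ins for the project's ETL functions.
def extract(**context):
    ...  # fetch JSON from the API for each configured ticker

def transform(**context):
    ...  # clean/validate the JSON and convert it to tabular form

def load(**context):
    ...  # write one CSV per ticker into data/

with DAG(
    dag_id="api_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # runs daily; can also be triggered manually in the UI
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```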
- API: https://pocketportfolio.app/api/tickers
- Tickers: FUN, AAPL, GOOGL, AMZN, MSFT, TSLA, NVDA
Environment variables can be modified in docker-compose.yml:
- BASE_URL: API endpoint URL
- TICKERS: Comma-separated list of stock tickers
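As a sketch of how these settings might be consumed in the Python code (variable names and defaults as documented above; python-dotenv from requirements.txt can additionally load them from a local .env file when running outside Docker):

```python
import os

from dotenv import load_dotenv  # python-dotenv; optional when running under Docker

load_dotenv()  # picks up a local .env file if present; harmless otherwise

# Defaults mirror the values documented above.
BASE_URL = os.getenv("BASE_URL", "https://pocketportfolio.app/api/tickers")
TICKERS = [
    t.strip()
    for t in os.getenv("TICKERS", "FUN,AAPL,GOOGL,AMZN,MSFT,TSLA,NVDA").split(",")
]

print(BASE_URL, TICKERS)
```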
Other useful commands:
- Run the unit tests: python test.py
- Run the ETL script locally: python app.py
- Rebuild the Docker images: docker-compose build

The pipeline generates CSV files with the following structure:
| symbol | date | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| AAPL | 2026-02-14 | 150.25 | 152.10 | 149.80 | 151.75 | 52847392 |
| AAPL | 2026-02-13 | 148.90 | 150.50 | 148.50 | 150.20 | 48273928 |
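For a quick sanity check of the output, the CSVs can be loaded with pandas. The snippet below assumes the column layout shown in the table above and the data/AAPL_data.csv path from the project structure.

```python
import pandas as pd

# Load one ticker's output and verify the expected columns are present.
df = pd.read_csv("data/AAPL_data.csv", parse_dates=["date"])
expected = {"symbol", "date", "open", "high", "low", "close", "volume"}
missing = expected - set(df.columns)
if missing:
    raise ValueError(f"Missing columns: {missing}")

# Example derived metric: day-over-day percentage change in the closing price.
df = df.sort_values("date")
df["daily_return"] = df["close"].pct_change()
print(df[["date", "close", "daily_return"]].tail())
```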
- Port conflicts: Ensure ports 8080 (Airflow) and 5432 (PostgreSQL) are available
- Docker build failures: Check Docker Desktop is running
- Data not loading: Verify API connectivity and ticker symbols
- Application logs: logs/pipeline.log
- Airflow logs: airflow/logs/
- Docker logs: docker-compose logs
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is for educational purposes. Please check the API terms of service for commercial use.
Run the pipeline:
python fetch_api_data.py

The script will:
- Fetch data from the API
- Transform and clean the data
- Output the results to api_data.csv
The standalone script requires:
- Python 3.6+
- The requests library
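As a rough sketch of what such a standalone fetch script can look like (the endpoint is the one documented above; the shape of the API response and the field handling are assumptions, so the real fetch_api_data.py may differ):

```python
# Illustrative sketch; the real fetch_api_data.py may differ.
import csv

import requests

API_URL = "https://pocketportfolio.app/api/tickers"

def main():
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    records = response.json()  # assumed to be a list of dicts; adjust to the real payload

    if not records:
        print("No data returned from the API")
        return

    # Write all records to a single CSV, one column per field in the first record.
    with open("api_data.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)

    print(f"Wrote {len(records)} rows to api_data.csv")

if __name__ == "__main__":
    main()
```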
- Add error handling for API failures
- Implement retry logic for failed requests
- Add data validation tests
- Schedule the pipeline to run automatically
- Store data in a SQL database instead of CSV
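For the retry item above, one common approach (not yet part of this codebase) is to mount urllib3's Retry policy onto a requests session:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries: int = 3) -> requests.Session:
    """Return a requests session that retries transient failures with backoff."""
    retry = Retry(
        total=retries,
        backoff_factor=1,  # 1s, 2s, 4s, ... between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.mount("http://", HTTPAdapter(max_retries=retry))
    return session

# Usage:
# session = make_session()
# response = session.get("https://pocketportfolio.app/api/tickers", timeout=30)
```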
This project is for educational purposes.
Amogelang Ngene