
SthembisoMfusi/WhatISaw


🧭 What I Saw

Smart Substitute Recommender - An intelligent e-commerce solution powered by AI vector search

🎯 Project Overview

This project demonstrates a smarter way to handle out-of-stock products in e-commerce. Instead of simply showing a "sold out" message, our system uses AI-powered semantic search to recommend the most suitable alternatives from the product catalog.

The core of this application is a BigQuery-powered backend that leverages vector embeddings - a way of representing text as numerical vectors, allowing us to find products that are not just in the same category but are also functionally and stylistically similar.
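
As a rough illustration of what "similar" means for embeddings, the sketch below computes cosine similarity between toy vectors in plain Python. The three-dimensional vectors and product names are made up for the example; the project's real embeddings are produced by Vertex AI, and the comparison runs inside BigQuery rather than in application code.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (hypothetical values for illustration only).
trail_shoe = [0.9, 0.1, 0.0]
road_shoe = [0.8, 0.2, 0.1]
coffee_maker = [0.0, 0.1, 0.9]

# The two shoes point in nearly the same direction, so they score higher
# against each other than a shoe scores against the coffee maker.
shoes_are_closer = (
    cosine_similarity(trail_shoe, road_shoe)
    > cosine_similarity(trail_shoe, coffee_maker)
)
print(shoes_are_closer)
```

A score near 1 means the two descriptions point in almost the same direction in embedding space; a score near 0 means they are largely unrelated.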

The project shows how AI features from Google Cloud can be integrated into a practical application to improve user experience and recover lost sales.

✨ Key Features

  • 🎯 Intelligent Recommendations: Go beyond simple keyword or category matching to provide genuinely useful product substitutes
  • 🔍 Vector Similarity Search: Powered by BigQuery ML's text embedding models
  • ⚡ Efficient Vector Search: Blazing-fast similarity lookups using BigQuery's VECTOR_SEARCH function
  • 🏗️ Scalable Architecture: Production-ready backend and modern frontend
  • 💼 Business Value: Practical solution to recover lost sales from out-of-stock products
  • ✅ Comprehensive Testing: Full test coverage for both backend and frontend

🏛️ Architecture

System Components

┌─────────────────┐         ┌──────────────────┐         ┌────────────────────┐
│  React Frontend │  ←───→  │   Flask Backend  │  ←───→  │  BigQuery + Vertex │
│   (TypeScript)  │  REST   │     (Python)     │  SQL    │    AI Embeddings   │
└─────────────────┘   API   └──────────────────┘         └────────────────────┘

1. Backend (/backend) - Flask REST API

  • Handles all communication with Google Cloud services (BigQuery)
  • Exposes REST API endpoints for products and substitutes
  • Uses BigQuery's VECTOR_SEARCH for similarity matching
  • Comprehensive test suite with pytest

2. Frontend (/frontend) - React TypeScript SPA

  • Modern single-page application
  • Displays product catalog with filtering
  • Shows AI-powered substitute recommendations for out-of-stock items
  • Responsive design with comprehensive tests

3. Database (/sql) - BigQuery with Vector Embeddings

  • Stores product information and vector embeddings
  • Uses Vertex AI text-embedding-004 model
  • Vector index for fast similarity search
  • Sample dataset across multiple categories

🚀 Quick Start

Prerequisites

✅ Google Cloud Project with billing enabled
✅ gcloud CLI installed and authenticated
✅ Python 3.8+ for backend
✅ Node.js 16+ for frontend
✅ BigQuery API and Vertex AI API enabled

Installation Steps

1. Clone the Repository

git clone https://github.com/your-username/WhatISaw.git
cd WhatISaw

2. Set Up BigQuery (First!)

cd sql
# Follow instructions in sql/README.md
# This creates tables, generates embeddings, and creates vector index

3. Set Up Backend

cd ../backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Create .env file
cp .env.example .env
# Edit .env with your Google Cloud project details

# Run backend
python run.py

Backend will be available at http://localhost:5000

4. Set Up Frontend

cd ../frontend
npm install

# Set API URL
export REACT_APP_API_URL=http://localhost:5000

# Run frontend
npm start

Frontend will open at http://localhost:3000

🧪 Testing

Backend Tests

cd backend
source venv/bin/activate
pytest                    # Run all tests
pytest --cov=app          # With coverage
pytest -v                 # Verbose output

Test coverage includes:

  • ✅ API endpoint tests
  • ✅ BigQuery client tests
  • ✅ Configuration tests
  • ✅ Error handling tests

Frontend Tests

cd frontend
npm test                  # Run all tests
npm run test:coverage     # With coverage

Test coverage includes:

  • ✅ Component tests
  • ✅ Hook tests
  • ✅ API service tests
  • ✅ Integration tests

📁 Project Structure

WhatISaw/
├── sql/                          # BigQuery setup scripts
│   ├── 01_create_dataset_and_tables.sql
│   ├── 02_create_embeddings.sql
│   ├── 03_create_vector_index.sql
│   ├── 04_sample_data.sql
│   ├── 05_query_substitutes.sql
│   └── README.md
├── backend/                      # Flask REST API
│   ├── app/
│   │   ├── __init__.py          # App factory
│   │   ├── config.py            # Configuration
│   │   ├── bigquery_client.py   # BigQuery operations
│   │   └── routes.py            # API endpoints
│   ├── tests/                   # Pytest test suite
│   ├── run.py                   # Entry point
│   ├── requirements.txt
│   └── README.md
├── frontend/                     # React TypeScript app
│   ├── src/
│   │   ├── components/          # React components
│   │   ├── hooks/               # Custom hooks
│   │   ├── services/            # API services
│   │   ├── types/               # TypeScript types
│   │   └── App.tsx
│   ├── package.json
│   └── README.md
├── environment/                  # Service account credentials
└── README.md                    # This file

🔌 API Endpoints

Health Check

GET /health

List Products

GET /api/products?limit=50&in_stock=false

Get Single Product

GET /api/products/{product_id}

Get Product Substitutes (AI-Powered)

GET /api/substitutes/{product_id}?limit=5&price_filter=true
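
For scripting against these endpoints, request URLs can be assembled with the standard library. The sketch below is only an illustration: the helper names and the product ID `P001` are hypothetical, and the base URL is the local development address from the Quick Start.

```python
from urllib.parse import quote, urlencode

# Local development address from the Quick Start; change it for a deployment.
API_BASE = "http://localhost:5000"


def products_url(limit=50, in_stock=None, base=API_BASE):
    """Build the URL for GET /api/products (hypothetical helper)."""
    params = {"limit": limit}
    if in_stock is not None:
        # The examples above use lowercase true/false in the query string.
        params["in_stock"] = str(in_stock).lower()
    return f"{base}/api/products?{urlencode(params)}"


def substitutes_url(product_id, limit=5, price_filter=True, base=API_BASE):
    """Build the URL for GET /api/substitutes/{product_id} (hypothetical helper)."""
    query = urlencode({"limit": limit, "price_filter": str(price_filter).lower()})
    # quote() escapes characters that are not safe inside a path segment.
    return f"{base}/api/substitutes/{quote(str(product_id), safe='')}?{query}"


print(products_url(in_stock=False))
print(substitutes_url("P001"))
```

Either URL can then be passed to any HTTP client, for example `urllib.request.urlopen` once the backend is running.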

🎨 Features Showcase

Smart Product Substitutes

When a product is out of stock, the system:

  1. Retrieves the product's vector embedding
  2. Performs cosine similarity search in BigQuery
  3. Returns the most similar products (by description, features, category)
  4. Displays similarity scores (AI match percentage)
  5. Optionally filters by price range (±30%)
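
The last two steps can be sketched in a few lines of Python. This is an assumption about how such logic might look, not the project's actual code: the function names are hypothetical, and the match percentage is assumed here to be derived from cosine distance as `1 - distance`.

```python
def within_price_range(candidate_price, original_price, tolerance=0.3):
    """Return True if candidate_price is within ±tolerance of original_price."""
    low = original_price * (1 - tolerance)
    high = original_price * (1 + tolerance)
    return low <= candidate_price <= high


def match_percentage(cosine_distance):
    """Convert a cosine distance (0 means same direction) to a 0-100 score."""
    similarity = 1.0 - cosine_distance
    # Cosine distance can exceed 1, so clamp the similarity into [0, 1].
    clamped = max(0.0, min(1.0, similarity))
    return round(clamped * 100, 1)


# Hypothetical candidates for an out-of-stock product priced at 100.00.
candidates = [
    {"name": "Substitute A", "price": 120.0, "distance": 0.08},
    {"name": "Substitute B", "price": 180.0, "distance": 0.05},
]
kept = [c for c in candidates if within_price_range(c["price"], 100.0)]
for c in kept:
    print(c["name"], match_percentage(c["distance"]))
```

With the default tolerance of 0.3, only candidates priced between 70% and 130% of the original product survive the filter.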

Vector Embedding Pipeline

Product Description → Vertex AI (text-embedding-004) → 768-dim Vector → BigQuery
                                                                             ↓
User Queries Out-of-Stock Product ← Similarity Search ← Vector Index ← Storage

💡 How It Works

  1. Data Preparation: Product descriptions are stored in BigQuery
  2. Embedding Generation: Vertex AI creates vector embeddings for each product
  3. Vector Index: BigQuery creates an optimized index for fast search
  4. User Request: Frontend requests substitutes for an out-of-stock product
  5. Similarity Search: Backend queries BigQuery using VECTOR_SEARCH
  6. Results: Top similar products returned with similarity scores
  7. Display: Frontend shows AI-recommended alternatives
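
Step 5 might translate into a query like the one built below. This is a hedged sketch rather than the contents of the project's SQL scripts: the column names `product_id` and `embedding` and the helper name are assumptions, while the dataset and table names are taken from the Configuration example.

```python
def build_substitute_query(project, dataset="what_i_saw",
                           table="product_embeddings", top_k=5):
    """Return a BigQuery SQL string that finds the nearest products.

    Column names (product_id, embedding) are assumed for illustration.
    The project, dataset, and table should come from trusted configuration,
    never from user input, because they are interpolated into the SQL text.
    """
    table_ref = f"`{project}.{dataset}.{table}`"
    k = int(top_k)
    # Ask for one extra neighbour: the out-of-stock product is normally its
    # own closest match and is filtered out again in the WHERE clause.
    return f"""
SELECT base.product_id, distance
FROM VECTOR_SEARCH(
  TABLE {table_ref},
  'embedding',
  (SELECT embedding FROM {table_ref} WHERE product_id = @product_id),
  top_k => {k + 1},
  distance_type => 'COSINE'
)
WHERE base.product_id != @product_id
ORDER BY distance
LIMIT {k}
""".strip()


print(build_substitute_query("your-project-id"))
```

The `@product_id` placeholder is meant to be bound as a named query parameter by the BigQuery client, which keeps user-supplied IDs out of the SQL text.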

🔧 Configuration

Backend Environment Variables

FLASK_ENV=development
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=../environment/your-key.json
BIGQUERY_DATASET=what_i_saw
BIGQUERY_TABLE=product_embeddings
MAX_SUBSTITUTES=5
PRICE_RANGE_PERCENTAGE=0.3
CORS_ORIGINS=http://localhost:3000
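
One way to read these variables in Python is sketched below. It is not a copy of the project's `config.py`: the `Settings` class and `load_settings` function are hypothetical, and treating the example values above as defaults is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    project: str
    dataset: str
    table: str
    max_substitutes: int
    price_range_percentage: float
    cors_origins: tuple


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT is not set")
    # CORS_ORIGINS is assumed to be a comma-separated list of origins.
    origins = tuple(
        origin.strip()
        for origin in env.get("CORS_ORIGINS", "").split(",")
        if origin.strip()
    )
    return Settings(
        project=project,
        dataset=env.get("BIGQUERY_DATASET", "what_i_saw"),
        table=env.get("BIGQUERY_TABLE", "product_embeddings"),
        max_substitutes=int(env.get("MAX_SUBSTITUTES", "5")),
        price_range_percentage=float(env.get("PRICE_RANGE_PERCENTAGE", "0.3")),
        cors_origins=origins,
    )


example = load_settings({
    "GOOGLE_CLOUD_PROJECT": "your-project-id",
    "CORS_ORIGINS": "http://localhost:3000",
})
print(example)
```

Failing fast on a missing project ID surfaces configuration mistakes at startup instead of on the first BigQuery call.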

Frontend Environment Variables

REACT_APP_API_URL=http://localhost:5000

📊 Sample Data

The project includes sample products across multiple categories:

  • 💻 Electronics (laptops, headphones)
  • 👟 Clothing (running shoes)
  • 🏠 Home & Kitchen (coffee makers)
  • 📚 Books
  • 🚴 Sports Equipment

🚒 Deployment

Backend Deployment (Cloud Run)

cd backend
gcloud run deploy what-i-saw-api \
  --source . \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated

Frontend Deployment (Vercel/Netlify)

cd frontend
npm run build
# Deploy build/ directory to your hosting provider

🐛 Troubleshooting

Common Issues

"BigQuery connection failed"

  • Verify GOOGLE_APPLICATION_CREDENTIALS path
  • Check service account permissions
  • Ensure BigQuery API is enabled

"No substitutes found"

  • Verify embeddings were generated
  • Check vector index status
  • Ensure product exists in database

"Frontend can't connect to backend"

  • Verify backend is running on port 5000
  • Check CORS settings
  • Ensure REACT_APP_API_URL is set correctly

See detailed troubleshooting in /backend/README.md and /frontend/README.md

📈 Performance

  • Vector Search: Sub-second query times with vector index
  • Embeddings: Generated once, reused for all queries
  • Scalability: BigQuery handles datasets from thousands to millions of products
  • Caching: Consider adding Redis for frequently accessed products
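
Before introducing Redis, the caching idea can be tried with a small in-process cache. The sketch below deliberately swaps Redis for a plain dictionary with expiry times; the `TTLCache` class is hypothetical and is not part of the project.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = {}   # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


cache = TTLCache(ttl_seconds=60)


def fetch_substitutes():
    # Stand-in for the BigQuery lookup; returns hypothetical product IDs.
    return ["P002", "P003"]


first = cache.get_or_compute("P001", fetch_substitutes)
second = cache.get_or_compute("P001", fetch_substitutes)  # served from cache
print(first == second)
```

An in-process cache is lost on restart and is not shared between instances, which is where an external store such as Redis becomes useful.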

🔐 Security Best Practices

  • ✅ Never commit credentials or .env files
  • ✅ Use service accounts with minimal required permissions
  • ✅ Enable CORS only for trusted origins
  • ✅ Validate all API inputs
  • ✅ Use HTTPS in production
  • ✅ Implement rate limiting for API endpoints
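
As an example of input validation, the query parameters used by the API (`limit`, `price_filter`, `in_stock`) could be parsed with small helpers like the ones below. This is a sketch under assumptions: the function names, the upper bound of 50, and the accepted boolean spellings are illustrative choices, not the project's actual rules.

```python
def parse_limit(raw, default=5, maximum=50):
    """Parse a ?limit= value, rejecting non-integers and out-of-range numbers."""
    if raw is None or raw == "":
        return default
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError("limit must be an integer") from None
    if not 1 <= value <= maximum:
        raise ValueError(f"limit must be between 1 and {maximum}")
    return value


def parse_bool(raw, default=False):
    """Parse a boolean query parameter such as ?price_filter=true."""
    if raw is None:
        return default
    text = raw.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise ValueError("expected a boolean such as 'true' or 'false'")


print(parse_limit("10"), parse_bool("true"))
```

A route handler would typically catch the `ValueError` and respond with HTTP 400 rather than letting a malformed value reach the database query.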

🎓 Learning Resources

🤝 Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for new functionality
  4. Ensure all tests pass (pytest for backend, npm test for frontend)
  5. Update documentation
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

📝 License

This project is part of the What I Saw demonstration.

🙏 Acknowledgments

  • Google Cloud BigQuery - Vector search and data warehouse
  • Vertex AI - Text embedding models
  • Flask - Backend web framework
  • React - Frontend library
  • TypeScript - Type safety

📧 Contact

For questions or feedback about this project, please open an issue on GitHub.


Built with ❤️ using AI-powered vector search
