Smart Substitute Recommender - An intelligent e-commerce solution powered by AI vector search
This project demonstrates a smarter way to handle out-of-stock products in e-commerce. Instead of simply showing a "sold out" message, our system uses AI-powered semantic search to recommend the most suitable alternatives from the product catalog.
The core of this application is a BigQuery-powered backend that uses vector embeddings (numerical representations of text). Embeddings let us find substitutes that are not just in the same category but also functionally and stylistically similar.
This is an excellent demonstration of how to integrate cutting-edge AI features from Google Cloud into a practical, real-world application to improve user experience and recover lost sales.
- Intelligent Recommendations: Go beyond simple keyword or category matching to provide genuinely useful product substitutes
- Vector Similarity Search: Powered by text embeddings generated through BigQuery ML
- Efficient Vector Search: Fast similarity lookups using BigQuery's VECTOR_SEARCH function
- Scalable Architecture: Production-ready backend and modern frontend
- Business Value: Practical solution to recover lost sales from out-of-stock products
- Comprehensive Testing: Test suites for both backend and frontend
```
┌─────────────────┐        ┌──────────────────┐        ┌────────────────────┐
│ React Frontend  │ ◄────► │  Flask Backend   │ ◄────► │ BigQuery + Vertex  │
│  (TypeScript)   │  REST  │     (Python)     │  SQL   │   AI Embeddings    │
└─────────────────┘  API   └──────────────────┘        └────────────────────┘
```
Backend (Flask, Python):
- Handles all communication with Google Cloud services (BigQuery)
- Exposes REST API endpoints for products and substitutes
- Uses BigQuery's VECTOR_SEARCH for similarity matching
- Comprehensive test suite with pytest

Frontend (React, TypeScript):
- Modern single-page application
- Displays product catalog with filtering
- Shows AI-powered substitute recommendations for out-of-stock items
- Responsive design with comprehensive tests

Data layer (BigQuery + Vertex AI):
- Stores product information and vector embeddings
- Uses the Vertex AI text-embedding-004 model
- Vector index for fast similarity search
- Sample dataset across multiple categories
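The scripts in `sql/` create the data layer described above. To illustrate what such scripts typically contain, the Python sketch below builds three BigQuery statements. The first creates a remote model pointing at `text-embedding-004`. The second is an `ML.GENERATE_EMBEDDING` call that writes one embedding per product. The third creates a cosine vector index. This is not the repository's SQL. The source table `products` and its columns, the model name `embedding_model`, the index name and the connection ID `us.vertex_ai_conn` are all hypothetical.

```python
from typing import List


def setup_statements(project: str, dataset: str,
                     connection: str = "us.vertex_ai_conn") -> List[str]:
    """Build hypothetical BigQuery setup statements as SQL strings.

    Assumed names (not from the repository): source table `products` with
    product_id, name, description, category, price, in_stock; remote model
    `embedding_model`; a Cloud resource connection "<region>.<connection_id>".
    """
    fq = f"{project}.{dataset}"  # fully qualified dataset prefix

    # 1. Remote model that forwards embedding requests to Vertex AI.
    create_model = f"""
CREATE OR REPLACE MODEL `{fq}.embedding_model`
  REMOTE WITH CONNECTION `{project}.{connection}`
  OPTIONS (ENDPOINT = 'text-embedding-004')"""

    # 2. One embedding per product. ML.GENERATE_EMBEDDING embeds the `content`
    #    column and passes the other input columns through to its output.
    create_embeddings = f"""
CREATE OR REPLACE TABLE `{fq}.product_embeddings` AS
SELECT product_id, name, description, category, price, in_stock,
       ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL `{fq}.embedding_model`,
  (SELECT *, CONCAT(name, '. ', category, '. ', description) AS content
   FROM `{fq}.products`),
  STRUCT(TRUE AS flatten_json_output))"""

    # 3. IVF index using cosine distance, matching the search's distance_type.
    create_index = f"""
CREATE VECTOR INDEX IF NOT EXISTS product_embedding_idx
ON `{fq}.product_embeddings`(embedding)
OPTIONS (index_type = 'IVF', distance_type = 'COSINE')"""

    return [create_model, create_embeddings, create_index]


if __name__ == "__main__":
    for stmt in setup_statements("your-project-id", "what_i_saw"):
        print(stmt.strip() + ";\n")
```

You can run the printed statements in the BigQuery console or with `bq query --use_legacy_sql=false`. The remote model needs a Cloud resource connection whose service account is allowed to call Vertex AI.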
Prerequisites:
- Google Cloud project with billing enabled
- gcloud CLI installed and authenticated
- Python 3.8+ for the backend
- Node.js 16+ for the frontend
- BigQuery API and Vertex AI API enabled
```bash
git clone https://github.com/your-username/WhatISaw.git
cd WhatISaw
```

```bash
cd sql
# Follow instructions in sql/README.md
# This creates tables, generates embeddings, and creates the vector index
```

```bash
cd ../backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Create .env file
cp .env.example .env
# Edit .env with your Google Cloud project details

# Run backend
python run.py
```

The backend will be available at http://localhost:5000.
```bash
cd ../frontend
npm install

# Set API URL
export REACT_APP_API_URL=http://localhost:5000

# Run frontend
npm start
```

The frontend will open at http://localhost:3000.
```bash
cd backend
source venv/bin/activate
pytest              # Run all tests
pytest --cov=app    # With coverage
pytest -v           # Verbose output
```

Test coverage includes:
- API endpoint tests
- BigQuery client tests
- Configuration tests
- Error handling tests
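Backend tests like these usually avoid calling BigQuery by mocking the client. Below is a minimal sketch built around a hypothetical helper, `fetch_substitutes`, that wraps `client.query(...).result()`. The real `app/bigquery_client.py` may be structured differently. pytest collects the `test_*` functions automatically, and `unittest.mock.MagicMock` stands in for `google.cloud.bigquery.Client`.

```python
from typing import Any, Dict, List
from unittest.mock import MagicMock


def fetch_substitutes(client: Any, sql: str, job_config: Any = None) -> List[Dict[str, Any]]:
    """Hypothetical helper: run a query and return each row as a plain dict.

    `client` is expected to behave like google.cloud.bigquery.Client, whose
    query() returns a job whose result() iterates over Row objects.
    """
    job = client.query(sql, job_config=job_config)
    return [dict(row.items()) for row in job.result()]


def test_rows_are_returned_as_dicts():
    client = MagicMock()
    # Plain dicts stand in for bigquery.Row; both expose .items().
    client.query.return_value.result.return_value = [
        {"product_id": "p2", "match_percentage": 91.5},
    ]
    rows = fetch_substitutes(client, "SELECT 1")
    assert rows == [{"product_id": "p2", "match_percentage": 91.5}]
    client.query.assert_called_once_with("SELECT 1", job_config=None)


def test_bigquery_errors_propagate():
    client = MagicMock()
    client.query.side_effect = RuntimeError("BigQuery unavailable")
    try:
        fetch_substitutes(client, "SELECT 1")
    except RuntimeError as exc:
        assert "unavailable" in str(exc)
    else:
        raise AssertionError("expected RuntimeError")
```

Because the mock records its calls, the same pattern can also assert which SQL and job configuration the backend sent.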
```bash
cd frontend
npm test                 # Run all tests
npm run test:coverage    # With coverage
```

Test coverage includes:
- Component tests
- Hook tests
- API service tests
- Integration tests
```
WhatISaw/
├── sql/                          # BigQuery setup scripts
│   ├── 01_create_dataset_and_tables.sql
│   ├── 02_create_embeddings.sql
│   ├── 03_create_vector_index.sql
│   ├── 04_sample_data.sql
│   ├── 05_query_substitutes.sql
│   └── README.md
├── backend/                      # Flask REST API
│   ├── app/
│   │   ├── __init__.py           # App factory
│   │   ├── config.py             # Configuration
│   │   ├── bigquery_client.py    # BigQuery operations
│   │   └── routes.py             # API endpoints
│   ├── tests/                    # Pytest test suite
│   ├── run.py                    # Entry point
│   ├── requirements.txt
│   └── README.md
├── frontend/                     # React TypeScript app
│   ├── src/
│   │   ├── components/           # React components
│   │   ├── hooks/                # Custom hooks
│   │   ├── services/             # API services
│   │   ├── types/                # TypeScript types
│   │   └── App.tsx
│   ├── package.json
│   └── README.md
├── environment/                  # Service account credentials
└── README.md                     # This file
```
API endpoints:

```
GET /health
GET /api/products?limit=50&in_stock=false
GET /api/products/{product_id}
GET /api/substitutes/{product_id}?limit=5&price_filter=true
```

When a product is out of stock, the system:
- Retrieves the product's vector embedding
- Performs cosine similarity search in BigQuery
- Returns the most similar products (by description, features, category)
- Displays similarity scores (AI match percentage)
- Optionally filters by price range (±30%)
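The following plain-Python sketch makes the ranking rules in the steps above concrete. It is an in-memory illustration only, because in the project the search runs inside BigQuery. Several pieces are assumptions: the `Product` class, the choice to skip out-of-stock candidates, and using cosine similarity × 100 as the match percentage. With BigQuery's `COSINE` distance, the equivalent score would be `(1 - distance) * 100`.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Product:
    product_id: str
    price: float
    in_stock: bool
    embedding: Sequence[float]  # e.g. 768 floats from text-embedding-004


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def find_substitutes(target: Product, catalog: List[Product], limit: int = 5,
                     price_filter: bool = True,
                     price_range: float = 0.3) -> List[Tuple[str, float]]:
    """Return (product_id, match_percentage) pairs, best match first."""
    low = target.price * (1 - price_range)
    high = target.price * (1 + price_range)
    scored = []
    for candidate in catalog:
        # Never recommend the product itself or another unavailable item.
        if candidate.product_id == target.product_id or not candidate.in_stock:
            continue
        # Optional ±price_range window around the target's price.
        if price_filter and not low <= candidate.price <= high:
            continue
        scored.append((candidate.product_id,
                       cosine_similarity(target.embedding, candidate.embedding)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(pid, round(sim * 100, 1)) for pid, sim in scored[:limit]]


if __name__ == "__main__":
    catalog = [
        Product("headphones-a", 100.0, False, [1.0, 0.0, 0.2]),  # out of stock
        Product("headphones-b", 110.0, True, [0.9, 0.1, 0.2]),
        Product("speaker-c", 105.0, True, [0.1, 1.0, 0.0]),
        Product("headphones-d", 300.0, True, [1.0, 0.0, 0.2]),  # outside ±30%
    ]
    print(find_substitutes(catalog[0], catalog))
    # → [('headphones-b', 99.4), ('speaker-c', 9.8)]
```

The toy 3-dimensional vectors only illustrate the arithmetic. Note that `headphones-d` has the closest embedding but is dropped by the price filter.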
```
Product Description → Vertex AI (text-embedding-004) → 768-dim Vector → BigQuery
                                                                           ↓
User Queries Out-of-Stock Product → Similarity Search → Vector Index → Storage
```
- Data Preparation: Product descriptions are stored in BigQuery
- Embedding Generation: Vertex AI creates vector embeddings for each product
- Vector Index: BigQuery creates an optimized index for fast search
- User Request: Frontend requests substitutes for an out-of-stock product
- Similarity Search: Backend queries BigQuery using VECTOR_SEARCH
- Results: Top similar products returned with similarity scores
- Display: Frontend shows AI-recommended alternatives
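For the Similarity Search step, the backend issues a `VECTOR_SEARCH` query. The function below is a hypothetical sketch, not the contents of `bigquery_client.py`. It reads `GOOGLE_CLOUD_PROJECT`, `BIGQUERY_DATASET` and `BIGQUERY_TABLE` from the environment and returns the SQL plus named parameters. It assumes an embeddings table with `product_id`, `name`, `price`, `in_stock` and an `embedding` column. `top_k` and `LIMIT` are interpolated only after `limit` is validated as an integer. Everything user-supplied goes through query parameters.

```python
import os
from typing import Any, Dict, Tuple


def build_substitute_query(product_id: str, limit: int = 5, price_filter: bool = True,
                           price_range: float = 0.3) -> Tuple[str, Dict[str, Any]]:
    """Return (sql, params) for a VECTOR_SEARCH-based substitute lookup.

    Assumed (hypothetical) schema: product_id, name, price, in_stock and an
    ARRAY<FLOAT64> column named `embedding` in the embeddings table.
    """
    if not isinstance(limit, int) or not 1 <= limit <= 50:
        raise ValueError("limit must be an integer between 1 and 50")

    project = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-project-id")
    dataset = os.environ.get("BIGQUERY_DATASET", "what_i_saw")
    table = os.environ.get("BIGQUERY_TABLE", "product_embeddings")
    fq_table = f"`{project}.{dataset}.{table}`"

    # Over-fetch: the product itself, out-of-stock items and price-filtered
    # items are removed after the search, so ask for more than `limit`.
    top_k = limit * 4 + 1

    params: Dict[str, Any] = {"product_id": product_id}
    price_clause = ""
    if price_filter:
        # `query.price` is the target product's price from the query statement.
        price_clause = ("\n  AND base.price BETWEEN query.price * (1 - @price_range)"
                        " AND query.price * (1 + @price_range)")
        params["price_range"] = price_range

    sql = f"""
SELECT
  base.product_id,
  base.name,
  base.price,
  ROUND((1 - distance) * 100, 1) AS match_percentage
FROM VECTOR_SEARCH(
  TABLE {fq_table},
  'embedding',
  (SELECT embedding, price FROM {fq_table} WHERE product_id = @product_id),
  top_k => {top_k},
  distance_type => 'COSINE'
)
WHERE base.product_id != @product_id
  AND base.in_stock{price_clause}
ORDER BY distance
LIMIT {limit}"""
    return sql, params
```

To execute the query with the `google-cloud-bigquery` library, first wrap each entry of `params` in `bigquery.ScalarQueryParameter`, for example `("product_id", "STRING", value)` and `("price_range", "FLOAT64", value)`. Then pass them via `bigquery.QueryJobConfig(query_parameters=[...])` and call `client.query(sql, job_config=job_config)`. Over-fetching with `top_k` matters because the stock and price filters run after the nearest-neighbour search. Without it, fewer than `limit` rows could survive.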
Backend configuration (`backend/.env`):

```
FLASK_ENV=development
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=../environment/your-key.json
BIGQUERY_DATASET=what_i_saw
BIGQUERY_TABLE=product_embeddings
MAX_SUBSTITUTES=5
PRICE_RANGE_PERCENTAGE=0.3
CORS_ORIGINS=http://localhost:3000
```

Frontend configuration:

```
REACT_APP_API_URL=http://localhost:5000
```

The project includes sample products across multiple categories:
- Electronics (laptops, headphones)
- Clothing (running shoes)
- Home & Kitchen (coffee makers)
- Books
- Sports Equipment
```bash
cd backend
gcloud run deploy what-i-saw-api \
  --source . \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
```

```bash
cd frontend
npm run build
# Deploy the build/ directory to your hosting provider
```

"BigQuery connection failed"
- Verify the `GOOGLE_APPLICATION_CREDENTIALS` path
- Check service account permissions
- Ensure the BigQuery API is enabled
"No substitutes found"
- Verify embeddings were generated
- Check vector index status
- Ensure product exists in database
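You can turn the three checks above into queries. The sketch below assumes the embeddings are stored in an `embedding` array column of the table named by `BIGQUERY_TABLE`. The `INFORMATION_SCHEMA.VECTOR_INDEXES` view reports each vector index's status and coverage. The helper name and the schema are hypothetical.

```python
from typing import Dict


def diagnostic_queries(project: str, dataset: str,
                       table: str = "product_embeddings") -> Dict[str, str]:
    """SQL checks for the 'No substitutes found' causes (hypothetical schema)."""
    fq_table = f"`{project}.{dataset}.{table}`"
    return {
        # Were embeddings generated? Rows with an empty array cannot be matched.
        "embeddings": (
            "SELECT COUNT(*) AS total_rows, "
            "COUNTIF(ARRAY_LENGTH(embedding) = 0) AS missing_embeddings "
            f"FROM {fq_table}"
        ),
        # Vector index status and the share of rows it currently covers.
        "vector_index": (
            "SELECT index_name, index_status, coverage_percentage "
            f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.VECTOR_INDEXES` "
            f"WHERE table_name = '{table}'"
        ),
        # Does the product exist? Bind @product_id as a query parameter.
        "product_exists": (
            f"SELECT COUNT(*) AS matches FROM {fq_table} "
            "WHERE product_id = @product_id"
        ),
    }


if __name__ == "__main__":
    for name, sql in diagnostic_queries("your-project-id", "what_i_saw").items():
        print(f"-- {name}\n{sql};\n")
```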
"Frontend can't connect to backend"
- Verify backend is running on port 5000
- Check CORS settings
- Ensure `REACT_APP_API_URL` is set correctly
See detailed troubleshooting in /backend/README.md and /frontend/README.md
- Vector Search: Sub-second query times with vector index
- Embeddings: Generated once, reused for all queries
- Scalability: BigQuery handles datasets from thousands to millions of products
- Caching: Consider adding Redis for frequently accessed products
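The caching idea can be prototyped in-process before introducing Redis. This sketch swaps in a different technique: a small time-to-live cache held inside the Flask process rather than Redis. The class name, the 300-second default and the cache key shape are hypothetical. Unlike Redis, the cache is not shared across Cloud Run instances. It also does not coordinate between threads, so two concurrent misses may both query BigQuery.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Minimal in-process time-to-live cache (a stand-in for Redis here)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the BigQuery round trip
        value = compute()  # miss or expired: recompute and store
        self._store[key] = (now + self.ttl, value)
        return value


# Hypothetical usage in a route handler (run_vector_search is illustrative):
# cache = TTLCache(ttl_seconds=300)
# results = cache.get_or_compute((product_id, limit, price_filter),
#                                lambda: run_vector_search(product_id, limit, price_filter))
```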
- Never commit credentials or `.env` files
- Use service accounts with minimal required permissions
- Enable CORS only for trusted origins
- Validate all API inputs
- Use HTTPS in production
- Implement rate limiting for API endpoints
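The sketch below shows one way to apply the input-validation rule to `GET /api/substitutes/{product_id}`. It checks the path ID and the `limit` and `price_filter` query parameters before any SQL is built. The ID pattern and the upper bound of 20 are assumptions. In a Flask route you would pass `request.args` as `args` and turn a `ValueError` into a 400 response.

```python
import re
from typing import Mapping, Tuple

# Hypothetical ID format; adjust to the real product_id scheme.
PRODUCT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def parse_substitute_args(product_id: str, args: Mapping[str, str],
                          max_limit: int = 20) -> Tuple[str, int, bool]:
    """Validate inputs for GET /api/substitutes/{product_id}.

    Returns (product_id, limit, price_filter) or raises ValueError.
    """
    if not PRODUCT_ID_PATTERN.fullmatch(product_id):
        raise ValueError("invalid product_id")

    try:
        limit = int(args.get("limit", "5"))
    except ValueError:
        raise ValueError("limit must be an integer") from None
    if not 1 <= limit <= max_limit:
        raise ValueError(f"limit must be between 1 and {max_limit}")

    raw_filter = args.get("price_filter", "true").strip().lower()
    if raw_filter not in {"true", "false", "1", "0"}:
        raise ValueError("price_filter must be true or false")

    return product_id, limit, raw_filter in {"true", "1"}
```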
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Write tests for new functionality
- Ensure all tests pass (`pytest` for backend, `npm test` for frontend)
- Update documentation
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is part of the What I Saw demonstration.
- Google Cloud BigQuery - Vector search and data warehouse
- Vertex AI - Text embedding models
- Flask - Backend web framework
- React - Frontend library
- TypeScript - Type safety
For questions or feedback about this project, please open an issue on GitHub.
Built with ❤️ using AI-powered vector search