A FastAPI-based semantic search system that processes company and product data from Excel/CSV, generates AI embeddings with OpenAI, and enables intelligent search via PostgreSQL + pgvector.
About
Features
Tech Stack
Installation
Usage
Configuration
Screenshots
API Documentation
Contact
Acknowledgements
This project provides an intuitive, production-ready API for semantic search over company and product data. It addresses the need to search by meaning (e.g. “high-quality fastener suppliers”) rather than exact keywords. Data is uploaded as Excel/CSV, grouped by industry, scored for quality, embedded with OpenAI’s text-embedding model, and stored in PostgreSQL with the pgvector extension. Users can run natural-language queries with optional filters and get ranked results combining completeness and semantic similarity.
Key goals: scalable vector search, flexible filtering (industry/country), quality-aware ranking, and a simple upload → embed → search workflow.
- Semantic search – Natural-language and Chinese/English queries with meaning-based matching via embeddings.
- Excel/CSV upload – Ingest company/product data with industry grouping and automatic quality scoring.
- Vector storage – PostgreSQL + pgvector for embedding storage and similarity search.
- Multi-factor ranking – Combines completeness score (60%) and semantic similarity (40%).
- Filtering – Industry and country filters; product-code and metric-intent detection (e.g. “highest quantity”).
- Feedback API – Submit user feedback (keep/reject/compare) on search results.
- Production-ready – Async FastAPI, Gunicorn + Uvicorn workers, connection pooling, Render.com deployment support.
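The multi-factor ranking listed above can be illustrated with a short sketch. It assumes both scores are already normalized to the range 0 to 1 and that they are combined as a simple weighted sum. The `Candidate` class and the function names are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import List

# Weights taken from the feature list: 60% completeness, 40% similarity.
COMPLETENESS_WEIGHT = 0.6
SIMILARITY_WEIGHT = 0.4


@dataclass
class Candidate:
    """Hypothetical search hit; both scores are assumed to lie in [0, 1]."""
    name: str
    completeness: float
    similarity: float


def combined_score(candidate: Candidate) -> float:
    """Weighted sum of the two factors (an assumed combination rule)."""
    return (COMPLETENESS_WEIGHT * candidate.completeness
            + SIMILARITY_WEIGHT * candidate.similarity)


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates ordered from best to worst combined score."""
    return sorted(candidates, key=combined_score, reverse=True)
```

With these weights, a fully complete record with moderate similarity can outrank a sparse record that matches the query more closely, which is the intent of quality-aware ranking.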
| Category | Technologies |
|---|---|
| Languages | Python 3.8+ |
| Frameworks | FastAPI, Uvicorn, Gunicorn |
| Database | PostgreSQL with pgvector |
| AI / Embeddings | OpenAI API (text-embedding-3-small) |
| Data & ORM | Pandas, SQLAlchemy (async) |
| Tools | python-dotenv, Docker-friendly, Render.com |
# Clone the repository
git clone https://github.com/Phoenix-dev11/Semantic_search_V2.git
# Navigate to the project directory
cd Semantic_search_V2
# Create virtual environment (recommended)
python -m venv venv
# Windows:
venv\Scripts\activate
# Linux/macOS:
# source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
Prerequisites: Python 3.8+, PostgreSQL with pgvector extension, OpenAI API key.
Enable pgvector in your database:
CREATE EXTENSION IF NOT EXISTS vector;
Development (with auto-reload):
uvicorn app:app --reload --host 0.0.0.0 --port 8000
Production (Gunicorn):
python start.py
# or: web: python start.py (Procfile)
Then open your browser or API client:
👉 Base URL: http://localhost:8000
👉 Interactive API docs: http://localhost:8000/docs (unless DISABLE_DOCS is set to true)
Create a .env file (use env.example as a template):
Required:
- `DATABASE_URL` – PostgreSQL connection string (e.g. `postgresql://user:password@host:5432/database_name`)
- `OPENAI_API_KEY` – Your OpenAI API key for embeddings
Optional (defaults shown):
- `EMBEDDING_MODEL=text-embedding-3-small`
- `DISABLE_DOCS=false` – Set to `true` to disable `/docs` and `/redoc`
- `PORT=8000`
- `WEB_CONCURRENCY=4`
- `ENVIRONMENT=development`
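The variables above could be read and validated at startup roughly as follows. This is a minimal sketch under stated assumptions: the `load_settings` helper is hypothetical, the variable names and defaults come from this section, and the project itself may load configuration differently (for example through python-dotenv).

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ("DATABASE_URL", "OPENAI_API_KEY")

# Defaults as documented in this README.
DEFAULTS = {
    "EMBEDDING_MODEL": "text-embedding-3-small",
    "DISABLE_DOCS": "false",
    "PORT": "8000",
    "WEB_CONCURRENCY": "4",
    "ENVIRONMENT": "development",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Hypothetical helper: merge environment values with the documented defaults."""
    env = os.environ if env is None else env

    # Fail early if a required variable is missing or empty.
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "DATABASE_URL": env["DATABASE_URL"],
        "OPENAI_API_KEY": env["OPENAI_API_KEY"],
        "EMBEDDING_MODEL": raw["EMBEDDING_MODEL"],
        "DISABLE_DOCS": raw["DISABLE_DOCS"].strip().lower() == "true",
        "PORT": int(raw["PORT"]),
        "WEB_CONCURRENCY": int(raw["WEB_CONCURRENCY"]),
        "ENVIRONMENT": raw["ENVIRONMENT"],
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.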
Example:
DATABASE_URL=postgresql://user:password@localhost:5432/semantic_search
OPENAI_API_KEY=your_openai_api_key_here
Add demo images, GIFs, or UI preview screenshots here.
Example: Swagger UI at /docs, sample search request/response, or dashboard screens.
Main endpoints (see http://localhost:8000/docs for full request/response schemas when docs are enabled):
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Health check |
| GET | /health | Detailed health status |
| POST | /api/upload | Upload Excel/CSV for processing and embedding |
| POST | /api/search | Semantic search (body: query_text, filters, top_k) |
| GET | /api/debug/industries | Debug: list industries |
| GET | /api/debug/standard-scoring | Debug: standard scoring info |
| POST | /api/feedback | Submit feedback on a search result (e.g. keep/reject/compare) |
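The search endpoint can be called from any HTTP client. The sketch below uses only the Python standard library and the body fields named in the table (query_text, filters, top_k). The helper names are hypothetical, the base URL assumes a local development server, and the response is returned as parsed JSON because its schema is not described here.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # assumed local development server


def build_search_request(query_text: str,
                         filters: Optional[str] = None,
                         top_k: int = 5,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/search."""
    payload = {"query_text": query_text, "top_k": top_k}
    if filters is not None:
        payload["filters"] = filters
    # ensure_ascii=False keeps Chinese filter values readable in the body.
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/api/search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query_text: str, filters: Optional[str] = None, top_k: int = 5) -> Any:
    """Send the request and return the decoded JSON response."""
    request = build_search_request(query_text, filters, top_k)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload logic testable without a running server.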
Example search request:
POST /api/search
{
"query_text": "I need Q02 highest quantity product",
"filters": "扣件",
"top_k": 5
}
- Author: Hiroshi Nagaya
- Email: phoenixryan1111@gmail.com
- GitHub: @Phoenix-dev11
- Website/Portfolio: hiroshi-nagaya.vercel.app
- OpenAI – Text embedding API (text-embedding-3-small).
- FastAPI – Modern async API framework.
- pgvector – PostgreSQL extension for vector similarity search.
- Render.com – Deployment configuration (render.yaml, Procfile).
Version: 2.0.0
Status: Production Ready