🧭 Semantic Search API [Project ID: P-697]

A FastAPI-based semantic search system that processes company and product data from Excel/CSV, generates AI embeddings with OpenAI, and enables intelligent search via PostgreSQL + pgvector.

📚 Table of Contents

About
Features
Tech Stack
Installation
Usage
Configuration
Screenshots
API Documentation
Contact
Acknowledgements

🧩 About

This project provides an intuitive, production-ready API for semantic search over company and product data. It addresses the need to search by meaning (e.g. “high-quality fastener suppliers”) rather than exact keywords. Data is uploaded as Excel/CSV, grouped by industry, scored for quality, embedded with OpenAI’s text-embedding model, and stored in PostgreSQL with the pgvector extension. Users can run natural-language queries with optional filters and get ranked results combining completeness and semantic similarity.

Key goals: scalable vector search, flexible filtering (industry/country), quality-aware ranking, and a simple upload → embed → search workflow.

✨ Features

Semantic search – Natural-language and Chinese/English queries with meaning-based matching via embeddings.
Excel/CSV upload – Ingest company/product data with industry grouping and automatic quality scoring.
Vector storage – PostgreSQL + pgvector for embedding storage and similarity search.
Multi-factor ranking – Combines completeness score (60%) and semantic similarity (40%).
Filtering – Industry and country filters; product-code and metric-intent detection (e.g. “highest quantity”).
Feedback API – Submit user feedback (keep/reject/compare) on search results.
Production-ready – Async FastAPI, Gunicorn + Uvicorn workers, connection pooling, Render.com deployment support.

🧠 Tech Stack

Category	Technologies
Languages	Python 3.8+
Frameworks	FastAPI, Uvicorn, Gunicorn
Database	PostgreSQL with pgvector
AI / Embeddings	OpenAI API (text-embedding-3-small)
Data & ORM	Pandas, SQLAlchemy (async)
Tools	python-dotenv, Docker-friendly, Render.com

⚙️ Installation

# Clone the repository
git clone https://github.com/Phoenix-dev11/Semantic_search_V2.git

# Navigate to the project directory
cd Semantic_search_V2

# Create virtual environment (recommended)
python -m venv venv
# Windows:
venv\Scripts\activate
# Linux/macOS:
# source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Prerequisites: Python 3.8+, PostgreSQL with pgvector extension, OpenAI API key.

Enable pgvector in your database:

CREATE EXTENSION IF NOT EXISTS vector;

🚀 Usage

Development (with auto-reload):

uvicorn app:app --reload --host 0.0.0.0 --port 8000

Production (Gunicorn):

python start.py
# or: web: python start.py (Procfile)

Then open your browser or API client:

👉 Base URL: http://localhost:8000
👉 Interactive API docs: http://localhost:8000/docs (when DISABLE_DOCS is not set)

🧾 Configuration

Create a .env file (use env.example as a template):

Required:

DATABASE_URL – PostgreSQL connection string (e.g. postgresql://user:password@host:5432/database_name)
OPENAI_API_KEY – Your OpenAI API key for embeddings

Optional (defaults shown):

EMBEDDING_MODEL=text-embedding-3-small
DISABLE_DOCS=false – Set to true to disable /docs and /redoc
PORT=8000
WEB_CONCURRENCY=4
ENVIRONMENT=development

Example:

DATABASE_URL=postgresql://user:password@localhost:5432/semantic_search
OPENAI_API_KEY=your_openai_api_key_here

🖼 Screenshots

Add demo images, GIFs, or UI preview screenshots here.

Example: Swagger UI at /docs, sample search request/response, or dashboard screens.

📜 API Documentation

Main endpoints (see http://localhost:8000/docs for full request/response schemas when docs are enabled):

Method	Endpoint	Description
GET	`/`	Health check
GET	`/health`	Detailed health status
POST	`/api/upload`	Upload Excel/CSV for processing and embedding
POST	`/api/search`	Semantic search (body: `query_text`, `filters`, `top_k`)
GET	`/api/debug/industries`	Debug: list industries
GET	`/api/debug/standard-scoring`	Debug: standard scoring info
POST	`/api/feedback`	Submit feedback on a search result (e.g. keep/reject/compare)

Example search request:

POST /api/search
{
  "query_text": "I need Q02 highest quantity product",
  "filters": "扣件",
  "top_k": 5
}

📬 Contact

Author: Hiroshi Nagaya
Email: phoenixryan1111@gmail.com
GitHub: @Phoenix-dev11
Website/Portfolio: hiroshi-nagaya.vercel.app

(Replace with your details.)

🌟 Acknowledgements

OpenAI – Text embedding API (text-embedding-3-small).
FastAPI – Modern async API framework.
pgvector – PostgreSQL extension for vector similarity search.
Render.com – Deployment configuration (render.yaml, Procfile).

Version: 2.0.0
Status: Production Ready

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
__pycache__		__pycache__
api		api
database		database
.gitignore		.gitignore
Procfile		Procfile
README.md		README.md
app.py		app.py
env.example		env.example
gunicorn.conf.py		gunicorn.conf.py
render.yaml		render.yaml
requirements.txt		requirements.txt
start.py		start.py
start.sh		start.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧭 Semantic Search API [Project ID: P-697]

📚 Table of Contents

🧩 About

✨ Features

🧠 Tech Stack

⚙️ Installation

🚀 Usage

🧾 Configuration

🖼 Screenshots

📜 API Documentation

📬 Contact

🌟 Acknowledgements

About

Uh oh!

Releases

Packages

Languages

Phoenix-dev11/Semantic_Search_V2.0

Folders and files

Latest commit

History

Repository files navigation

🧭 Semantic Search API [Project ID: P-697]

📚 Table of Contents

🧩 About

✨ Features

🧠 Tech Stack

⚙️ Installation

🚀 Usage

🧾 Configuration

🖼 Screenshots

📜 API Documentation

📬 Contact

🌟 Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages