
Fix building path #102

Merged

openminddev merged 12 commits into main from add-embedding-server on Mar 5, 2026
Conversation

@openminddev (Contributor)

This pull request introduces a new embedding microservice based on FastAPI, optimized for NVIDIA Jetson (ARM64), and integrates it into the deployment and QA infrastructure. The changes automate the build and release of the ARM64 embedding service Docker image, add a new embedding service to Docker Compose, and provide a production-ready QAEngine that uses this service for fast, GPU-accelerated question answering.

Embedding Service Introduction and Integration

  • Added a new FastAPI-based embedding microservice in embedding_server.py that exposes /embed, /embed_batch, and /health endpoints for real-time and batch text embedding using the intfloat/e5-small-v2 model on CUDA. The service returns base64-encoded float32 vectors and is optimized for low-latency inference on NVIDIA Jetson devices. (src/embedding/embedding_server.py)
  • Created a dedicated Dockerfile for the embedding service (Dockerfile.embed), installing all dependencies, downloading the model at build time, and setting up the service to run on port 8100. (docker/Dockerfile.embed)
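The description notes that the service returns base64-encoded float32 vectors. A minimal sketch of that wire format, assuming the JSON body carries the vector as a base64 string and that the intfloat/e5-small-v2 model produces 384-dimensional embeddings (the helper names here are illustrative, not the actual functions in embedding_server.py):

```python
import base64

import numpy as np


def encode_embedding(vec: np.ndarray) -> str:
    """Serialize a float32 vector to a base64 string for the JSON response."""
    return base64.b64encode(vec.astype(np.float32).tobytes()).decode("ascii")


def decode_embedding(b64: str, dim: int = 384) -> np.ndarray:
    """Restore the float32 vector a client receives from /embed."""
    vec = np.frombuffer(base64.b64decode(b64), dtype=np.float32)
    assert vec.shape == (dim,), f"expected {dim}-d vector, got {vec.shape}"
    return vec
```

Base64 over raw float32 bytes keeps the payload compact and avoids the precision loss and size overhead of serializing each float as JSON text.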

Deployment Automation and Infrastructure

  • Updated the GitHub Actions release workflow to build and push the ARM64 embedding service Docker image, and to create and push its manifest for multi-architecture support. (.github/workflows/release.yml)
  • Added the embedding service to docker-compose.yml for easy local deployment, exposing port 8100 and using the new ARM64 image. (docker/docker-compose.yml)
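For reference, a hypothetical sketch of what the compose service entry could look like; the actual service name, image tag, and registry path come from docker/docker-compose.yml and are assumptions here:

```yaml
# Illustrative only — the real entry lives in docker/docker-compose.yml.
services:
  embedding:
    image: ghcr.io/openmind/embedding-server:arm64   # assumed registry path
    runtime: nvidia                  # expose the Jetson GPU to the container
    ports:
      - "8100:8100"                  # service port set up in Dockerfile.embed
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8100/health"]
      interval: 30s
```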

Production QA Engine Integration

  • Introduced a new QAEngine class that uses the Dockerized embedding service for inference and FAISS for vector search. It supports both single and batch query modes, handles remote embedding calls, and provides detailed logging and latency metrics. (src/embedding/qa_engine.py)
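The embed-then-search flow described above can be sketched as follows. This is not the actual QAEngine: the remote embedding call is replaced by a deterministic offline stub, and FAISS is replaced by a brute-force inner-product search (equivalent to a FAISS IndexFlatIP on normalized vectors), so the sketch runs without the service:

```python
import numpy as np


def _remote_embed(text: str) -> np.ndarray:
    """Stand-in for a POST to the embedding service's /embed endpoint.
    The real engine would decode a base64 float32 vector from the HTTP
    response; here we derive a deterministic pseudo-embedding from the
    text so the example runs offline."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(384).astype(np.float32)
    return vec / np.linalg.norm(vec)


class QAEngineSketch:
    """Minimal illustration of embed + vector search for QA lookup."""

    def __init__(self, questions, answers):
        self.answers = answers
        # Embed every known question once and keep the matrix in memory.
        self.index = np.stack([_remote_embed(q) for q in questions])

    def query(self, text: str, top_k: int = 1):
        """Return the top_k (answer, score) pairs for a user question."""
        q = _remote_embed(text)
        scores = self.index @ q          # cosine similarity on unit vectors
        best = np.argsort(scores)[::-1][:top_k]
        return [(self.answers[i], float(scores[i])) for i in best]
```

Usage: `QAEngineSketch(["What is OM1?"], ["An agent runtime."]).query("What is OM1?")` returns the stored answer with a similarity near 1.0, since an identical question maps to an identical embedding.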

dzhengAP and others added 12 commits February 24, 2026 13:48
Delete binary embedding artifacts: src/embedding/qa_data_combine.pkl and src/embedding/qa_index_combine.faiss. These large generated files were removed from the repository to avoid committing binary/index artifacts; they can be regenerated by the embedding pipeline as needed.
Update the GitHub Actions workflow name from "Release Riva Speech Server ARM64 Image" to "Release ARM64 Images" to make the workflow name more generic and reflect releasing multiple ARM64 images. No functional changes were made; only the workflow's name field was updated.
@openminddev openminddev merged commit 5030082 into main Mar 5, 2026
2 checks passed
@openminddev openminddev deleted the add-embedding-server branch March 5, 2026 19:14
