
Fix building path #102

Merged

openminddev merged 12 commits into main from add-embedding-server on Mar 5, 2026
Conversation

@openminddev (Contributor)

This pull request introduces a new embedding microservice based on FastAPI, optimized for NVIDIA Jetson (ARM64), and integrates it into the deployment and QA infrastructure. The changes automate the build and release of the ARM64 embedding service Docker image, add a new embedding service to Docker Compose, and provide a production-ready QAEngine that uses this service for fast, GPU-accelerated question answering.

Embedding Service Introduction and Integration

  • Added a new FastAPI-based embedding microservice in embedding_server.py that exposes /embed, /embed_batch, and /health endpoints for real-time and batch text embedding using the intfloat/e5-small-v2 model on CUDA. The service returns base64-encoded float32 vectors and is optimized for low-latency inference on NVIDIA Jetson devices. (src/embedding/embedding_server.py)
  • Created a dedicated Dockerfile for the embedding service (Dockerfile.embed), installing all dependencies, downloading the model at build time, and setting up the service to run on port 8100. (docker/Dockerfile.embed)
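The description notes that the service returns base64-encoded float32 vectors. A minimal sketch of that wire format, assuming the JSON body carries the vector as a base64 string and that the intfloat/e5-small-v2 model produces 384-dimensional embeddings (the helper names here are illustrative, not the actual functions in embedding_server.py):

```python
import base64

import numpy as np


def encode_embedding(vec: np.ndarray) -> str:
    """Serialize a float32 vector to a base64 string for the JSON response."""
    return base64.b64encode(vec.astype(np.float32).tobytes()).decode("ascii")


def decode_embedding(b64: str, dim: int = 384) -> np.ndarray:
    """Restore the float32 vector a client receives from /embed."""
    vec = np.frombuffer(base64.b64decode(b64), dtype=np.float32)
    assert vec.shape == (dim,), f"expected {dim}-d vector, got {vec.shape}"
    return vec
```

Base64 over raw float32 bytes keeps the payload compact and avoids the precision loss and size overhead of serializing each float as JSON text.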

Deployment Automation and Infrastructure

  • Updated the GitHub Actions release workflow to build and push the ARM64 embedding service Docker image, and to create and push its manifest for multi-architecture support. (.github/workflows/release.yml)
  • Added the embedding service to docker-compose.yml for easy local deployment, exposing port 8100 and using the new ARM64 image. (docker/docker-compose.yml)
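For reference, a hypothetical sketch of what the compose service entry could look like; the actual service name, image tag, and registry path come from docker/docker-compose.yml and are assumptions here:

```yaml
# Illustrative only — the real entry lives in docker/docker-compose.yml.
services:
  embedding:
    image: ghcr.io/openmind/embedding-server:arm64   # assumed registry path
    runtime: nvidia                  # expose the Jetson GPU to the container
    ports:
      - "8100:8100"                  # service port set up in Dockerfile.embed
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8100/health"]
      interval: 30s
```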

Production QA Engine Integration

  • Introduced a new QAEngine class that uses the Dockerized embedding service for inference and FAISS for vector search. It supports both single and batch query modes, handles remote embedding calls, and provides detailed logging and latency metrics. (src/embedding/qa_engine.py)
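The embed-then-search flow described above can be sketched as follows. This is not the actual QAEngine: the remote embedding call is replaced by a deterministic offline stub, and FAISS is replaced by a brute-force inner-product search (equivalent to a FAISS IndexFlatIP on normalized vectors), so the sketch runs without the service:

```python
import numpy as np


def _remote_embed(text: str) -> np.ndarray:
    """Stand-in for a POST to the embedding service's /embed endpoint.
    The real engine would decode a base64 float32 vector from the HTTP
    response; here we derive a deterministic pseudo-embedding from the
    text so the example runs offline."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(384).astype(np.float32)
    return vec / np.linalg.norm(vec)


class QAEngineSketch:
    """Minimal illustration of embed + vector search for QA lookup."""

    def __init__(self, questions, answers):
        self.answers = answers
        # Embed every known question once and keep the matrix in memory.
        self.index = np.stack([_remote_embed(q) for q in questions])

    def query(self, text: str, top_k: int = 1):
        """Return the top_k (answer, score) pairs for a user question."""
        q = _remote_embed(text)
        scores = self.index @ q          # cosine similarity on unit vectors
        best = np.argsort(scores)[::-1][:top_k]
        return [(self.answers[i], float(scores[i])) for i in best]
```

Usage: `QAEngineSketch(["What is OM1?"], ["An agent runtime."]).query("What is OM1?")` returns the stored answer with a similarity near 1.0, since an identical question maps to an identical embedding.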

dzhengAP and others added 12 commits February 24, 2026 13:48
Delete binary embedding artifacts: src/embedding/qa_data_combine.pkl and src/embedding/qa_index_combine.faiss. These large generated files were removed from the repository to avoid committing binary/index artifacts; they can be regenerated by the embedding pipeline as needed.
Update the GitHub Actions workflow name from "Release Riva Speech Server ARM64 Image" to "Release ARM64 Images" to make the workflow name more generic and reflect releasing multiple ARM64 images. No functional changes were made; only the workflow's name field was updated.
@openminddev openminddev merged commit 5030082 into main Mar 5, 2026
2 checks passed
@openminddev openminddev deleted the add-embedding-server branch March 5, 2026 19:14
