Merged
Delete binary embedding artifacts: src/embedding/qa_data_combine.pkl and src/embedding/qa_index_combine.faiss. These large generated files were removed from the repository to avoid committing binary/index artifacts; they can be regenerated by the embedding pipeline as needed.
Update the GitHub Actions workflow name from "Release Riva Speech Server ARM64 Image" to "Release ARM64 Images" to make the workflow name more generic and reflect releasing multiple ARM64 images. No functional changes were made; only the workflow's name field was updated.
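Since both the old and new names are given, the change amounts to a one-line edit of the workflow's `name` field, roughly:

```yaml
# .github/workflows/release.yml
# Before: name: Release Riva Speech Server ARM64 Image
name: Release ARM64 Images
```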
This pull request introduces a new embedding microservice based on FastAPI, optimized for NVIDIA Jetson (ARM64), and integrates it into the deployment and QA infrastructure. The changes automate the build and release of the ARM64 embedding service Docker image, add a new `embedding` service to Docker Compose, and provide a production-ready `QAEngine` that uses this service for fast, GPU-accelerated question answering.

**Embedding Service Introduction and Integration**
- Added `embedding_server.py`, which exposes `/embed`, `/embed_batch`, and `/health` endpoints for real-time and batch text embedding using the `intfloat/e5-small-v2` model on CUDA. The service returns base64-encoded float32 vectors and is optimized for low-latency inference on NVIDIA Jetson devices. (`src/embedding/embedding_server.py`)
- Added a Dockerfile (`Dockerfile.embed`) that installs all dependencies, downloads the model at build time, and sets up the service to run on port 8100. (`docker/Dockerfile.embed`)

**Deployment Automation and Infrastructure**
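Since the service returns base64-encoded float32 vectors, a client needs to unpack them before use. A minimal sketch of that decoding with only the standard library (the JSON field name `embedding` and little-endian byte order are assumptions, not confirmed by the PR):

```python
import base64
import struct


def decode_embedding(b64_vec: str) -> list[float]:
    """Decode a base64-encoded float32 vector, as returned by /embed."""
    raw = base64.b64decode(b64_vec)
    # Each float32 is 4 bytes; "<" assumes little-endian byte order.
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def encode_embedding(vec: list[float]) -> str:
    """Inverse helper: pack float32 values and base64-encode them."""
    return base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode("ascii")


# Hypothetical call shape; the response field name is an assumption:
# resp = requests.post("http://localhost:8100/embed", json={"text": "query"})
# vec = decode_embedding(resp.json()["embedding"])
```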
- Updated the release workflow to build and publish the ARM64 embedding service image. (`.github/workflows/release.yml`) [1] [2] [3]
- Added an `embedding` service to `docker-compose.yml` for easy local deployment, exposing port 8100 and using the new ARM64 image. (`docker/docker-compose.yml`)

**Production QA Engine Integration**
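The compose entry described above might look roughly like this; the service name and port 8100 come from the PR, while the image name and GPU runtime settings are assumptions:

```yaml
services:
  embedding:
    # Placeholder image name; the actual ARM64 image tag is defined in the PR.
    image: example/embedding-service:arm64-latest
    ports:
      - "8100:8100"
    # GPU access on Jetson is typically granted via the NVIDIA container runtime.
    runtime: nvidia
    restart: unless-stopped
```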
- Added a `QAEngine` class that uses the Dockerized embedding service for inference and FAISS for vector search. It supports both single and batch query modes, handles remote embedding calls, and provides detailed logging and latency metrics. (`src/embedding/qa_engine.py`)
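The shape of such an engine can be sketched without the real dependencies: a pluggable `embed` callable stands in for the batched HTTP call to the embedding service, and brute-force cosine search stands in for the FAISS index. The class and method names here are illustrative, not the PR's actual API:

```python
import math
from typing import Callable, List, Sequence, Tuple


class QAEngineSketch:
    """Minimal stand-in for the PR's QAEngine. A pluggable embed function
    replaces the remote call to /embed_batch, and brute-force cosine
    search replaces the FAISS index so the sketch stays dependency-free."""

    def __init__(self, embed: Callable[[List[str]], List[List[float]]]):
        self.embed = embed  # in production: batched POST to the embedding service
        self.vectors: List[List[float]] = []
        self.answers: List[str] = []

    def add(self, questions: List[str], answers: List[str]) -> None:
        """Embed reference questions in one batch and store their answers."""
        self.vectors.extend(self.embed(questions))
        self.answers.extend(answers)

    @staticmethod
    def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def query(self, text: str, top_k: int = 1) -> List[Tuple[str, float]]:
        """Return the top_k (answer, score) pairs for a single query."""
        qv = self.embed([text])[0]
        scored = sorted(
            ((ans, self._cosine(qv, vec))
             for ans, vec in zip(self.answers, self.vectors)),
            key=lambda t: t[1],
            reverse=True,
        )
        return scored[:top_k]
```

Swapping the brute-force search for a `faiss.IndexFlatIP` over normalized vectors would recover the FAISS-backed behavior the PR describes.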