split-inference-grpc-demo

Splits LSTM inference into two gRPC microservices—“Prefill” (embedding) and “Decode”—with real-time Kafka → Flink → Grafana telemetry. This demo shows how to parallelize and scale the lightweight embedding stage separately from the heavier decode stage, track per-phase latency, and visualize metrics in Grafana.
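As a rough illustration of the per-phase latency tracking described above, the sketch below times two stand-in phases and emits one JSON record per phase. Everything in it is an assumption made for illustration: the function names (`timed`, `prefill`, `decode`, `run_request`), the toy arithmetic, and the record fields are hypothetical and are not taken from this repository. In the demo itself the phases are gRPC services and the records would go to Kafka rather than stdout.

```python
import json
import time
from typing import Callable, Dict, List, Tuple


def timed(phase: str, fn: Callable, *args) -> Tuple[object, Dict[str, object]]:
    """Run fn(*args) and return (result, telemetry record) for that phase."""
    start = time.perf_counter()
    result = fn(*args)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return result, {"phase": phase, "latency_ms": latency_ms}


def prefill(token_ids: List[int]) -> List[List[float]]:
    """Hypothetical stand-in for the Prefill service: toy 2-d embedding per token."""
    return [[float(t), float(t) * 0.5] for t in token_ids]


def decode(embeddings: List[List[float]]) -> float:
    """Hypothetical stand-in for the Decode service: reduce embeddings to one score."""
    return sum(sum(vec) for vec in embeddings)


def run_request(token_ids: List[int]) -> Tuple[float, List[Dict[str, object]]]:
    """Run both phases in order and collect one latency record per phase."""
    embeddings, prefill_rec = timed("prefill", prefill, token_ids)
    output, decode_rec = timed("decode", decode, embeddings)
    return output, [prefill_rec, decode_rec]


if __name__ == "__main__":
    output, records = run_request([1, 2, 3])
    for rec in records:
        # Placeholder for publishing to a telemetry topic.
        print(json.dumps(rec))
```

Keeping the timing wrapper separate from the phase functions means each service can report its own latency without knowing about the other, which is what makes scaling the two stages independently observable.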

Prerequisites

Operating System: Linux, macOS, or Windows Subsystem for Linux
Git (for cloning the repo)
Python 3.8+ (with venv or virtualenv for isolated environments)
Docker & Docker Compose (if you choose to containerize Kafka, Flink, and Grafana)
(Optional) Kafka & Flink CLI for local pipelines
Grafana (for dashboard visualization)
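A quick way to check the core prerequisites above before cloning is a small script like the following. This is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it only checks the Python version and whether `git` and `docker` are on the PATH. It does not verify Docker Compose, Kafka, Flink, or Grafana.

```python
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report which basic prerequisites appear to be available locally."""
    return {
        # The README asks for Python 3.8 or newer.
        "python>=3.8": sys.version_info >= (3, 8),
        # shutil.which returns None when the executable is not on PATH.
        "git": shutil.which("git") is not None,
        "docker": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```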

Installation

Clone the repository

git clone git@github.com:<your-username>/split-inference-grpc-demo.git
cd split-inference-grpc-demo

