
justintulloch/split-inference-grpc-demo


split-inference-grpc-demo

Splits LSTM inference into two gRPC microservices—“Prefill” (embedding) and “Decode”—with real-time Kafka → Flink → Grafana telemetry. This demo shows how to parallelize and scale the lightweight embedding stage separately from the heavier decode stage, track per-phase latency, and visualize metrics in Grafana.
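The repo's actual service and model code define the real split; as a rough mental model of the two phases and the per-phase latency tracking described above, here is a minimal self-contained sketch (every function name and the toy "model" below are hypothetical stand-ins, not this project's API):

```python
import time

def prefill(tokens):
    """Lightweight embedding stage (stands in for the Prefill service)."""
    start = time.perf_counter()
    # Toy embedding: map each token to a float in [0, 1).
    embeddings = [hash(t) % 1000 / 1000.0 for t in tokens]
    latency_ms = (time.perf_counter() - start) * 1000
    return embeddings, latency_ms

def decode(embeddings, steps=3):
    """Heavier autoregressive stage (stands in for the Decode service)."""
    start = time.perf_counter()
    state = sum(embeddings)
    outputs = []
    for _ in range(steps):
        state = (state * 1.7) % 1.0  # toy recurrent update
        outputs.append(state)
    latency_ms = (time.perf_counter() - start) * 1000
    return outputs, latency_ms

if __name__ == "__main__":
    emb, t_prefill = prefill(["hello", "world"])
    out, t_decode = decode(emb)
    # In the real pipeline, per-phase latencies like these would be published
    # to Kafka, aggregated by Flink, and visualized in Grafana.
    print(f"prefill={t_prefill:.3f}ms decode={t_decode:.3f}ms outputs={len(out)}")
```

Because prefill is cheap relative to decode, the two stages can be scaled independently when they run as separate gRPC services, which is the point of the split.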


Table of Contents

  1. Prerequisites
  2. Installation
  3. Project Layout
  4. Usage
  5. Architecture Diagram
  6. Development Notes
  7. Contributing & Issues
  8. License

Prerequisites

  • Operating System: Linux, macOS, or Windows Subsystem for Linux
  • Git (for cloning the repo)
  • Python 3.8+
    • venv or virtualenv for isolated environments
  • Docker & Docker Compose (if you choose to containerize Kafka, Flink, and Grafana)
  • (Optional) Kafka & Flink CLIs for running pipelines locally
  • Grafana (for dashboard visualization)

Installation

  1. Clone the repository
    git clone git@github.com:<your-username>/split-inference-grpc-demo.git
    cd split-inference-grpc-demo
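After cloning, a typical next step is to create the isolated Python environment called for in the Prerequisites (the following is a generic sketch; the dependency filename shown in the comment is an assumption, not confirmed by this README):

```shell
# Create and activate an isolated environment (venv, per Prerequisites)
python3 -m venv .venv
. .venv/bin/activate
python -m pip --version   # confirm pip is available inside the venv
# Then install the project's dependencies, e.g.:
#   pip install -r requirements.txt   # filename is an assumption
```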
