xPyD-hub

Lightweight, rapidly deployable Prefill-Decode (PD) proxy for LLM serving — minimal setup, minimal maintenance.

Projects

  • xPyD-proxy: rapidly deployable PD proxy server providing scheduling, health monitoring, and dynamic instance management for prefill/decode instances.
  • xPyD-bench: benchmarking tool that measures throughput, latency, and time to first token (TTFT) against xPyD-proxy.
  • xPyD-plan: PD ratio planner that recommends the optimal Prefill:Decode node allocation from benchmark data or dataset analysis.
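
To make the three benchmark metrics concrete, here is a minimal Python sketch of how they could be derived from per-request timestamps. It is an illustration only, not xPyD-bench's actual code: the `RequestTiming` class and the `ttft`, `latency`, and `throughput` functions are hypothetical names, and throughput is assumed to mean generated tokens per second of wall-clock time.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical record of one request's timestamps, in seconds."""
    sent_at: float            # when the request was sent
    token_times: list[float]  # arrival time of each generated token


def ttft(t: RequestTiming) -> float:
    """Time to first token: delay between sending and the first token."""
    return t.token_times[0] - t.sent_at


def latency(t: RequestTiming) -> float:
    """End-to-end latency: delay between sending and the last token."""
    return t.token_times[-1] - t.sent_at


def throughput(timings: list[RequestTiming]) -> float:
    """Generated tokens per second over the whole run (assumed definition)."""
    total_tokens = sum(len(t.token_times) for t in timings)
    start = min(t.sent_at for t in timings)
    end = max(t.token_times[-1] for t in timings)
    return total_tokens / (end - start)
```

In a PD setup, TTFT is dominated by the prefill phase, while the gap between later tokens reflects decode speed, which is why the two are reported separately.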

About

xPyD-proxy implements a two-phase serving pattern:

  • Prefill — KV cache preparation on dedicated prefill nodes
  • Decode — autoregressive token generation on decode nodes

The proxy handles round-robin / load-balanced scheduling, health checks, and hot-reload of instance configs. It is designed for local development and lightweight deployment.
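
The scheduling behaviour described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not xPyD-proxy's implementation: `Instance`, `RoundRobinPool`, and `route` are hypothetical names, health is a plain flag that some external health checker is assumed to set, and hot-reload is modelled as replacing the URL list in place.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical prefill or decode instance."""
    url: str
    healthy: bool = True  # assumed to be updated by a separate health check


class RoundRobinPool:
    """Round-robin selection that skips unhealthy instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._next = 0

    def pick(self) -> Instance:
        n = len(self.instances)
        # Try each instance at most once, starting from the cursor.
        for offset in range(n):
            idx = (self._next + offset) % n
            if self.instances[idx].healthy:
                self._next = (idx + 1) % n
                return self.instances[idx]
        raise RuntimeError("no healthy instances available")

    def reload(self, urls) -> None:
        """Swap in a new instance list, keeping health state for known URLs."""
        known = {inst.url: inst for inst in self.instances}
        self.instances = [known.get(u, Instance(u)) for u in urls]
        self._next = 0


def route(prefill_pool: RoundRobinPool, decode_pool: RoundRobinPool):
    """Choose one prefill and one decode instance for a request."""
    return prefill_pool.pick(), decode_pool.pick()
```

Keeping separate pools for the two phases lets the prefill and decode node counts vary independently, which is the ratio that xPyD-plan is meant to recommend.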

Pinned

  1. xPyD-proxy (Public, Python): PD Proxy Server
  2. xPyD-plan (Public, Python): PD ratio planner for xPyD proxy, recommending optimal Prefill:Decode node allocation
  3. xPyD-bench (Public, Python): Benchmarking & PD ratio planning tool for xPyD proxy
