GPU-accelerated LLaMA inference wrapper for legacy Vulkan-capable systems: a Pythonic way to run AI with knowledge (Ilm) on fire (Vulkan).
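For a sense of what such a wrapper does under the hood, here is a minimal sketch using llama-cpp-python built with a Vulkan-enabled llama.cpp backend. The build flag, model path, and prompt are assumptions for illustration; the repository's own API may look different.

```python
# Illustrative sketch only: assumes llama-cpp-python built with the Vulkan
# backend (e.g. CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-7b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,   # offload all layers to the GPU backend
    n_ctx=2048,
)

out = llm("Q: What is Vulkan? A:", max_tokens=64, stop=["\n"])
print(out["choices"][0]["text"])
```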
GPU-aware inference mesh for large-scale AI serving
🚀 ClipServe: A fast API server for embedding text and images and performing zero-shot classification using OpenAI’s CLIP model. Powered by FastAPI, Redis, and CUDA for lightning-fast, scalable AI applications. Transform texts and images into embeddings or classify images with custom labels, all through easy-to-use endpoints. 🌐📊
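As a rough sketch of the embedding-endpoint idea (not ClipServe's actual API), a FastAPI route serving CLIP text embeddings via Hugging Face transformers might look like the following; the route name, model id, and payload shape are illustrative assumptions.

```python
# Minimal sketch of a CLIP text-embedding endpoint using FastAPI + transformers.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import CLIPModel, CLIPProcessor

app = FastAPI()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class TextBatch(BaseModel):
    texts: list[str]

@app.post("/embed/text")
def embed_text(batch: TextBatch):
    inputs = processor(text=batch.texts, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    return {"embeddings": features.cpu().tolist()}
```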
Docker-based GPU inference of machine learning models
Generating images with diffusion models on a mobile device, with an intranet GPU box as backend
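A minimal sketch of the GPU-box side of such a setup, assuming Hugging Face diffusers plus FastAPI; the checkpoint id, route, and PNG response format are illustrative, not the repository's actual interface.

```python
# Sketch: intranet GPU backend that a mobile client could call for images.
import io

import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI
from fastapi.responses import Response
from pydantic import BaseModel

app = FastAPI()
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

class Prompt(BaseModel):
    text: str

@app.post("/generate")
def generate(prompt: Prompt):
    image = pipe(prompt.text).images[0]  # PIL image
    buf = io.BytesIO()
    image.save(buf, format="PNG")        # return PNG bytes to the mobile client
    return Response(content=buf.getvalue(), media_type="image/png")
```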
End-to-end scalable ML inference on EKS: KEDA-driven pod autoscaling with Prometheus custom metrics, Cluster Autoscaler for GPU node scaling, and NVIDIA GPU time-slicing to run multiple pods per GPU.
Instant setup scripts for cloud-based LLM development.
A tiny Python / ZMQ server to facilitate experimenting with DeepSeek-OCR
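A tiny ZMQ request/reply server of this kind might look like the sketch below, assuming pyzmq; the port is arbitrary and run_ocr() is a placeholder for whatever DeepSeek-OCR invocation the repository actually wires in.

```python
# Minimal pyzmq REP server sketch: receive image bytes, reply with OCR text.
import zmq

def run_ocr(image_bytes: bytes) -> str:
    # Placeholder: hand the image to the OCR model and return recognized text.
    return "<ocr output>"

def main() -> None:
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://*:5555")  # illustrative port
    while True:
        image_bytes = sock.recv()                # client sends raw image bytes
        sock.send_string(run_ocr(image_bytes))   # reply with recognized text

if __name__ == "__main__":
    main()
```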
Secure, private LLM inference on RunPod GPUs with Tailscale networking. Zero public exposure.
A minimal, high-performance starter kit for running AI model inference on NVIDIA GPUs using CUDA. Includes environment setup, sample kernels, and guidance for integrating ONNX/TensorRT pipelines for fast, optimized inference on modern GPU hardware.
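For the ONNX side of such a pipeline, a hedged sketch using onnxruntime-gpu with the CUDA execution provider could look like this; the model file name and input shape are assumptions.

```python
# Sketch: run an ONNX model on an NVIDIA GPU via onnxruntime's CUDA provider.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # hypothetical exported model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # example NCHW batch
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```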
ModelSpec is an open, declarative specification for describing how AI models, especially LLMs, are deployed, served, and operated in production. It captures execution, serving, and orchestration intent to enable validation, reasoning, and automation across modern AI infrastructure.