Enable Xeon support #25

Merged
ganeshmurthy merged 10 commits into rh-ai-quickstart:main from jgespino:main on Jun 29, 2026

@jgespino jgespino commented Jun 16, 2026

Contributor

README improvements

  • Added Xeon-based local deployment option (Option C) with clear example config
  • Replaced curl example with port-forwarding

Helm configuration updates

  • Updated dependency versions in Chart.yaml (llm-service)
  • Updated rag-values.yaml.example with Xeon model config comments
  • Updated llama-stack image tag (0.2.23 → 0.6.1)

jgespino added 5 commits June 16, 2026 10:57
Added XEON example configuration
Bumped llama-stack image version
Added deploying on Xeon
Updated test command for vLLM
Bumped dependencies versions
Fixed formatting
@ganeshmurthy

Collaborator

Do we need to add the LlamaStack route in deploy/helm/rag/templates/route-llamastack.yaml?

The architecture is:
User → Streamlit (exposed via route) → F5 XC Security → LlamaStack (internal only)

By exposing LlamaStack directly with an external route, we're creating a path that bypasses all F5 security controls (WAF, rate limiting, API spec enforcement) that this quickstart is designed to showcase.

The README currently has curl examples that assume this route exists (lines 187-193), but I believe those examples should be updated to use port-forwarding instead:

  # For local testing
  oc port-forward svc/llamastack 8321:8321
  curl -sS http://localhost:8321/v1/models

This allows developers to test LlamaStack without exposing it externally.

Is there a specific reason you are adding that route? Is it necessary to enable Xeon support?

@keklundrh keklundrh requested a review from ganeshmurthy June 26, 2026 18:36
jgespino added 2 commits June 29, 2026 09:35
Updates Verify section to use port-forwarding for llama-stack service testing
@jgespino

Contributor Author

@ganeshmurthy Thanks for the feedback! I initially added route-llamastack so that I could test llama-stack using the README steps. Based on your suggestion, I’ve removed it and updated the README to use port-forwarding for testing instead.

@ganeshmurthy

Collaborator

Thank you for removing route-llamastack.

To enable Xeon support, is it necessary to change these versions?

dependencies:
  - name: pgvector
    version: 0.5.6
    repository: https://rh-ai-quickstart.github.io/ai-architecture-charts
  - name: llm-service
    version: 0.5.10
    repository: https://rh-ai-quickstart.github.io/ai-architecture-charts
  - name: llama-stack
    version: 0.8.7
    repository: https://rh-ai-quickstart.github.io/ai-architecture-charts

@jgespino

Contributor Author

Only the update to llm-service version 0.5.10 is required, as it includes the 3.4.0 image with Xeon support. I can revert the other components if needed.

@ganeshmurthy

ganeshmurthy commented Jun 29, 2026

Collaborator

Can you please update only the llm-service version to 0.5.10 in this PR? Please raise a separate PR for the version changes to pgvector and llama-stack. This PR should contain only the changes that are necessary to enable Xeon support.

Reverted the dependency version updates for pgvector and llama-stack
@jgespino

Contributor Author

Sure, I've made the updates.
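For reference, the narrowed change discussed above might look roughly like the following Chart.yaml fragment. This is only a sketch based on the values quoted in this thread. The versions that pgvector and llama-stack were reverted to are not shown here, so those entries are left out rather than guessed.

```yaml
dependencies:
  # Only llm-service is bumped in this PR; per the thread, 0.5.10
  # includes the 3.4.0 image with Xeon support.
  - name: llm-service
    version: 0.5.10
    repository: https://rh-ai-quickstart.github.io/ai-architecture-charts
  # The pgvector and llama-stack entries stay at their previous versions,
  # which are not stated in this conversation.
```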

@ganeshmurthy

ganeshmurthy commented Jun 29, 2026

Collaborator

Sorry, this is my last comment.

This line was added to the README:
| Generation| meta-llama/Llama-3.1-8B-Instruct | L4/HPU/XEON | g6.2xlarge |

But no corresponding Xeon configuration example was added to deploy/helm/rag-values.yaml.example.

The file only includes:

   # Example Xeon configurations:
   # llama-3-2-3b-instruct:
   #   id: meta-llama/Llama-3.2-3B-Instruct
   #   enabled: true
   #   device: "xeon"
   #   args:
   #   - --max-model-len
   #   - "14336"
   #   - --max-num-seqs
   #   - "32"

Should we either:

  1. Add a llama-3-1-8b-instruct Xeon example to the values file, or
  2. Remove XEON from the 8B model's hardware column in the README if it's not validated/recommended?

Users following the README will expect to find configuration examples for all advertised hardware options.

Added llama-3-1-8b-instruct example for Xeon
@jgespino

Contributor Author

Thanks for the feedback! I added a llama-3-1-8b-instruct Xeon example to the values file.
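The committed example is not quoted in this thread. As a rough illustration only, an 8B entry modeled on the 3B Xeon example above could look like the sketch below. The key name and model id come from this conversation; the `args` values are placeholders copied from the 3B example and are an assumption, not the validated settings for the 8B model.

```yaml
# Hypothetical sketch, not the committed configuration.
llama-3-1-8b-instruct:
  id: meta-llama/Llama-3.1-8B-Instruct
  enabled: true
  device: "xeon"
  args:
  - --max-model-len
  - "14336"   # placeholder taken from the 3B example; tune for the 8B model
  - --max-num-seqs
  - "32"      # placeholder taken from the 3B example
```

In deploy/helm/rag-values.yaml.example the existing Xeon example is kept commented out, so an entry like this would presumably follow the same convention.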

@ganeshmurthy ganeshmurthy merged commit 0cc13b4 into rh-ai-quickstart:main Jun 29, 2026

2 participants