refactor(helm): unify ingress and envoy proxy as a single gateway#4191

Merged
aicam merged 35 commits into apache:main from aicam:gateway-sec-ext-policies
Feb 10, 2026

@aicam (Contributor) commented Feb 3, 2026

What changes were proposed in this PR?

This PR consolidates the cluster networking architecture by replacing multiple disparate ingress/proxy solutions with a single, unified Envoy Gateway using the Kubernetes Gateway API.

Previously:

  • Texera Ingress: Handled by ingress-nginx controller (separate Helm dependency).
  • MinIO Ingress: Configured separately, often requiring its own ingress status or port exposure.
  • CU Envoy: A standalone, manually maintained Envoy deployment was used to proxy traffic to Computing Units (CUs).

Now (with Envoy Gateway):

  • Unified Gateway: A single Gateway resource (texera-gateway) manages traffic for Texera Webserver, MinIO, and Computing Units.
  • Gateway API: Uses standard HTTPRoute resources to define routing rules (prefix matching, rewrites) instead of proprietary Ingress annotations or custom config.
  • SSL/TLS Automation: Integrated cert-manager with Envoy Gateway to automatically provision and renew Let's Encrypt certificates for both the main Texera domain and the MinIO subdomain.
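The sketch below is a rough illustration of the HTTPRoute-based routing described above. It shows one plain prefix match and one prefix match with a rewrite, both attached to texera-gateway. It is not the chart's routes.yaml: the route name, Service names, ports, and paths are hypothetical placeholders.

```yaml
# Hedged sketch of Gateway API routing; everything except texera-gateway
# and the texera namespace is a hypothetical placeholder.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: texera-static-routes        # hypothetical name
  namespace: texera
spec:
  parentRefs:
    - name: texera-gateway          # the unified Gateway
  rules:
    # Plain prefix match: /api/... is forwarded unchanged.
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: texera-webserver    # hypothetical Service name
          port: 8080                # hypothetical port
    # Prefix match plus rewrite: /example-service/x is forwarded as /x.
    - matches:
        - path:
            type: PathPrefix
            value: /example-service
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
      backendRefs:
        - name: example-service     # hypothetical Service name
          port: 9000                # hypothetical port
```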

New Kubernetes Resources:

  • bin/k8s/templates/gateway.yaml: Defines the Gateway resource, configuring listeners for HTTP, HTTPS, and MinIO. Handles TLS termination using Let's Encrypt via cert-manager.

  • bin/k8s/templates/routes.yaml: Defines HTTPRoute resources.

    • Static Routes: Standard path-based routing for Texera services (Webserver, API, etc.).
    • Dynamic Routes: Captures regex paths for Computing Units and delegates them to the dynamic backend.
  • bin/k8s/templates/backend.yaml: Defines a Backend resource of type DynamicResolver. This allows Envoy to route to targets defined dynamically (e.g., by the ExtAuth service modifying headers) rather than to static Kubernetes services.

  • bin/k8s/templates/security-policy.yaml: Defines the SecurityPolicy that attaches to the dynamic routes. It configures the External Authorization filter to point to the access-control-service.

  • bin/k8s/templates/eg-config-hook.yaml: A Helm Hook (pre-install/pre-upgrade) that automatically patches the Envoy Gateway configuration to enable necessary features (enableBackend, enableEnvoyPatchPolicy) which are disabled by default. It ensures the environment is correctly configured without manual intervention.
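The dynamic Computing Unit path can be sketched as three cooperating resources:

  • a Backend of type DynamicResolver;
  • an HTTPRoute rule that matches a regex path and points at that Backend;
  • a SecurityPolicy that attaches ExtAuth to the route.

This is a hedged sketch against the Envoy Gateway v1alpha1 API, not the contents of backend.yaml, routes.yaml, or security-policy.yaml. The resource names, the path regex, and the access-control-service port are assumptions. Field names should be checked against the Envoy Gateway version in use.

```yaml
# Hedged sketch; resource names, regex, and port are hypothetical.
# Backend resources only work when enableBackend is on in the Envoy Gateway config.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: cu-dynamic-backend          # hypothetical name
  namespace: texera
spec:
  type: DynamicResolver             # upstream resolved per request, not from a Service
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: cu-dynamic-route            # hypothetical name
  namespace: texera
spec:
  parentRefs:
    - name: texera-gateway
  rules:
    - matches:
        - path:
            type: RegularExpression
            value: "/cu/[0-9]+/.*"  # hypothetical Computing Unit path pattern
      backendRefs:
        - group: gateway.envoyproxy.io
          kind: Backend
          name: cu-dynamic-backend
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: cu-ext-auth                 # hypothetical name
  namespace: texera
spec:
  targetRefs:                       # attach ExtAuth only to the dynamic route
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: cu-dynamic-route
  extAuth:
    http:
      backendRefs:
        - name: access-control-service
          port: 9096                # hypothetical port
```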

Any related issues, documentation, discussions?

Closes #4190

How was this PR tested?

Tested over both HTTP and HTTPS, on the production RKE2 cluster and in a local environment.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Antigravity

Instructions for migrating from Ingress to Envoy Gateway

If you manage a Texera deployment that uses the old Ingress-based architecture, follow these steps to replace it with Envoy Gateway.
Note: SSL certificates and load balancing may be managed differently in your cluster. Consult your cluster configuration and skip any steps below that do not apply.

Step 1: Disable the Default RKE2 Nginx Ingress

Since you are switching to Envoy Gateway, you must disable the default RKE2 Nginx controller to free up ports 80 and 443 on your server. This prevents IP and port conflicts when Envoy tries to bind to the host network.

  1. Open the RKE2 configuration file:
       sudo nano /etc/rancher/rke2/config.yaml
  2. Add the following line to disable Nginx:
       disable: rke2-ingress-nginx
  3. Save the file and restart the RKE2 server:
       sudo systemctl restart rke2-server
  4. Check that no Nginx LoadBalancer service remains:
       kubectl get svc --all-namespaces | awk 'NR==1 || /LoadBalancer|ExternalName/'
  5. Verify the removal: ensure no Nginx pods are running in the kube-system namespace.
       kubectl get pods -n kube-system -l app.kubernetes.io/name=rke2-ingress-nginx
     (This should return no pods.)

Step 2: Configure MetalLB for Local IP Allocation

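If your cluster does not have a cloud provider to hand out LoadBalancer IPs automatically, you need to use MetalLB. The configuration below assumes MetalLB itself is already installed in the metallb-system namespace.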

  1. Create a file named metallb-config.yaml:
       apiVersion: metallb.io/v1beta1
       kind: IPAddressPool
       metadata:
         name: first-pool
         namespace: metallb-system
       spec:
         addresses:
         - 128.200.71.196/32  # Primary public IP of the node; replace with your own
       ---
       apiVersion: metallb.io/v1beta1
       kind: L2Advertisement
       metadata:
         name: example
         namespace: metallb-system
  2. Apply the configuration to the cluster:
       kubectl apply -f metallb-config.yaml
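As written, the L2Advertisement above has no spec, which MetalLB treats as advertising every IPAddressPool in the cluster. The cluster may have other pools while only first-pool should be announced over Layer 2. In that case, a variant can name the pool explicitly. This is an optional sketch, not part of the original steps.

```yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb-system
spec:
  # Announce only addresses from this pool; omit spec to announce all pools.
  ipAddressPools:
    - first-pool
```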

Step 3: Install Envoy Gateway

Install Envoy Gateway using Helm. This command creates the envoy-gateway-system namespace and enables the Backend extension API from the start.

    helm install eg oci://docker.io/envoyproxy/gateway-helm \
      --version v1.6.3 \
      -n envoy-gateway-system \
      --create-namespace \
      --set config.envoyGateway.extensionApis.enableBackend=true
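The --set flag above can equivalently be kept in a values file. The sketch below also turns on enableEnvoyPatchPolicy, the second flag the chart's eg-config-hook patches in. Setting it here is optional, since the hook is meant to enable it anyway. The filename eg-values.yaml is hypothetical.

```yaml
# eg-values.yaml (hypothetical filename) for the gateway-helm chart
config:
  envoyGateway:
    extensionApis:
      enableBackend: true           # same effect as the --set flag above
      enableEnvoyPatchPolicy: true  # also enabled later by the chart's eg-config-hook
```

It would be passed with -f eg-values.yaml in place of the --set flag in the helm install command.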

Step 4: Configure Certificate Management (Let's Encrypt)

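Set up a ClusterIssuer to provision Let's Encrypt certificates automatically using the HTTP-01 challenge. This assumes cert-manager is already installed with its Gateway API support enabled, which the gatewayHTTPRoute solver below relies on.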

  1. Create a cluster-issuer.yaml file:
       apiVersion: cert-manager.io/v1
       kind: ClusterIssuer
       metadata:
         name: letsencrypt-prod
       spec:
         acme:
           server: https://acme-v02.api.letsencrypt.org/directory
           email: texera.noreply@gmail.com
           privateKeySecretRef:
             name: letsencrypt-prod
           solvers:
             - http01:
                 gatewayHTTPRoute:
                   # IMPORTANT: This must match the name generated by your Helm chart.
                   # Assuming you install with release name "texera":
                   parentRefs:
                     - name: texera-gateway
                       namespace: texera # Or whichever namespace your gateway resides in
  2. Apply the configuration:
       kubectl apply -f cluster-issuer.yaml
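The ClusterIssuer on its own does not request any certificates. cert-manager issues them for a Gateway when three things hold:

  • the Gateway carries an issuer annotation;
  • its HTTPS listeners declare a hostname;
  • those listeners reference a certificate Secret name.

According to the PR description, the chart's gateway.yaml handles this. The sketch below only shows how the pieces connect. The gatewayClassName, hostnames, and Secret names are hypothetical. The names texera-gateway, texera, and letsencrypt-prod come from the steps above.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: texera-gateway
  namespace: texera
  annotations:
    # cert-manager watches this annotation and issues certificates
    # for HTTPS listeners that set a hostname and a certificateRef.
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  gatewayClassName: eg              # hypothetical GatewayClass name
  listeners:
    - name: http
      protocol: HTTP
      port: 80                      # also serves the ACME HTTP-01 challenge route
    - name: https
      protocol: HTTPS
      port: 443
      hostname: texera.example.com  # hypothetical main domain
      tls:
        mode: Terminate
        certificateRefs:
          - name: texera-tls        # Secret that cert-manager creates and renews
    - name: minio-https
      protocol: HTTPS
      port: 443
      hostname: minio.texera.example.com  # hypothetical MinIO subdomain
      tls:
        mode: Terminate
        certificateRefs:
          - name: minio-tls         # hypothetical Secret name
```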

Step 5: Verification and Troubleshooting

Check your cluster to ensure the Envoy load balancer successfully acquired the IP address and isn't being blocked by old services.

1. Check General Status

    # Check the Gateway status
    kubectl get gateway -n texera

    # Check your application's services
    kubectl get svc -n texera

    # List ALL LoadBalancer services cluster-wide to spot IP hogs
    kubectl get svc -A --field-selector spec.type=LoadBalancer

What to look for:

  • Empty list (apart from your own service): There are no conflicts. If your service is still pending, you likely don't have an IP allocator such as MetalLB running properly.
  • rke2-ingress-nginx-controller has an IP: RKE2's default Ingress is still running and is hogging your only available IP. Revisit Step 1.

2. Debugging a Pending Service

If your LoadBalancer stays in a <pending> state, ask the service directly why it failed to sync:

    kubectl describe svc <your-envoy-service-name> -n texera
    # Example: kubectl describe svc envoy-texera-texera-gateway-2f9ebfe6 -n texera

Scroll to the Events section at the bottom:

  • If it says "IP is already in use": You have a conflict. Another service is holding the IP.
  • If it says <none> or is empty: The cluster is ignoring the request, confirming the load balancer controller (MetalLB) is missing or misconfigured.

After completing the steps above, install Texera; it will use the Envoy Gateway you just installed.

@aicam aicam requested a review from bobbai00 February 3, 2026 20:27
@aicam aicam self-assigned this Feb 3, 2026
@bobbai00 bobbai00 changed the title Replace Texera and MinIO Ingress and CU Envoy with single Envoy Gateway refactor(helm): Unify ingress and envoy proxy as a single Envoy Gateway Feb 5, 2026
@bobbai00 bobbai00 changed the title refactor(helm): Unify ingress and envoy proxy as a single Envoy Gateway refactor(helm): unify ingress and envoy proxy as a single gateway Feb 5, 2026
@bobbai00 (Contributor) left a comment

Left some comments.

One general comment: for all the gateway related files, the filename should start with gateway-

@bobbai00 bobbai00 added the refactor Refactor the code label Feb 5, 2026
@github-actions github-actions bot removed the refactor Refactor the code label Feb 5, 2026
@aicam aicam requested a review from bobbai00 February 5, 2026 19:57
@bobbai00 (Contributor) left a comment

Left some minor comments

@github-actions github-actions bot added the common label Feb 9, 2026
@aicam aicam requested a review from bobbai00 February 9, 2026 17:59
@github-actions github-actions bot added engine dependencies Pull requests that update a dependency file labels Feb 9, 2026
@github-actions github-actions bot removed engine dependencies Pull requests that update a dependency file labels Feb 9, 2026
@bobbai00 (Contributor) left a comment

LGTM!

@aicam aicam merged commit 99348fb into apache:main Feb 10, 2026
10 checks passed
madisonmlin pushed a commit to madisonmlin/texera that referenced this pull request Mar 10, 2026

Development

Successfully merging this pull request may close these issues.

Replace Ingress of MinIO and Texera + Envoy for computing units with one single Envoy Gateway as proxy

2 participants