diff --git a/content/integrate/redis-data-integration/_index.md b/content/integrate/redis-data-integration/_index.md
index c3032b2e0f..060c660ab0 100644
--- a/content/integrate/redis-data-integration/_index.md
+++ b/content/integrate/redis-data-integration/_index.md
@@ -74,7 +74,7 @@ RDI provides enterprise-grade streaming data pipelines with the following featur
 - **Backpressure mechanism** - RDI is designed to backoff writing data when the cache gets
   disconnected, which prevents cascading failure. Since the change data is persisted in the source
   database and Redis is very fast, RDI can easily catch up with missed changes after a short period of
-  disconnection. See [Backpressure mechanism]({{< relref "/integrate/redis-data-integration/architecture#backpressure-mechanism">}}) for more information.
+  disconnection. See [Backpressure mechanism]({{< relref "/integrate/redis-data-integration/data-pipelines#backpressure-mechanism">}}) for more information.
 - **Recovering from full failure** - If the cache fails or gets disconnected for a long time,
   RDI can reconstruct the cache data in Redis using a full snapshot of the defined dataset.
 - **High throughput** - Because RDI uses Redis for staging and writes to Redis as a target,
@@ -92,9 +92,11 @@ to find out if your use case is a good fit for RDI's features.
 
 ## Supported source databases
 
-RDI can capture data from any of the following sources:
-
-{{< embed-md "rdi-supported-source-versions.md" >}}
+RDI can capture data from a range of sources, including PostgreSQL, MySQL,
+MariaDB, Oracle, SQL Server, and MongoDB. See
+[Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
+for the full list of supported databases and versions, along with instructions
+to get each one ready for use with RDI.
 
 ## Continue learning with Redis University
 
diff --git a/content/integrate/redis-data-integration/architecture/_index.md b/content/integrate/redis-data-integration/architecture/_index.md
index f9d09f0453..b16f692868 100644
--- a/content/integrate/redis-data-integration/architecture/_index.md
+++ b/content/integrate/redis-data-integration/architecture/_index.md
@@ -41,7 +41,11 @@ in sequence:
 
 1. A *CDC collector* captures changes to the source database. RDI
     currently uses an open source collector called
-    [Debezium](https://debezium.io/) for this step.
+    [Debezium](https://debezium.io/) for this step, which uses
+    [Debezium Server](https://debezium.io/documentation/reference/stable/operations/debezium-server.html)
+    connectors to support a range of database sources. See
+    [Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
+    for the full list of supported databases and versions.
 
 1. The collector records the captured changes using
     [Redis streams]({{< relref "/develop/data-types/streams" >}})
@@ -70,51 +74,6 @@ RDI automatically enters a second phase called *change streaming*, where changes
 in the data are captured as they happen. Changes are usually added to the target
 within a few seconds after capture.
 
-## At-least-once delivery guarantee
-
-RDI guarantees *at-least-once delivery* to the target. This means that
-a given change will never be lost, but it might be added to the target
-more than once. Apart from a slight performance overhead, adding a
-change multiple times is harmless because the multiple writes
-are [*idempotent*](https://en.wikipedia.org/wiki/Idempotence) (that is
-to say that all writes after the first one make no change to the
-overall state).
-
-## Checkpointing
-
-RDI uses Redis streams to store the sequence of change events
-captured from the source. The events are then retrieved in order
-from the streams, processed, and written to the target. The stream
-processor uses a *checkpoint* mechanism to keep track of the last
-event in the sequence that it has successfully processed and stored. If the processor fails
-for any reason, it can restart from the last checkpoint and
-re-process any events that might not have been written to the target.
-This ensures that all changes are eventually recorded, even in the
-face of failures.
-
-## Backpressure mechanism
-
-Sometimes, data records can get added to the streams faster than RDI can
-process them. This can happen if the target is slowed or disconnected
-or simply if the source quickly generates a lot of change data.
-If this continues, then the streams will eventually occupy all the
-available memory. When RDI detects this situation, it applies a
-*backpressure* mechanism to slow or stop the flow of incoming data.
-Change data is held at the source until RDI clears the backlog and has
-enough free memory to resume streaming.
-
-{{< note >}}The Debezium log sometimes reports that RDI has run out
-of memory (usually while creating the initial snapshot). This is not
-an error, just an informative message to note that RDI has applied
-the backpressure mechanism.
-{{< /note >}}
-
-## Supported sources
-
-RDI supports the following database sources using [Debezium Server](https://debezium.io/documentation/reference/stable/operations/debezium-server.html) connectors:
-
-{{< embed-md "rdi-supported-source-versions.md" >}}
-
 ## How RDI is deployed
 
 RDI is designed with three *planes* that provide its services.
diff --git a/content/integrate/redis-data-integration/data-pipelines/_index.md b/content/integrate/redis-data-integration/data-pipelines/_index.md
index 62f7f255b6..7246313a79 100644
--- a/content/integrate/redis-data-integration/data-pipelines/_index.md
+++ b/content/integrate/redis-data-integration/data-pipelines/_index.md
@@ -156,6 +156,51 @@ When your configuration is ready, you must deploy it to start using the pipeline
 [Deploy a pipeline]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy" >}})
 to learn how to do this.
 
+## Pipeline features
+
+RDI pipelines include several built-in features that keep your data accurate and
+the system stable, even when components fail or change data arrives faster than
+RDI can process it. The sections below describe the most important ones.
+
+### At-least-once delivery guarantee
+
+RDI guarantees *at-least-once delivery* to the target. This means that
+a given change will never be lost, but it might be added to the target
+more than once. Apart from a slight performance overhead, adding a
+change multiple times is harmless because the multiple writes
+are [*idempotent*](https://en.wikipedia.org/wiki/Idempotence) (that is
+to say that all writes after the first one make no change to the
+overall state).
+
+### Checkpointing
+
+RDI uses Redis streams to store the sequence of change events
+captured from the source. The events are then retrieved in order
+from the streams, processed, and written to the target. The stream
+processor uses a *checkpoint* mechanism to keep track of the last
+event in the sequence that it has successfully processed and stored. If the processor fails
+for any reason, it can restart from the last checkpoint and
+re-process any events that might not have been written to the target.
+This ensures that all changes are eventually recorded, even in the
+face of failures.
+
+### Backpressure mechanism
+
+Sometimes, data records can get added to the streams faster than RDI can
+process them. This can happen if the target is slowed or disconnected
+or simply if the source quickly generates a lot of change data.
+If this continues, then the streams will eventually occupy all the
+available memory. When RDI detects this situation, it applies a
+*backpressure* mechanism to slow or stop the flow of incoming data.
+Change data is held at the source until RDI clears the backlog and has
+enough free memory to resume streaming.
+
+{{< note >}}The Debezium log sometimes reports that RDI has run out
+of memory (usually while creating the initial snapshot). This is not
+an error, just an informative message to note that RDI has applied
+the backpressure mechanism.
+{{< /note >}}
+
 ## More information
 
 See the other pages in this section for more information and examples:
diff --git a/content/integrate/redis-data-integration/faq.md b/content/integrate/redis-data-integration/faq.md
index 353a5ccbb0..b2932a9d4c 100644
--- a/content/integrate/redis-data-integration/faq.md
+++ b/content/integrate/redis-data-integration/faq.md
@@ -76,8 +76,8 @@ new or renamed tables and columns.
 Sometimes the Debezium log will contain a message saying that RDI is out of
 memory. This is not an error but an informative message to say that RDI is
 applying *backpressure* to Debezium. See
-[Backpressure mechanism]({{< relref "/integrate/redis-data-integration/architecture#backpressure-mechanism" >}})
-in the Architecture guide for more information.
+[Backpressure mechanism]({{< relref "/integrate/redis-data-integration/data-pipelines#backpressure-mechanism" >}})
+in the Data pipelines guide for more information.
 
 ## What happens when RDI can't write to the target Redis database?
 
diff --git a/content/integrate/redis-data-integration/installation/install-k8s.md b/content/integrate/redis-data-integration/installation/install-k8s.md
index a54ee5a3f7..49c2f355d8 100644
--- a/content/integrate/redis-data-integration/installation/install-k8s.md
+++ b/content/integrate/redis-data-integration/installation/install-k8s.md
@@ -42,11 +42,21 @@ including cloud providers' K8s managed clusters.
 You can configure the RDI Helm chart to pull the RDI images from [dockerhub](https://hub.docker.com/u/redis)
 or from your own [private image registry](#using-a-private-image-registry).
 
-## Before you install
+## Prerequisites
 
-Complete the following steps before installing the RDI Helm chart:
+Before you install:
 
-- [Create the RDI database](#create-the-rdi-database) on your Redis Enterprise cluster.
+- Check that your version of Kubernetes or OpenShift is supported. See
+  [Kubernetes/OpenShift supported versions]({{< relref "/integrate/redis-data-integration/installation/reqsummary#kubernetesopenshift-supported-versions" >}})
+  in the requirements summary.
+
+- Create the RDI database on your Redis Enterprise cluster, which RDI uses to
+  store its state information. Use the Redis Enterprise Cluster Manager UI to
+  create it, and see
+  [RDI database requirements]({{< relref "/integrate/redis-data-integration/installation/reqsummary#rdi-database-requirements" >}})
+  in the requirements summary for the configuration it needs. You will provide
+  the connection details for this database in the [`values.yaml`](#the-valuesyaml-file)
+  file as described below.
 
 - Create a [user]({{< relref "/operate/rs/security/access-control/create-users" >}}) for the RDI database
   if you prefer not to use the default password (see
@@ -64,18 +74,13 @@ Complete the following steps before installing the RDI Helm chart:
 - If you want to use a private image registry,
   [prepare it with the RDI images](#using-a-private-image-registry).
 
-### Create the RDI database
-
-RDI uses a database on your Redis Enterprise cluster to store its state
-information. Use the Redis Enterprise Cluster Manager UI to create the RDI database with the following
-requirements:
+Before you run RDI:
 
-{{< embed-md "rdi-db-reqs.md" >}}
+- Prepare your source database to enable change data capture (CDC). See
+  [Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
+  to learn how to do this.
 
-You should then provide the details of this database in the [`values.yaml`](#the-valuesyaml-file)
-file as described below.
-
-### Using a private image registry
+## Using a private image registry
 
 Add the RDI images from [dockerhub](https://hub.docker.com/u/redis) to your local registry. You need the following RDI images
 with tags matching the RDI version you want to install:
@@ -146,10 +151,6 @@ To pull images from a private image registry, you must provide the image pull se
 - [Google Kubernetes Engine (GKE)](https://cloud.google.com/artifact-registry/docs/pull-cached-dockerhub-images)
 - [Azure Kubernetes Service (AKS)](https://learn.microsoft.com/en-us/azure/aks/cluster-container-registry-integration?tabs=azure-cli)
 
-## Supported versions of Kubernetes and OpenShift
-
-{{< embed-md "rdi-k8s-reqs.md" >}}
-
 ## Install the RDI Helm chart
 
 1. Scaffold the default `values.yaml` file from the chart into a local
@@ -381,12 +382,6 @@ Specifically, ensure that one or both of the following Helm chart values is set:
 - `controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz`
 - `controller.service.externalTrafficPolicy=Local`
 
-## Prepare your source database
-
-Before deploying a pipeline, you must configure your source database to enable CDC. See the
-[Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
-section to learn how to do this.
-
 ## Deploy a pipeline
 
 When the Helm installation is complete and you have prepared the source database for CDC,
diff --git a/content/integrate/redis-data-integration/installation/install-vm.md b/content/integrate/redis-data-integration/installation/install-vm.md
index 8cbd20bef1..89826fe889 100644
--- a/content/integrate/redis-data-integration/installation/install-vm.md
+++ b/content/integrate/redis-data-integration/installation/install-vm.md
@@ -23,20 +23,26 @@ your source database. You can also
 {{< note >}}We recommend you always use the latest version, which is RDI v{{< rdi-version >}}.
 {{< /note >}}
 
-## Create the RDI database
+## Prerequisites
 
-RDI uses a database on your Redis Enterprise cluster to store its state
-information. Use the Redis Enterprise Cluster Manager UI to create the RDI database with the following
-requirements:
+Before you install:
 
-{{< embed-md "rdi-db-reqs.md" >}}
+- Create the RDI database on your Redis Enterprise cluster, which RDI uses to
+  store its state information. Use the Redis Enterprise Cluster Manager UI to
+  create it, and see
+  [RDI database requirements]({{< relref "/integrate/redis-data-integration/installation/reqsummary#rdi-database-requirements" >}})
+  in the requirements summary for the configuration it needs.
 
-## Hardware sizing
+- Check that each RDI VM meets the
+  [hardware requirements]({{< relref "/integrate/redis-data-integration/installation/reqsummary#hardware-requirements-for-vm-installation" >}})
+  (RDI is mainly CPU and network bound) and runs a
+  [supported operating system]({{< relref "/integrate/redis-data-integration/installation/reqsummary#os-requirements-for-vm-installation" >}}).
 
-RDI is mainly CPU and network bound.
-Each of the RDI VMs should have at least:
+Before you run RDI:
 
-{{< embed-md "rdi-vm-reqs.md" >}}
+- Prepare your source database to enable change data capture (CDC). See
+  [Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
+  to learn how to do this.
 
 ## VM Installation Requirements
 
@@ -70,10 +76,6 @@ from working, especially on RHEL 8. Ideally, use `iptables` v1.8.8, which is
 known to work correctly with RDI.
 {{< /note >}}
 
-The supported OS versions for RDI are:
-
-{{< embed-md "rdi-os-reqs.md" >}}
-
 You must run the RDI installer as a privileged user because it installs
 [containerd](https://containerd.io/) and registers services. However, you don't
 need any special privileges to run RDI processes for normal operation.
@@ -271,12 +273,6 @@ and the RDI pipeline will be active on that VM.
 
 You may find it useful to trigger a failover deliberately to check that RDI is correctly configured to handle it. See [Test HA failover]({{< relref "/integrate/redis-data-integration/installation/ha-test" >}}) to learn how to do this.
 
-## Prepare your source database
-
-Before deploying a pipeline, you must configure your source database to enable CDC. See the
-[Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
-section to learn how to do this.
-
 ## Deploy a pipeline
 
 When the installation is complete, and you have prepared the source database for CDC,
diff --git a/content/integrate/redis-data-integration/troubleshooting.md b/content/integrate/redis-data-integration/troubleshooting.md
index a5cfe43ae5..c98d4d5093 100644
--- a/content/integrate/redis-data-integration/troubleshooting.md
+++ b/content/integrate/redis-data-integration/troubleshooting.md
@@ -54,8 +54,8 @@ log analysis tools can use.
 saying RDI is out of memory. This is not an error but an
 informative message to say that RDI is applying *backpressure* to the
 collector. See
-[Backpressure mechanism]({{< relref "/integrate/redis-data-integration/architecture#backpressure-mechanism" >}})
-in the Architecture guide for more information.
+[Backpressure mechanism]({{< relref "/integrate/redis-data-integration/data-pipelines#backpressure-mechanism" >}})
+in the Data pipelines guide for more information.
 {{< /note >}}
 
 ## Dump support package