10 changes: 6 additions & 4 deletions content/integrate/redis-data-integration/_index.md
Expand Up @@ -74,7 +74,7 @@ RDI provides enterprise-grade streaming data pipelines with the following featur
- **Backpressure mechanism** - RDI is designed to back off from writing data when the cache gets
disconnected, which prevents cascading failure. Since the change data is persisted in the source
database and Redis is very fast, RDI can easily catch up with missed changes after a short period of
disconnection. See [Backpressure mechanism]({{< relref "/integrate/redis-data-integration/architecture#backpressure-mechanism">}}) for more information.
disconnection. See [Backpressure mechanism]({{< relref "/integrate/redis-data-integration/data-pipelines#backpressure-mechanism">}}) for more information.
- **Recovering from full failure** - If the cache fails or gets disconnected for a long time,
RDI can reconstruct the cache data in Redis using a full snapshot of the defined dataset.
- **High throughput** - Because RDI uses Redis for staging and writes to Redis as a target,
Expand All @@ -92,9 +92,11 @@ to find out if your use case is a good fit for RDI's features.

## Supported source databases

RDI can capture data from any of the following sources:

{{< embed-md "rdi-supported-source-versions.md" >}}
RDI can capture data from a range of sources, including PostgreSQL, MySQL,
MariaDB, Oracle, SQL Server, and MongoDB. See
[Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
for the full list of supported databases and versions, along with instructions
to get each one ready for use with RDI.

## Continue learning with Redis University

51 changes: 5 additions & 46 deletions content/integrate/redis-data-integration/architecture/_index.md
Expand Up @@ -41,7 +41,11 @@ in sequence:

1. A *CDC collector* captures changes to the source database. RDI
currently uses an open source collector called
[Debezium](https://debezium.io/) for this step.
[Debezium](https://debezium.io/) for this step, which uses
[Debezium Server](https://debezium.io/documentation/reference/stable/operations/debezium-server.html)
connectors to support a range of database sources. See
[Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
for the full list of supported databases and versions.

1. The collector records the captured changes using
[Redis streams]({{< relref "/develop/data-types/streams" >}})
Expand Down Expand Up @@ -70,51 +74,6 @@ RDI automatically enters a second phase called *change streaming*, where
changes in the data are captured as they happen. Changes are usually
added to the target within a few seconds after capture.

## At-least-once delivery guarantee

RDI guarantees *at-least-once delivery* to the target. This means that
a given change will never be lost, but it might be added to the target
more than once. Apart from a slight performance overhead, adding a
change multiple times is harmless because the multiple writes
are [*idempotent*](https://en.wikipedia.org/wiki/Idempotence) (that is
to say that all writes after the first one make no change to the
overall state).

## Checkpointing

RDI uses Redis streams to store the sequence of change events
captured from the source. The events are then retrieved in order
from the streams, processed, and written to the target. The stream
processor uses a *checkpoint* mechanism to keep track of the last
event in the sequence that it has successfully processed and stored. If the processor fails
for any reason, it can restart from the last checkpoint and
re-process any events that might not have been written to the target.
This ensures that all changes are eventually recorded, even in the
face of failures.

## Backpressure mechanism

Sometimes, data records can get added to the streams faster than RDI can
process them. This can happen if the target is slowed or disconnected
or simply if the source quickly generates a lot of change data.
If this continues, then the streams will eventually occupy all the
available memory. When RDI detects this situation, it applies a
*backpressure* mechanism to slow or stop the flow of incoming data.
Change data is held at the source until RDI clears the backlog and has
enough free memory to resume streaming.

{{<note>}}The Debezium log sometimes reports that RDI has run out
of memory (usually while creating the initial snapshot). This is not
an error, just an informative message to note that RDI has applied
the backpressure mechanism.
{{</note>}}

## Supported sources

RDI supports the following database sources using [Debezium Server](https://debezium.io/documentation/reference/stable/operations/debezium-server.html) connectors:

{{< embed-md "rdi-supported-source-versions.md" >}}

## How RDI is deployed

RDI is designed with three *planes* that provide its services.
45 changes: 45 additions & 0 deletions content/integrate/redis-data-integration/data-pipelines/_index.md
Expand Up @@ -156,6 +156,51 @@ When your configuration is ready, you must deploy it to start using the pipeline
[Deploy a pipeline]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy" >}})
to learn how to do this.

## Pipeline features

RDI pipelines include several built-in features that keep your data accurate and
the system stable, even when components fail or change data arrives faster than
RDI can process it. The sections below describe the most important ones.

### At-least-once delivery guarantee

RDI guarantees *at-least-once delivery* to the target. This means that
a given change will never be lost, but it might be added to the target
more than once. Apart from a slight performance overhead, adding a
change multiple times is harmless because the multiple writes
are [*idempotent*](https://en.wikipedia.org/wiki/Idempotence) (that is
to say that all writes after the first one make no change to the
overall state).
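
The following is a minimal Python sketch of why a repeated write is harmless. It is not RDI's implementation: a plain dictionary stands in for the Redis target, and the `apply_change` helper and the `customer:42` key are hypothetical names used only for illustration. The point is that a write which replaces the whole record leaves the same state no matter how many times it is delivered.

```python
# Hypothetical illustration: a plain dict stands in for the Redis target.
def apply_change(target: dict, key: str, fields: dict) -> None:
    """Overwrite the record at `key` with the full set of fields from the change."""
    target[key] = dict(fields)


target = {}
change = ("customer:42", {"name": "Ada", "city": "London"})

# First delivery of the change.
apply_change(target, *change)
state_after_first = {k: dict(v) for k, v in target.items()}

# The same change is delivered again, as at-least-once delivery allows.
apply_change(target, *change)

# The second write made no difference to the overall state.
assert target == state_after_first
```
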

### Checkpointing

RDI uses Redis streams to store the sequence of change events
captured from the source. The events are then retrieved in order
from the streams, processed, and written to the target. The stream
processor uses a *checkpoint* mechanism to keep track of the last
event in the sequence that it has successfully processed and stored. If the processor fails
for any reason, it can restart from the last checkpoint and
re-process any events that might not have been written to the target.
This ensures that all changes are eventually recorded, even in the
face of failures.
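
As a rough sketch of the idea (assuming a simplified model, not RDI's actual stream processor), the Python below treats the stream as an ordered list of `(event_id, key, value)` tuples and the target as a dictionary. The `process` function and its `fail_at` parameter are hypothetical; `fail_at` simulates a crash that happens after an event is written but before the checkpoint is advanced.

```python
def process(events, target, checkpoint, fail_at=None):
    """Apply every event newer than `checkpoint` and return the new checkpoint.

    `fail_at` simulates a crash after that event has been written to the
    target but before the checkpoint has been recorded.
    """
    for event_id, key, value in events:
        if event_id <= checkpoint:
            continue  # already processed and recorded
        target[key] = value  # idempotent write to the target
        if event_id == fail_at:
            return checkpoint  # simulated crash: checkpoint not advanced
        checkpoint = event_id  # record progress only after a successful write
    return checkpoint


events = [(1, "a", "x"), (2, "b", "y"), (3, "a", "z")]
target = {}

# First run crashes on event 2, so the checkpoint stays at event 1.
checkpoint_after_crash = process(events, target, checkpoint=0, fail_at=2)

# Restart from the last checkpoint: event 2 is re-processed, then event 3.
final_checkpoint = process(events, target, checkpoint_after_crash)
```

Event 2 is written twice in this run, which is where the at-least-once behavior comes from. Because the write is idempotent, the final state is the same as if no failure had occurred.
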

### Backpressure mechanism

Sometimes, data records can get added to the streams faster than RDI can
process them. This can happen if the target is slowed or disconnected
or simply if the source quickly generates a lot of change data.
If this continues, then the streams will eventually occupy all the
available memory. When RDI detects this situation, it applies a
*backpressure* mechanism to slow or stop the flow of incoming data.
Change data is held at the source until RDI clears the backlog and has
enough free memory to resume streaming.

{{<note>}}The Debezium log sometimes reports that RDI has run out
of memory (usually while creating the initial snapshot). This is not
an error, just an informative message to note that RDI has applied
the backpressure mechanism.
{{</note>}}
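
The general pattern can be sketched in a few lines of Python. This is an illustrative model only, not how RDI or Debezium implement it: `MAX_BACKLOG` is a hypothetical limit standing in for available memory, and `collect` and `process` are made-up names for the collector and stream processor. The collector stops pulling changes once the backlog reaches the limit, so the remaining changes stay at the source until the processor frees up space.

```python
from collections import deque

MAX_BACKLOG = 3  # hypothetical limit standing in for "available memory"

source = deque(range(1, 9))  # changes still held at the source
stream = deque()             # staging backlog between collector and processor
target = []                  # changes written to the target, in order


def collect():
    """Move changes from the source to the stream until the backlog is full."""
    while source and len(stream) < MAX_BACKLOG:
        stream.append(source.popleft())


def process(n):
    """Write up to `n` changes from the stream to the target."""
    for _ in range(min(n, len(stream))):
        target.append(stream.popleft())


collect()                     # the backlog fills up and collection pauses
held_at_source = len(source)  # changes waiting at the source
peak = len(stream)

while source or stream:
    process(1)                # the processor clears part of the backlog
    collect()                 # collection resumes only while there is room
    peak = max(peak, len(stream))
```

In this model, no change is dropped and the backlog never grows past the limit; the cost is that changes wait at the source for longer.
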

## More information

See the other pages in this section for more information and examples:
4 changes: 2 additions & 2 deletions content/integrate/redis-data-integration/faq.md
Expand Up @@ -76,8 +76,8 @@ new or renamed tables and columns.
Sometimes the Debezium log will contain a message saying that RDI is out of
memory. This is not an error but an informative message to say that RDI
is applying *backpressure* to Debezium. See
[Backpressure mechanism]({{< relref "/integrate/redis-data-integration/architecture#backpressure-mechanism" >}})
in the Architecture guide for more information.
[Backpressure mechanism]({{< relref "/integrate/redis-data-integration/data-pipelines#backpressure-mechanism" >}})
in the Data pipelines guide for more information.

## What happens when RDI can't write to the target Redis database?

Original file line number Diff line number Diff line change
Expand Up @@ -42,11 +42,21 @@ including cloud providers' K8s managed clusters.
You can configure the RDI Helm chart to pull the RDI images from [dockerhub](https://hub.docker.com/u/redis)
or from your own [private image registry](#using-a-private-image-registry).

## Before you install
## Prerequisites

Complete the following steps before installing the RDI Helm chart:
Before you install:

- [Create the RDI database](#create-the-rdi-database) on your Redis Enterprise cluster.
- Check that your version of Kubernetes or OpenShift is supported. See
[Kubernetes/OpenShift supported versions]({{< relref "/integrate/redis-data-integration/installation/reqsummary#kubernetesopenshift-supported-versions" >}})
in the requirements summary.

- Create the RDI database on your Redis Enterprise cluster, which RDI uses to
store its state information. Use the Redis Enterprise Cluster Manager UI to
create it, and see
[RDI database requirements]({{< relref "/integrate/redis-data-integration/installation/reqsummary#rdi-database-requirements" >}})
in the requirements summary for the configuration it needs. You will provide
the connection details for this database in the [`values.yaml`](#the-valuesyaml-file)
file as described below.

- Create a [user]({{< relref "/operate/rs/security/access-control/create-users" >}})
for the RDI database if you prefer not to use the default password (see
Expand All @@ -64,18 +74,13 @@ Complete the following steps before installing the RDI Helm chart:
- If you want to use a private image registry,
[prepare it with the RDI images](#using-a-private-image-registry).

### Create the RDI database

RDI uses a database on your Redis Enterprise cluster to store its state
information. Use the Redis Enterprise Cluster Manager UI to create the RDI database with the following
requirements:
Before you run RDI:

{{< embed-md "rdi-db-reqs.md" >}}
- Prepare your source database to enable change data capture (CDC). See
[Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
to learn how to do this.

You should then provide the details of this database in the [`values.yaml`](#the-valuesyaml-file)
file as described below.

### Using a private image registry
## Using a private image registry

Add the RDI images from [dockerhub](https://hub.docker.com/u/redis) to your local registry.
You need the following RDI images with tags matching the RDI version you want to install:
Expand Down Expand Up @@ -146,10 +151,6 @@ To pull images from a private image registry, you must provide the image pull se
- [Google Kubernetes Engine (GKE)](https://cloud.google.com/artifact-registry/docs/pull-cached-dockerhub-images)
- [Azure Kubernetes Service (AKS)](https://learn.microsoft.com/en-us/azure/aks/cluster-container-registry-integration?tabs=azure-cli)

## Supported versions of Kubernetes and OpenShift

{{< embed-md "rdi-k8s-reqs.md" >}}

## Install the RDI Helm chart

1. Scaffold the default `values.yaml` file from the chart into a local
Expand Down Expand Up @@ -381,12 +382,6 @@ Specifically, ensure that one or both of the following Helm chart values is set:
- `controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz`
- `controller.service.externalTrafficPolicy=Local`

## Prepare your source database

Before deploying a pipeline, you must configure your source database to enable CDC. See the
[Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
section to learn how to do this.

## Deploy a pipeline

When the Helm installation is complete and you have prepared the source database for CDC,
34 changes: 15 additions & 19 deletions content/integrate/redis-data-integration/installation/install-vm.md
Expand Up @@ -23,20 +23,26 @@ your source database. You can also
{{< note >}}We recommend you always use the latest version, which is RDI v{{< rdi-version >}}.
{{< /note >}}

## Create the RDI database
## Prerequisites

RDI uses a database on your Redis Enterprise cluster to store its state
information. Use the Redis Enterprise Cluster Manager UI to create the RDI database with the following
requirements:
Before you install:

{{< embed-md "rdi-db-reqs.md" >}}
- Create the RDI database on your Redis Enterprise cluster, which RDI uses to
store its state information. Use the Redis Enterprise Cluster Manager UI to
create it, and see
[RDI database requirements]({{< relref "/integrate/redis-data-integration/installation/reqsummary#rdi-database-requirements" >}})
in the requirements summary for the configuration it needs.

## Hardware sizing
- Check that each RDI VM meets the
[hardware requirements]({{< relref "/integrate/redis-data-integration/installation/reqsummary#hardware-requirements-for-vm-installation" >}})
(RDI is mainly CPU and network bound) and runs a
[supported operating system]({{< relref "/integrate/redis-data-integration/installation/reqsummary#os-requirements-for-vm-installation" >}}).

RDI is mainly CPU and network bound.
Each of the RDI VMs should have at least:
Before you run RDI:

{{< embed-md "rdi-vm-reqs.md" >}}
- Prepare your source database to enable change data capture (CDC). See
[Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
to learn how to do this.

## VM Installation Requirements

Expand Down Expand Up @@ -70,10 +76,6 @@ from working, especially on RHEL 8. Ideally, use `iptables` v1.8.8, which is
known to work correctly with RDI.
{{< /note >}}

The supported OS versions for RDI are:

{{< embed-md "rdi-os-reqs.md" >}}

You must run the RDI installer as a privileged user because it installs
[containerd](https://containerd.io/) and registers services. However, you don't
need any special privileges to run RDI processes for normal operation.
Expand Down Expand Up @@ -271,12 +273,6 @@ and the RDI pipeline will be active on that VM.

You may find it useful to trigger a failover deliberately to check that RDI is correctly configured to handle it. See [Test HA failover]({{< relref "/integrate/redis-data-integration/installation/ha-test" >}}) to learn how to do this.

## Prepare your source database

Before deploying a pipeline, you must configure your source database to enable CDC. See the
[Prepare source databases]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs" >}})
section to learn how to do this.

## Deploy a pipeline

When the installation is complete, and you have prepared the source database for CDC,
4 changes: 2 additions & 2 deletions content/integrate/redis-data-integration/troubleshooting.md
Expand Up @@ -54,8 +54,8 @@ log analysis tools can use.
saying RDI is out of
memory. This is not an error but an informative message to say that RDI
is applying *backpressure* to the collector. See
[Backpressure mechanism]({{< relref "/integrate/redis-data-integration/architecture#backpressure-mechanism" >}})
in the Architecture guide for more information.
[Backpressure mechanism]({{< relref "/integrate/redis-data-integration/data-pipelines#backpressure-mechanism" >}})
in the Data pipelines guide for more information.
{{< /note >}}

## Dump support package