Merged

54 commits
- `dfda88c` Initial implementation for BoundedStreamConfig (aho135, Apr 23, 2026)
- `4180ace` Implement isOffsetAtOrBeyond for Rabbit and Kinesis (aho135, Apr 23, 2026)
- `8cb75f6` Unit test coverage (aho135, Apr 24, 2026)
- `300ebe3` Fix BoundedStreamConfigTest (aho135, Apr 24, 2026)
- `9af9729` Remove unused import (aho135, Apr 24, 2026)
- `e0ffef6` Remove unneeded tests (aho135, Apr 24, 2026)
- `162e1f3` Unit test fix (aho135, Apr 24, 2026)
- `3ea2b0b` Fix import and add coverage for RabbitStreamSupervisor (aho135, Apr 24, 2026)
- `8e3e81c` Test coverage for validateBoundedStreamConfig (aho135, Apr 24, 2026)
- `4bed658` Re-initialize partition group and reset state after reset (aho135, Apr 30, 2026)
- `c9181f0` Handle edge case where startOffset equals endOffset (aho135, Apr 30, 2026)
- `9e85331` Compare Kinesis sequence numbers using BigInteger (aho135, Apr 30, 2026)
- `9a32ce0` Remove stale test case (aho135, Apr 30, 2026)
- `b04e907` Remove redundant validation of boundedStreamConfig (aho135, May 1, 2026)
- `8e6dfb8` Throw DruidException with ADMIN persona for BoundedStreamConfig (aho135, May 1, 2026)
- `f03abb6` Clean up unused Logger (aho135, May 1, 2026)
- `ae9083f` javadoc and comment cleanup for isBoundedWorkComplete (aho135, May 1, 2026)
- `f8a313b` Add embedded test for bounded ingestion (aho135, May 1, 2026)
- `5965ac4` Add boundedStreamConfig to SeekableStreamDataSourceMetadata for metad… (aho135, May 2, 2026)
- `9e66948` Revert pendingCompletionGroups check (aho135, May 2, 2026)
- `902e118` Unit test fix (aho135, May 2, 2026)
- `670749c` embedded-test for metadata mismatch (aho135, May 3, 2026)
- `2457caf` Remove unused var (aho135, May 3, 2026)
- `3943ad0` Unit test fix (aho135, May 3, 2026)
- `395fa9a` Add boundedStreamConfig documentation (aho135, May 4, 2026)
- `4cde39f` Fix spellcheck (aho135, May 4, 2026)
- `7e86ec6` Increase code coverage (aho135, May 4, 2026)
- `d985dc4` Increase coverage for BoundedStreamConfig (aho135, May 4, 2026)
- `021e721` Remove unnecessary test (aho135, May 4, 2026)
- `d23d9c4` Simplify completion check in createNewTasks (aho135, May 4, 2026)
- `42ada89` Remove unused function (aho135, May 5, 2026)
- `ed589c2` Unit test bounded supervisor completion (aho135, May 5, 2026)
- `234bc82` Improve coverage on RabbitStreamSupervisor (aho135, May 5, 2026)
- `1cd928d` Unit test coverage (aho135, May 5, 2026)
- `094427d` Unit test for IllegalArgumentException for KafkaSupervisor (aho135, May 5, 2026)
- `3ad278d` Check if end offsets are exclusive for bounded work completion (aho135, May 5, 2026)
- `cf623b8` Increase branch coverage (aho135, May 5, 2026)
- `a037ccb` Increase branch coverage (aho135, May 5, 2026)
- `0e6466c` Unit test coverage (aho135, May 6, 2026)
- `b1b1179` Fix import (aho135, May 6, 2026)
- `126638f` Remove use of deprecated function (aho135, May 6, 2026)
- `2ff9cfd` Revert to deprecated function since not initialized in mock object (aho135, May 6, 2026)
- `2d42ab4` Merge branch 'master' into bounded-stream-supervisor (aho135, May 6, 2026)
- `31c870e` Fix merge conflict (aho135, May 6, 2026)
- `b68705a` Detect metadata mismatch when committed offset > bounded config end (aho135, May 7, 2026)
- `6c33c8f` Clean up redundant tests in BoundedStreamConfigTest and use EqualsVer… (aho135, May 12, 2026)
- `41a31a1` Compare Kinesis Sequence numbers using built in comparison (aho135, May 12, 2026)
- `dd9bdd2` Clean up docs based on review comments (aho135, May 13, 2026)
- `fd7981f` Early return before convert for hasTaskGroupReachedBoundedEnd (aho135, May 13, 2026)
- `5bc87fb` Merge branch 'master' into bounded-stream-supervisor (aho135, May 13, 2026)
- `cc13428` Resolve merge conflicts (aho135, May 13, 2026)
- `c62b936` Fix KinesisSupervisorTest (aho135, May 13, 2026)
- `35c4574` Update KinesisSupervisorTest.java (aho135, May 13, 2026)
- `e0b5ef9` Cover case where start > end (aho135, May 13, 2026)
52 changes: 52 additions & 0 deletions docs/ingestion/supervisor.md
@@ -65,6 +65,7 @@ For configuration properties specific to Kafka and Kinesis, see [Kafka I/O confi
|`lateMessageRejectionPeriod`|ISO 8601 period|Configures tasks to reject messages with timestamps earlier than this period before the task was created. For example, if this property is set to `PT1H` and the supervisor creates a task at `2016-01-01T12:00Z`, Druid drops messages with timestamps earlier than `2016-01-01T11:00Z`. This may help prevent concurrency issues if your data stream has late messages and you have multiple pipelines that need to operate on the same segments, such as a streaming and a nightly batch ingestion pipeline. You can specify only one of the late message rejection properties.|No||
|`earlyMessageRejectionPeriod`|ISO 8601 period|Configures tasks to reject messages with timestamps later than this period after the task reached its task duration. For example, if this property is set to `PT1H`, the task duration is set to `PT1H` and the supervisor creates a task at `2016-01-01T12:00Z`, Druid drops messages with timestamps later than `2016-01-01T14:00Z`. Tasks sometimes run past their task duration, such as in cases of supervisor failover.|No||
|`stopTaskCount`|Integer|Limits the number of ingestion tasks Druid can cycle at any given time. If not set, Druid can cycle all tasks at the same time. If set to a value less than `taskCount`, your cluster needs fewer available slots to run the supervisor. You can save costs by scaling down your ingestion tier, but this can lead to slower cycle times and lag. See [`stopTaskCount`](#stoptaskcount) for more information.|No|`taskCount` value|
|`boundedStreamConfig`|Object|Configures the supervisor for bounded (one-time) ingestion with explicit start and end offsets. When set, the supervisor creates tasks that read from `startSequenceNumbers` to `endSequenceNumbers`, then automatically terminates when all data is ingested. The bounded configuration is stored with datasource metadata; if a supervisor is restarted or a new supervisor is created with different offsets for the same datasource, it will fail. To retry with different offsets, use the supervisor reset API to clear metadata or use a different supervisor ID. Useful for backfills and historical reprocessing. See [Bounded stream configuration](#bounded-stream-configuration) for details.|No|null|
|`serverPriorityToReplicas`|Object (`Map<Integer, Integer>`)|Map of server priorities to the number of replicas per priority. When set, each task replica is assigned a server priority that corresponds to `druid.server.priority` on the Peon process to enable query isolation for mixed workloads using [query routing strategies](../configuration/index.md#query-routing). If not configured, the `replicas` setting applies and all task replicas are assigned a default priority of 0.<br/><br/>For example, setting `serverPriorityToReplicas` to `{"1": 2, "0": 1}` creates 2 task replicas with `druid.server.priority=1` and 1 task replica with `druid.server.priority=0` per task group. This configuration scales proportionally with `taskCount`. For example, if `taskCount` is set to 5, this results in 15 total tasks - 10 tasks with priority 1 and 5 tasks with priority 0. If both `replicas` and `serverPriorityToReplicas` are set, the sum of replicas in `serverPriorityToReplicas` must equal `replicas`.|No|null|

#### Task autoscaler
Expand Down Expand Up @@ -251,6 +252,57 @@ Before you set `stopTaskCount`, note the following:
- The [task autoscaler](#task-autoscaler) ignores `stopTaskCount` when shutting down tasks in response to a task count change. The task autoscaler needs to redistribute partitions across tasks, which requires all tasks to be shut down.
- If you set `stopTaskCount` to a value less than `taskCount`, Druid cycles the longest running tasks first, then other tasks up to the value set.

#### Bounded stream configuration

Use `boundedStreamConfig` to configure one-time ingestion from a specific range of offsets. This is useful for backfilling historical data or reprocessing data with different configurations.

The `boundedStreamConfig` object contains the following properties:

|Property|Type|Description|Required|
|--------|----|-----------|--------|
|`startSequenceNumbers`|Object|Map of partition IDs to start offsets. Start offsets are inclusive for both Kafka and Kinesis.|Yes|
|`endSequenceNumbers`|Object|Map of partition IDs to end offsets. End offsets are exclusive for Kafka and inclusive for Kinesis.|Yes|

When configured, the supervisor:
1. Creates tasks that start reading from `startSequenceNumbers`.
2. Lets each task stop automatically when it reaches `endSequenceNumbers`.
3. Does not create replacement tasks after tasks complete.
4. Transitions to the `COMPLETED` state and terminates when all tasks finish.

**Metadata consistency:** The bounded configuration is stored in datasource metadata along with checkpointed offsets. If you restart the supervisor or create a new supervisor with a different `boundedStreamConfig` for the same datasource, the supervisor will fail with an error. To start a new bounded ingestion with different offsets, either:
- Use the [supervisor reset API](../api-reference/supervisor-api.md#reset-a-supervisor) to clear existing metadata
- Use a different supervisor ID
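For example, assuming an Overlord at `http://localhost:8081` and a supervisor named `my-bounded-supervisor` (both placeholders to substitute with your own values), a reset call looks like:

```shell
# Clears the stored offsets and bounded config for this supervisor's datasource.
# Reset discards checkpoint metadata, so the next bounded run starts fresh.
curl -X POST "http://localhost:8081/druid/indexer/v1/supervisor/my-bounded-supervisor/reset"
```

After the reset succeeds, a spec with different `startSequenceNumbers` and `endSequenceNumbers` no longer conflicts with stored metadata.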

**Example (Kafka):**

```json
{
"type": "kafka",
"spec": {
"ioConfig": {
"topic": "my-topic",
"inputFormat": {
"type": "json"
},
"boundedStreamConfig": {
"startSequenceNumbers": {
"0": 1000,
"1": 2000,
"2": 1500
},
"endSequenceNumbers": {
"0": 5000,
"1": 6000,
"2": 5500
}
}
}
}
}
```

This configuration ingests data from partition 0 offsets 1000 through 4999, partition 1 offsets 2000 through 5999, and partition 2 offsets 1500 through 5499, since Kafka end offsets are exclusive.
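The record counts implied by the example above can be sanity-checked with a small standalone snippet (plain Java, independent of Druid):

```java
import java.util.Map;

public class BoundedRangeCheck
{
  public static void main(String[] args)
  {
    // Offsets from the example spec above. Kafka end offsets are exclusive,
    // so each partition ingests exactly (end - start) records.
    Map<String, Long> start = Map.of("0", 1000L, "1", 2000L, "2", 1500L);
    Map<String, Long> end = Map.of("0", 5000L, "1", 6000L, "2", 5500L);

    long total = 0;
    for (String partition : start.keySet()) {
      total += end.get(partition) - start.get(partition);
    }
    System.out.println(total); // 12000 records across the three partitions
  }
}
```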

### Tuning configuration

The `tuningConfig` object is optional. If you don't specify the `tuningConfig` object, Druid uses the default configuration settings.
@@ -0,0 +1,312 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.testing.embedded.indexing;

import org.apache.druid.common.utils.IdUtils;
import org.apache.druid.data.input.impl.JsonInputFormat;
import org.apache.druid.indexing.kafka.simulate.KafkaResource;
import org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorSpec;
import org.apache.druid.indexing.overlord.supervisor.SupervisorStatus;
import org.apache.druid.indexing.seekablestream.supervisor.BoundedStreamConfig;
import org.apache.druid.query.DruidMetrics;
import org.apache.druid.testing.embedded.EmbeddedDruidCluster;
import org.apache.druid.testing.embedded.StreamIngestResource;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

import java.util.HashMap;
import java.util.Map;

/**
* Tests for bounded Kafka supervisors (one-time ingestion with explicit start/end offsets).
*/
public class KafkaBoundedSupervisorTest extends StreamIndexTestBase
{
private final KafkaResource kafkaServer = new KafkaResource();

@Override
protected StreamIngestResource<?> getStreamIngestResource()
{
return kafkaServer;
}

@Override
protected EmbeddedDruidCluster createCluster()
{
overlord.addProperty(
"druid.monitoring.monitors",
"[\"org.apache.druid.server.metrics.SupervisorStatsMonitor\"]"
);
overlord.addProperty("druid.monitoring.emissionPeriod", "PT1s");
return super.createCluster();
}

@Test
public void test_boundedSupervisor_ingestsDataAndCompletes()
{
final String topic = IdUtils.getRandomId();
kafkaServer.createTopicWithPartitions(topic, 2);

// Publish records before creating supervisor
final int totalRecords = publish1kRecords(topic, false);

// Get the current end offsets for all partitions
Map<String, Long> endOffsets = kafkaServer.getPartitionOffsets(topic);
Assertions.assertEquals(2, endOffsets.size(), "Should have 2 partitions");

// Create bounded config with start offset 0 and current end offsets
Map<String, Long> startOffsets = new HashMap<>();
startOffsets.put("0", 0L);
startOffsets.put("1", 0L);

BoundedStreamConfig boundedConfig = new BoundedStreamConfig(startOffsets, endOffsets);

// Create bounded supervisor
final KafkaSupervisorSpec supervisor = createBoundedKafkaSupervisor(
kafkaServer,
topic,
boundedConfig
);

cluster.callApi().postSupervisor(supervisor);

// Wait for records to be ingested
waitUntilPublishedRecordsAreIngested(totalRecords);

// Wait for supervisor to transition to COMPLETED state
waitForSupervisorToComplete(supervisor.getId());

// Verify row count
verifyRowCount(totalRecords);

// Verify supervisor is in COMPLETED state
final SupervisorStatus status = cluster.callApi().getSupervisorStatus(supervisor.getId());
Assertions.assertEquals("COMPLETED", status.getState());
Assertions.assertTrue(status.isHealthy());
}

@Test
public void test_boundedSupervisor_withEmptyRange_completesImmediately()
{
final String topic = IdUtils.getRandomId();
kafkaServer.createTopicWithPartitions(topic, 1);

// Publish some records
publish1kRecords(topic, false);

// Get current offset
Map<String, Long> currentOffsets = kafkaServer.getPartitionOffsets(topic);
Long currentOffset = currentOffsets.get("0");

// Create bounded config with start == end (empty range)
Map<String, Long> offsets = new HashMap<>();
offsets.put("0", currentOffset);

BoundedStreamConfig boundedConfig = new BoundedStreamConfig(offsets, offsets);

// Create bounded supervisor
final KafkaSupervisorSpec supervisor = createBoundedKafkaSupervisor(
kafkaServer,
topic,
boundedConfig
);

cluster.callApi().postSupervisor(supervisor);

// Wait for supervisor to transition to COMPLETED state
waitForSupervisorToComplete(supervisor.getId());

// Verify supervisor is in COMPLETED state
final SupervisorStatus status = cluster.callApi().getSupervisorStatus(supervisor.getId());
Assertions.assertEquals("COMPLETED", status.getState());
}

@Test
public void test_boundedSupervisor_withReversedRange_isUnhealthy()
{
final String topic = IdUtils.getRandomId();
kafkaServer.createTopicWithPartitions(topic, 1);

// start > end — invalid range, KafkaIndexTaskIOConfig rejects it when a task is created.
BoundedStreamConfig boundedConfig = new BoundedStreamConfig(Map.of("0", 500L), Map.of("0", 100L));
final KafkaSupervisorSpec supervisor = createBoundedKafkaSupervisor(kafkaServer, topic, boundedConfig);

cluster.callApi().postSupervisor(supervisor);
waitForSupervisorToBeUnhealthy(supervisor.getId());

final SupervisorStatus status = cluster.callApi().getSupervisorStatus(supervisor.getId());
Assertions.assertFalse(status.isHealthy());
Assertions.assertEquals("UNHEALTHY_SUPERVISOR", status.getState());
}

private KafkaSupervisorSpec createBoundedKafkaSupervisor(
KafkaResource kafkaServer,
String topic,
BoundedStreamConfig boundedConfig
)
{
return createKafkaSupervisor(kafkaServer)
.withIoConfig(io -> io
.withKafkaInputFormat(new JsonInputFormat(null, null, null, null, null))
.withBoundedStreamConfig(boundedConfig)
)
.build(dataSource, topic);
}

@Test
public void test_boundedSupervisor_withMismatchedMetadata_isUnhealthy()
{
final String topic = IdUtils.getRandomId();
kafkaServer.createTopicWithPartitions(topic, 2);
publish1kRecords(topic, false);

// Get the current end offsets for all partitions
Map<String, Long> currentOffsets = kafkaServer.getPartitionOffsets(topic);
Assertions.assertEquals(2, currentOffsets.size(), "Should have 2 partitions");

// Create first bounded config - ingest only the first 100 records from each partition
Map<String, Long> startOffsets1 = new HashMap<>();
startOffsets1.put("0", 0L);
startOffsets1.put("1", 0L);

Map<String, Long> endOffsets1 = new HashMap<>();
endOffsets1.put("0", 100L);
endOffsets1.put("1", 100L);

BoundedStreamConfig boundedConfig1 = new BoundedStreamConfig(startOffsets1, endOffsets1);

// Create first bounded supervisor and run it to completion
final KafkaSupervisorSpec supervisor1 = createBoundedKafkaSupervisor(
kafkaServer,
topic,
boundedConfig1
);

cluster.callApi().postSupervisor(supervisor1);

// Wait for records to be ingested (exactly 200 records: the first 100 from each partition)
waitUntilPublishedRecordsAreIngested(200);

// Wait for supervisor to transition to COMPLETED state
waitForSupervisorToComplete(supervisor1.getId());

// Verify supervisor is in COMPLETED state
final SupervisorStatus status1 = cluster.callApi().getSupervisorStatus(supervisor1.getId());
Assertions.assertEquals("COMPLETED", status1.getState());

// Now try to create a second bounded supervisor with different bounded config on the same datasource
Map<String, Long> startOffsets2 = new HashMap<>();
startOffsets2.put("0", 50L); // Different start offset
startOffsets2.put("1", 50L);

Map<String, Long> endOffsets2 = new HashMap<>();
endOffsets2.put("0", 200L); // Different end offset
endOffsets2.put("1", 200L);

BoundedStreamConfig boundedConfig2 = new BoundedStreamConfig(startOffsets2, endOffsets2);

final KafkaSupervisorSpec supervisor2 = createBoundedKafkaSupervisor(
kafkaServer,
topic,
boundedConfig2
);

// Post the second supervisor (it should use the same supervisor ID/datasource)
cluster.callApi().postSupervisor(supervisor2);

// Wait for the supervisor to process and detect the metadata mismatch
waitForSupervisorToBeUnhealthy(supervisor2.getId());

// Verify the supervisor is unhealthy
final SupervisorStatus status2 = cluster.callApi().getSupervisorStatus(supervisor2.getId());
Assertions.assertFalse(status2.isHealthy(), "Supervisor should be unhealthy after detecting metadata mismatch");
Assertions.assertEquals("UNHEALTHY_SUPERVISOR", status2.getState(), "Supervisor state should be UNHEALTHY_SUPERVISOR");
> **Reviewer (Contributor), on lines +235 to +240:** Perhaps also validate that the records ingested by this bounded supervisor are 0, and/or that the tasks spun up by this supervisor actually failed?
>
> **Author:** In the case of a metadata mismatch the tasks are never actually spun up. An exception is thrown before tasks are ever created.
}

/**
* A new bounded run whose endOffset is less than the offset committed by a prior
* run must not silently reach COMPLETED.
*/
@Test
public void test_boundedSupervisor_doesNotSilentlyCompleteWhenStaleOffsetExceedsNewEnd()
{
final String topic = IdUtils.getRandomId();
kafkaServer.createTopicWithPartitions(topic, 2);
publish1kRecords(topic, false);

// Run 1: ingest up to offset 100 on partition 0 and offset 150 on partition 1, then complete.
Map<String, Long> startOffsets1 = new HashMap<>();
startOffsets1.put("0", 0L);
startOffsets1.put("1", 0L);

Map<String, Long> endOffsets1 = new HashMap<>();
endOffsets1.put("0", 100L);
endOffsets1.put("1", 150L);

BoundedStreamConfig boundedConfig1 = new BoundedStreamConfig(startOffsets1, endOffsets1);
final KafkaSupervisorSpec supervisor1 = createBoundedKafkaSupervisor(kafkaServer, topic, boundedConfig1);

cluster.callApi().postSupervisor(supervisor1);
waitUntilPublishedRecordsAreIngested(250);
waitForSupervisorToComplete(supervisor1.getId());

final SupervisorStatus status1 = cluster.callApi().getSupervisorStatus(supervisor1.getId());
Assertions.assertEquals("COMPLETED", status1.getState());

// Run 2: same datasource, endOffset (50) < stale committed offset (100).
// Without the fix the supervisor reaches COMPLETED immediately without running tasks.
// With the fix it detects the config mismatch and becomes UNHEALTHY_SUPERVISOR.
Map<String, Long> startOffsets2 = new HashMap<>();
startOffsets2.put("0", 0L);
startOffsets2.put("1", 0L);

Map<String, Long> endOffsets2 = new HashMap<>();
endOffsets2.put("0", 50L);
endOffsets2.put("1", 50L);

BoundedStreamConfig boundedConfig2 = new BoundedStreamConfig(startOffsets2, endOffsets2);
final KafkaSupervisorSpec supervisor2 = createBoundedKafkaSupervisor(kafkaServer, topic, boundedConfig2);

cluster.callApi().postSupervisor(supervisor2);
waitForSupervisorToBeUnhealthy(supervisor2.getId());

final SupervisorStatus status2 = cluster.callApi().getSupervisorStatus(supervisor2.getId());
Assertions.assertFalse(status2.isHealthy(), "Supervisor should be unhealthy after detecting metadata mismatch");
> **Reviewer (@abhishekrb19, May 12, 2026), on lines +287 to +291:** Do you think it would be worth also validating that a third supervisor with a different ID but the same bounds succeeds? (either in this test or the following one)
Assertions.assertEquals("UNHEALTHY_SUPERVISOR", status2.getState(), "Supervisor state should be UNHEALTHY_SUPERVISOR");
}

private void waitForSupervisorToComplete(String supervisorId)
{
overlord.latchableEmitter().waitForEvent(
event -> event.hasMetricName("supervisor/count")
.hasDimension(DruidMetrics.SUPERVISOR_ID, supervisorId)
.hasDimension("state", "COMPLETED")
);
}

private void waitForSupervisorToBeUnhealthy(String supervisorId)
{
overlord.latchableEmitter().waitForEvent(
event -> event.hasMetricName("supervisor/count")
.hasDimension(DruidMetrics.SUPERVISOR_ID, supervisorId)
.hasDimension("state", "UNHEALTHY_SUPERVISOR")
);
}
}
@@ -120,6 +120,7 @@ protected KinesisSupervisorSpec createKinesisSupervisor(KinesisResource kinesis,
Period.seconds(60),
null, null, null, null, null, null, null, null,
false,
null,
null
),
Map.of(),
@@ -87,6 +87,7 @@ private KinesisSupervisorSpec createKinesisSupervisorSpec(String dataSource, Str
Period.seconds(5),
null, null, null, null, null, null, null, null,
false,
null,
null
),
Map.of(),