[WIP] #3823

lidezhu · 2025-12-27T15:05:31Z

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

ti-chi-bot · 2025-12-27T15:05:35Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign charlescheung96 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist · 2025-12-27T15:06:00Z

Summary of Changes

Hello @lidezhu, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural change to the log puller's event delivery mechanism by replacing the existing dynstream system with a new, custom-built Span Pipeline. The primary goal is to enhance performance and provide more precise control over data flow and resolved-timestamp advancement. The new design decouples region event processing into a sharded regionEventProcessor and then funnels these events into the Span Pipeline, which manages per-subscription state, ensures strict ordering of resolved-ts signals relative to data persistence, and implements a robust backpressure mechanism. This refactoring aims to improve the efficiency and reliability of change data capture.
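
The backpressure described in the summary can be pictured as a byte quota that a producer acquires before handing data to the pipeline and releases once the data has been persisted. The following is a minimal sketch under that assumption only: `byteQuota` and its methods are hypothetical names for illustration and are not taken from the PR's code.

```go
package main

import "sync"

// byteQuota is a hypothetical counting semaphore measured in bytes.
// It is an illustration only, not the type used in the PR.
type byteQuota struct {
	mu    sync.Mutex
	cond  *sync.Cond
	limit int64 // maximum bytes allowed in flight
	used  int64 // bytes currently in flight
}

func newByteQuota(limit int64) *byteQuota {
	q := &byteQuota{limit: limit}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// tryAcquire reserves n bytes without blocking and reports whether it succeeded.
func (q *byteQuota) tryAcquire(n int64) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.used+n > q.limit {
		return false
	}
	q.used += n
	return true
}

// acquire blocks until n bytes fit under the limit. A request larger than
// the limit is admitted once nothing else is in flight, so it cannot deadlock.
func (q *byteQuota) acquire(n int64) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for q.used+n > q.limit && q.used > 0 {
		q.cond.Wait()
	}
	q.used += n
}

// release returns n bytes to the quota and wakes any blocked producers.
func (q *byteQuota) release(n int64) {
	q.mu.Lock()
	q.used -= n
	q.mu.Unlock()
	q.cond.Broadcast()
}

// inFlight reports the bytes currently reserved.
func (q *byteQuota) inFlight() int64 {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.used
}
```

In this sketch a receiver would call `acquire(len(batch))` before enqueueing a batch and the persist callback would call `release` with the same size, so a slow sink eventually blocks the receiver instead of growing memory.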

Highlights

New Span Pipeline Architecture: Replaced the dynstream mechanism with a custom Span Pipeline for improved event delivery and resolved-ts handling in the log puller.
Enhanced Concurrency and Pipelining: The new design enables true pipelining of data (receive -> compress/group -> persist) within a subscription span, overcoming previous serialization bottlenecks.
Strict Resolved-TS Barrier Semantics: Implemented a robust mechanism to ensure that resolved-ts signals are advanced only after all preceding data batches for a given span have been persisted, even with out-of-order persist callbacks.
Decoupled Event Processing: Introduced a regionEventProcessor to concurrently handle region-level events, shard them by regionID, and then feed them into the Span Pipeline.
Improved Backpressure and Memory Control: Replaced dynstream's pause/resume with a semaphore-based quota system in the Span Pipeline to manage in-flight data and prevent memory blow-ups.
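
To make the resolved-ts barrier semantics from the highlights concrete, here is a small sketch of one way such a barrier could work when persist callbacks arrive out of order. It assumes each data batch gets a per-span sequence number and each resolved-ts waits on the last sequence issued before it. All names (`spanState`, `barrier`, `onData`, `onResolved`, `onPersisted`) are hypothetical and not taken from the PR.

```go
package main

// barrier is a pending resolved-ts that may only be released after every
// data batch issued before it has been persisted.
type barrier struct {
	ts      uint64
	waitSeq uint64 // last batch sequence issued before this resolved-ts arrived
}

// spanState is a hypothetical per-span tracker, for illustration only.
type spanState struct {
	nextSeq    uint64          // sequence number for the next data batch, starting at 1
	doneUpTo   uint64          // every batch with seq <= doneUpTo has been persisted
	done       map[uint64]bool // batches persisted out of order, above doneUpTo
	pending    []barrier       // resolved-ts barriers waiting on earlier batches
	resolvedTs uint64          // last resolved-ts released downstream
}

func newSpanState() *spanState {
	return &spanState{nextSeq: 1, done: make(map[uint64]bool)}
}

// onData registers a new data batch and returns its sequence number.
func (s *spanState) onData() uint64 {
	seq := s.nextSeq
	s.nextSeq++
	return seq
}

// onResolved queues a resolved-ts behind all batches issued so far.
// Timestamps that do not move forward are dropped.
func (s *spanState) onResolved(ts uint64) {
	if ts <= s.resolvedTs {
		return
	}
	if n := len(s.pending); n > 0 && ts <= s.pending[n-1].ts {
		return
	}
	s.pending = append(s.pending, barrier{ts: ts, waitSeq: s.nextSeq - 1})
	s.flush()
}

// onPersisted is the persist callback; it may be called in any order.
func (s *spanState) onPersisted(seq uint64) {
	s.done[seq] = true
	// Advance the contiguous watermark over any batches already persisted.
	for s.done[s.doneUpTo+1] {
		s.doneUpTo++
		delete(s.done, s.doneUpTo)
	}
	s.flush()
}

// flush releases every barrier whose preceding batches are all persisted.
func (s *spanState) flush() {
	i := 0
	for i < len(s.pending) && s.pending[i].waitSeq <= s.doneUpTo {
		s.resolvedTs = s.pending[i].ts
		i++
	}
	s.pending = s.pending[i:]
}
```

With three batches and resolved-ts values queued after the second and third, persisting the batches in reverse order leaves `resolvedTs` unchanged until the first batch completes, at which point both barriers are released in order.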

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant and well-executed refactoring by replacing the dynstream-based event delivery in the log puller with a new, custom Span Pipeline. The new implementation, detailed in the excellent design document, aims to improve performance and throughput by removing a serialization bottleneck. The code is clean, well-structured, and closely follows the design. I've identified one potential correctness issue that could lead to non-monotonic resolved timestamps and a suggestion to improve configurability. Overall, this is a high-quality contribution.

gemini-code-assist · 2025-12-27T15:08:45Z

logservice/logpuller/span_pipeline.go

+		// If the previous barrier has the same ts, the new one is redundant because it
+		// would advance to the same resolved-ts but requires waiting for more data.
+		if len(s.pendingResolved) != 0 && s.pendingResolved[len(s.pendingResolved)-1].ts == ts {
+			return
+		}


The current check `ts == s.pendingResolved[len(s.pendingResolved)-1].ts` is insufficient to prevent non-monotonic resolved timestamps. If a new resolved timestamp arrives that is smaller than the last pending one (but larger than the last flushed one), it will be added to the queue. This can lead to `advanceResolvedTs` being called with a decreasing timestamp, which might violate the assumptions of downstream components.

To ensure strict monotonicity of pending resolved timestamps, the check should be changed to `ts <= ...`. This will correctly discard any new resolved timestamp that is not a strict improvement over the last pending one.

Suggested change

-		// If the previous barrier has the same ts, the new one is redundant because it
-		// would advance to the same resolved-ts but requires waiting for more data.
-		if len(s.pendingResolved) != 0 && s.pendingResolved[len(s.pendingResolved)-1].ts == ts {
-			return
-		}
+		// If the new resolved ts is not greater than the last pending one,
+		// it's redundant. Advancing to a smaller or equal TS is not useful.
+		if len(s.pendingResolved) != 0 && ts <= s.pendingResolved[len(s.pendingResolved)-1].ts {
+			return
+		}
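
The difference between the two checks can be seen in isolation with a small sketch. It assumes only that pending barriers are kept in a slice ordered by arrival; `pendingBarrier`, `enqueueEqualOnly`, `enqueueMonotonic`, and `isStrictlyIncreasing` are hypothetical helpers written for this illustration, not functions from the PR.

```go
package main

// pendingBarrier is a hypothetical stand-in for an entry in the pending queue.
type pendingBarrier struct {
	ts uint64
}

// enqueueEqualOnly mirrors the original check: it skips a new ts only when
// it equals the last pending ts.
func enqueueEqualOnly(pending []pendingBarrier, ts uint64) []pendingBarrier {
	if n := len(pending); n != 0 && pending[n-1].ts == ts {
		return pending
	}
	return append(pending, pendingBarrier{ts: ts})
}

// enqueueMonotonic mirrors the suggested check: it skips any ts that is not
// strictly greater than the last pending ts.
func enqueueMonotonic(pending []pendingBarrier, ts uint64) []pendingBarrier {
	if n := len(pending); n != 0 && ts <= pending[n-1].ts {
		return pending
	}
	return append(pending, pendingBarrier{ts: ts})
}

// isStrictlyIncreasing reports whether the queued timestamps only move forward.
func isStrictlyIncreasing(pending []pendingBarrier) bool {
	for i := 1; i < len(pending); i++ {
		if pending[i].ts <= pending[i-1].ts {
			return false
		}
	}
	return true
}
```

Feeding the sequence 10, 8, 12 through `enqueueEqualOnly` queues all three values, including the regression to 8, while `enqueueMonotonic` queues only 10 and 12.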

gemini-code-assist · 2025-12-27T15:08:45Z

logservice/logpuller/subscription_client.go

+	subClient.pipeline = newSpanPipelineManager(
+		subClient.ctx,
+		runtime.GOMAXPROCS(0),
+		4096,
+		1<<30, // 1GiB, same order of magnitude as previous dynstream pending size.
+	)


The parameters for newSpanPipelineManager are hardcoded. For better maintainability and to allow for performance tuning, these values should be configurable. I recommend adding corresponding fields to SubscriptionClientConfig and using them here, similar to how regionEventProcessor is configured.

For example, you could add these fields to SubscriptionClientConfig:

	// PipelineWorkerCount is the number of workers for the span pipeline.
	// If 0, defaults to runtime.GOMAXPROCS(0).
	PipelineWorkerCount int
	// PipelineQueueSize is the per-worker channel size for the span pipeline.
	// If 0, defaults to 4096.
	PipelineQueueSize int
	// PipelineQuotaBytes is the total memory quota for in-flight data in the pipeline.
	// If 0, defaults to 1GiB.
	PipelineQuotaBytes int64

And then initialize the manager like this:

	pipelineWorkerCount := config.PipelineWorkerCount
	if pipelineWorkerCount == 0 {
		pipelineWorkerCount = runtime.GOMAXPROCS(0)
	}
	pipelineQueueSize := config.PipelineQueueSize
	if pipelineQueueSize == 0 {
		pipelineQueueSize = 4096
	}
	pipelineQuotaBytes := config.PipelineQuotaBytes
	if pipelineQuotaBytes == 0 {
		pipelineQuotaBytes = 1 << 30 // 1GiB
	}
	subClient.pipeline = newSpanPipelineManager(
		subClient.ctx,
		pipelineWorkerCount,
		pipelineQueueSize,
		pipelineQuotaBytes,
	)

ti-chi-bot · 2025-12-28T15:12:15Z

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

ti-chi-bot · 2025-12-28T15:14:19Z

@lidezhu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-error-log-review	`2a94fc3`	link	true	`/test pull-error-log-review`
pull-check	`2a94fc3`	link	true	`/test pull-check`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

lidezhu added 6 commits December 26, 2025 14:54

try optimize puller

0191fe4

small fix

edc4c8d

f

e85dec2

f

b482a78

fix

8e4e4a1

introduce pipeline in log puller

05c31f7

ti-chi-bot bot added the do-not-merge/needs-linked-issue, do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), and release-note (denotes a PR that will be considered when it comes time to generate release notes) labels Dec 27, 2025

ti-chi-bot bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) Dec 27, 2025

gemini-code-assist bot reviewed Dec 27, 2025

lidezhu added 11 commits December 28, 2025 00:10

increase worker write num

e599545

hack

528dc9b

fix

89dd6e0

avoid write

715dc62

hack

f0c6ddf

check

c04cf4c

f

fe38abd

f

fb2e389

add some log

d671b7f

small fix

70a2043

check

2a94fc3
