Conversation

@lidezhu (Collaborator) commented Dec 27, 2025

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

ti-chi-bot bot added the do-not-merge/needs-linked-issue, do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), and release-note (denotes a PR that will be considered when it comes time to generate release notes) labels Dec 27, 2025

ti-chi-bot bot commented Dec 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign charlescheung96 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) Dec 27, 2025
@gemini-code-assist

Summary of Changes

Hello @lidezhu, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural change to the log puller's event delivery mechanism by replacing the existing dynstream system with a new, custom-built Span Pipeline. The primary goal is to enhance performance and provide more precise control over data flow and resolved-timestamp advancement. The new design decouples region event processing into a sharded regionEventProcessor and then funnels these events into the Span Pipeline, which manages per-subscription state, ensures strict ordering of resolved-ts signals relative to data persistence, and implements a robust backpressure mechanism. This refactoring aims to improve the efficiency and reliability of change data capture.
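
As a rough illustration of the backpressure idea described above, here is a minimal byte-quota sketch. It is not the PR's code: the `byteQuota` type and its methods are hypothetical names, and it uses a mutex plus condition variable as a simple weighted semaphore, on the assumption that producers should block once in-flight bytes reach a fixed limit.

```go
package main

import "sync"

// byteQuota is a hypothetical stand-in for the pipeline's in-flight quota.
// It behaves like a weighted semaphore measured in bytes.
type byteQuota struct {
	mu    sync.Mutex
	cond  *sync.Cond
	limit int64 // maximum bytes allowed in flight
	used  int64 // bytes currently in flight
}

func newByteQuota(limit int64) *byteQuota {
	q := &byteQuota{limit: limit}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// acquire blocks until n bytes fit under the limit and returns the amount
// actually reserved. A request larger than the whole limit is clamped to the
// limit so that a single oversized batch cannot block forever.
func (q *byteQuota) acquire(n int64) int64 {
	if n > q.limit {
		n = q.limit
	}
	q.mu.Lock()
	defer q.mu.Unlock()
	for q.used+n > q.limit {
		q.cond.Wait()
	}
	q.used += n
	return n
}

// release returns n bytes to the quota and wakes any blocked producers.
func (q *byteQuota) release(n int64) {
	q.mu.Lock()
	q.used -= n
	q.mu.Unlock()
	q.cond.Broadcast()
}

// inUse reports the bytes currently reserved.
func (q *byteQuota) inUse() int64 {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.used
}
```

In this sketch a producer would call `acquire` before enqueueing a batch and pass the returned amount to `release` from the persist callback. Cancellation through a context is left out to keep the example short.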

Highlights

  • New Span Pipeline Architecture: Replaced the dynstream mechanism with a custom Span Pipeline for improved event delivery and resolved-ts handling in the log puller.
  • Enhanced Concurrency and Pipelining: The new design enables true pipelining of data (receive -> compress/group -> persist) within a subscription span, overcoming previous serialization bottlenecks.
  • Strict Resolved-TS Barrier Semantics: Implemented a robust mechanism to ensure that resolved-ts signals are advanced only after all preceding data batches for a given span have been persisted, even with out-of-order persist callbacks.
  • Decoupled Event Processing: Introduced a regionEventProcessor to concurrently handle region-level events, shard them by regionID, and then feed them into the Span Pipeline.
  • Improved Backpressure and Memory Control: Replaced dynstream's pause/resume with a semaphore-based quota system in the Span Pipeline to manage in-flight data and prevent memory blow-ups.
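
The resolved-ts barrier described in the highlights can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `spanState`, `barrier`, and the method names are hypothetical, each data batch is assumed to get a per-span sequence number, and a single goroutine is assumed to own the state (so there is no locking).

```go
package main

// barrier is a pending resolved-ts that may only take effect once every data
// batch with sequence number <= waitSeq has been persisted.
type barrier struct {
	ts      uint64
	waitSeq uint64
}

// spanState is a hypothetical per-subscription state. It is not safe for
// concurrent use; one worker goroutine is assumed to own it.
type spanState struct {
	nextSeq    uint64              // sequence number for the next data batch
	ackedSeq   uint64              // all batches with seq <= ackedSeq are persisted
	done       map[uint64]struct{} // persisted batches above ackedSeq (out of order)
	pending    []barrier
	resolvedTs uint64
}

func newSpanState() *spanState {
	return &spanState{nextSeq: 1, done: make(map[uint64]struct{})}
}

// onData assigns a sequence number to a data batch that is about to be persisted.
func (s *spanState) onData() uint64 {
	seq := s.nextSeq
	s.nextSeq++
	return seq
}

// onResolved records a barrier that waits for all batches enqueued so far.
func (s *spanState) onResolved(ts uint64) {
	if ts <= s.resolvedTs {
		return
	}
	s.pending = append(s.pending, barrier{ts: ts, waitSeq: s.nextSeq - 1})
	s.flush()
}

// onPersisted handles a persist callback, which may arrive out of order.
func (s *spanState) onPersisted(seq uint64) {
	s.done[seq] = struct{}{}
	// Extend the contiguous persisted prefix as far as possible.
	for {
		if _, ok := s.done[s.ackedSeq+1]; !ok {
			break
		}
		delete(s.done, s.ackedSeq+1)
		s.ackedSeq++
	}
	s.flush()
}

// flush advances resolvedTs past every barrier whose preceding data is persisted.
func (s *spanState) flush() {
	i := 0
	for i < len(s.pending) && s.pending[i].waitSeq <= s.ackedSeq {
		if s.pending[i].ts > s.resolvedTs {
			s.resolvedTs = s.pending[i].ts
		}
		i++
	}
	s.pending = s.pending[i:]
}
```

The key design choice in this sketch is tracking a contiguous persisted prefix rather than a simple counter, so a callback for a later batch cannot release a barrier while an earlier batch is still in flight.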

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-executed refactoring by replacing the dynstream-based event delivery in the log puller with a new, custom Span Pipeline. The new implementation, detailed in the excellent design document, aims to improve performance and throughput by removing a serialization bottleneck. The code is clean, well-structured, and closely follows the design. I've identified one potential correctness issue that could lead to non-monotonic resolved timestamps and a suggestion to improve configurability. Overall, this is a high-quality contribution.

Comment on lines 390 to 394
// If the previous barrier has the same ts, the new one is redundant because it
// would advance to the same resolved-ts but requires waiting for more data.
if len(s.pendingResolved) != 0 && s.pendingResolved[len(s.pendingResolved)-1].ts == ts {
	return
}

high

The current check ts == s.pendingResolved[len(s.pendingResolved)-1].ts is insufficient to prevent non-monotonic resolved timestamps. If a new resolved timestamp arrives that is smaller than the last pending one (but larger than the last flushed one), it will be added to the queue. This can lead to advanceResolvedTs being called with a decreasing timestamp, which might violate the assumptions of downstream components.

To ensure strict monotonicity of pending resolved timestamps, the check should be changed to ts <= .... This will correctly discard any new resolved timestamp that is not a strict improvement over the last pending one.

Suggested change
-// If the previous barrier has the same ts, the new one is redundant because it
-// would advance to the same resolved-ts but requires waiting for more data.
-if len(s.pendingResolved) != 0 && s.pendingResolved[len(s.pendingResolved)-1].ts == ts {
-	return
-}
+// If the new resolved ts is not greater than the last pending one,
+// it's redundant. Advancing to a smaller or equal TS is not useful.
+if len(s.pendingResolved) != 0 && ts <= s.pendingResolved[len(s.pendingResolved)-1].ts {
+	return
+}
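
To make the effect of the `<=` check concrete, here is a small self-contained sketch. Only `pendingResolved` and `ts` come from the snippet under review; the `spanBarriers` type, the `lastFlushed` field, and `enqueueResolved` are hypothetical names introduced for illustration, and the real method in the PR may do more work.

```go
package main

// pendingBarrier mirrors the element type implied by s.pendingResolved[i].ts.
type pendingBarrier struct {
	ts uint64
}

// spanBarriers is a hypothetical container for the pending barrier queue.
type spanBarriers struct {
	lastFlushed     uint64 // assumed: the last resolved-ts already advanced
	pendingResolved []pendingBarrier
}

// enqueueResolved appends ts only when it is strictly greater than both the
// last flushed value and the last pending barrier, which keeps the queue
// strictly increasing. It reports whether the barrier was enqueued.
func (s *spanBarriers) enqueueResolved(ts uint64) bool {
	if ts <= s.lastFlushed {
		return false
	}
	if n := len(s.pendingResolved); n != 0 && ts <= s.pendingResolved[n-1].ts {
		return false
	}
	s.pendingResolved = append(s.pendingResolved, pendingBarrier{ts: ts})
	return true
}
```

With the original `==` check, the call with `ts = 8` after a pending barrier of 10 would have been enqueued, which is the non-monotonic case the review comment describes.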

Comment on lines +247 to +252
subClient.pipeline = newSpanPipelineManager(
	subClient.ctx,
	runtime.GOMAXPROCS(0),
	4096,
	1<<30, // 1GiB, same order of magnitude as previous dynstream pending size.
)

medium

The parameters for newSpanPipelineManager are hardcoded. For better maintainability and to allow for performance tuning, these values should be configurable. I recommend adding corresponding fields to SubscriptionClientConfig and using them here, similar to how regionEventProcessor is configured.

For example, you could add these fields to SubscriptionClientConfig:

// PipelineWorkerCount is the number of workers for the span pipeline.
// If 0, defaults to runtime.GOMAXPROCS(0).
PipelineWorkerCount int

// PipelineQueueSize is the per-worker channel size for the span pipeline.
// If 0, defaults to 4096.
PipelineQueueSize int

// PipelineQuotaBytes is the total memory quota for in-flight data in the pipeline.
// If 0, defaults to 1GiB.
PipelineQuotaBytes int64

And then initialize the manager like this:

	pipelineWorkerCount := config.PipelineWorkerCount
	if pipelineWorkerCount == 0 {
		pipelineWorkerCount = runtime.GOMAXPROCS(0)
	}
	pipelineQueueSize := config.PipelineQueueSize
	if pipelineQueueSize == 0 {
		pipelineQueueSize = 4096
	}
	pipelineQuotaBytes := config.PipelineQuotaBytes
	if pipelineQuotaBytes == 0 {
		pipelineQuotaBytes = 1 << 30 // 1GiB
	}
	subClient.pipeline = newSpanPipelineManager(
		subClient.ctx,
		pipelineWorkerCount,
		pipelineQueueSize,
		pipelineQuotaBytes,
	)
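
The three zero-value checks above could also be folded into a small helper. The following is only a sketch: `orDefault`, `pipelineOptions`, and `resolvePipelineOptions` are hypothetical names that do not appear in the PR, and the code assumes Go 1.18 or later for generics.

```go
package main

import "runtime"

// orDefault returns v unless it is the zero value of T, in which case it
// returns def.
func orDefault[T comparable](v, def T) T {
	var zero T
	if v == zero {
		return def
	}
	return v
}

// pipelineOptions is a hypothetical subset of SubscriptionClientConfig.
type pipelineOptions struct {
	WorkerCount int
	QueueSize   int
	QuotaBytes  int64
}

// resolvePipelineOptions fills unset (zero) fields with the defaults
// suggested in the review comment.
func resolvePipelineOptions(o pipelineOptions) pipelineOptions {
	return pipelineOptions{
		WorkerCount: orDefault(o.WorkerCount, runtime.GOMAXPROCS(0)),
		QueueSize:   orDefault(o.QueueSize, 4096),
		QuotaBytes:  orDefault(o.QuotaBytes, int64(1<<30)), // 1GiB
	}
}
```

Note that this sketch, like the snippet above it, treats only zero as unset. Negative values would pass through unchanged and would need separate validation.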

ti-chi-bot bot commented Dec 28, 2025

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

ti-chi-bot bot commented Dec 28, 2025

@lidezhu: The following tests failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-error-log-review | 2a94fc3 | link | true | /test pull-error-log-review |
| pull-check | 2a94fc3 | link | true | /test pull-check |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
