Introduce new behavior for run completion proposals#4737

Merged
slinkydeveloper merged 7 commits into
restatedev:mainfrom
slinkydeveloper:issues/ack-completion-proposal
May 19, 2026

Conversation

@slinkydeveloper
Contributor

In protocol v7 we now send back an ad-hoc message indicating an ack for the completion proposal, instead of sending back the full completion.

Fix #4440
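
A minimal sketch of the behavior described above, assuming hypothetical types (`Outgoing`, `ProposalTracker`) that are not the actual invoker code: the runtime remembers which completion ids the SDK proposed and, once the matching completion is available, replies with a small ack instead of echoing the payload back.

```rust
use std::collections::HashSet;

/// Hypothetical message the runtime sends back to the SDK
/// (illustrative only, not the real protocol types).
#[derive(Debug, PartialEq)]
enum Outgoing {
    /// Full completion, payload included.
    RunCompletion { completion_id: u32, value: Vec<u8> },
    /// Lightweight ack: the SDK already holds the value it proposed.
    ProposeRunCompletionAck { completion_id: u32 },
}

/// Tracks the completion ids proposed by the SDK during the current attempt.
#[derive(Default)]
struct ProposalTracker {
    to_ack: HashSet<u32>,
}

impl ProposalTracker {
    /// Record that the SDK proposed a run completion for this id.
    fn on_proposal(&mut self, completion_id: u32) {
        self.to_ack.insert(completion_id);
    }

    /// Decide what to send back once the completion is available.
    fn on_completion(&mut self, completion_id: u32, value: Vec<u8>) -> Outgoing {
        // `HashSet::remove` returns true only if the id was being tracked,
        // so each proposal is acked at most once.
        if self.to_ack.remove(&completion_id) {
            Outgoing::ProposeRunCompletionAck { completion_id }
        } else {
            Outgoing::RunCompletion { completion_id, value }
        }
    }
}
```

In this sketch a completion that was never proposed in the current attempt (for example one produced by an earlier attempt) still carries its full payload, because the SDK has no local copy to fall back on.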

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@slinkydeveloper slinkydeveloper force-pushed the issues/ack-completion-proposal branch 3 times, most recently from 5de8f93 to 131a240 Compare May 14, 2026 11:16
@slinkydeveloper
Contributor Author

Tested against restatedev/sdk-typescript#719 and restatedev/sdk-shared-core#79.

@slinkydeveloper slinkydeveloper force-pushed the issues/ack-completion-proposal branch from 131a240 to ab28369 Compare May 18, 2026 11:57
@slinkydeveloper slinkydeveloper force-pushed the issues/ack-completion-proposal branch from ab28369 to 832f2ca Compare May 18, 2026 14:01
Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for creating this PR @slinkydeveloper. The changes look good to me. I had a few questions about some assumptions you are making (e.g. that the only notifications an SDK proposes are run completion notifications). It would be great to codify these assumptions so that things break if they change. It would also be great if you could expand a bit on why we are making this change and why it is OK to do so (what is the SDK doing?).

Comment thread crates/invoker-impl/src/invocation_state_machine.rs Outdated
Comment thread crates/invoker-impl/src/invocation_state_machine.rs
Comment thread crates/invoker-impl/src/invocation_state_machine.rs Outdated
Comment thread service-protocol/dev/restate/service/protocol.proto
Comment thread crates/invoker-impl/src/invocation_state_machine.rs Outdated
NotificationId::CompletionId(c)
    if run_completion_proposals_to_ack.remove(c) =>
{
    Notification::ProposeRunCompletionAck(*c)
Contributor

What happens if the SDK needed to drop the run completion value in the meantime (e.g. it was evicted from a size-bounded cache)? Would it fail with a transient error so that a replay fixes the problem?

Contributor Author

> What happens if the SDK needed to drop the run completion value in the meantime (e.g. it was evicted from a size-bounded cache)? Would it fail with a transient error so that a replay fixes the problem?

The SDK never drops it. I guess in the worst case the SDK OOMs and a retry happens.

Contributor

Should we allow this to happen in the protocol? Otherwise, I could see an endpoint OOMing quite easily compared to before when we have a lot of concurrent ctx.run steps.

Contributor Author

> Should we allow in the protocol for this to happen?

The SDK can only decide to drop the value when proposing the completion, not later; dropping it later would need another message and more synchronization between runtime and SDK.

I guess for you this boils down to having a field in the proposal message that requests either the ack or the whole completion?

Contributor

I think it could be as easy as adding this option to the example SDK implementation or in some form of description.

With the behavior you've added right now, we can fill up the cache and stop caching entries once it is full. Instead of evicting entries, we don't store new ones. Entries are only evicted once we see their ack message. I think this can have better runtime properties than evicting the oldest entries and having to fail an invocation whose completion was dropped.
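
The caching strategy described in this comment could look roughly like the sketch below. Everything here is an assumption for illustration (`ProposalCache`, `Reply`, and the byte-based bound are hypothetical names, not SDK code): the cache never evicts on insert, refuses new entries once full, and frees space only when an ack arrives. When an entry is not stored, the sketch assumes the SDK asks for the whole completion instead of an ack, as discussed earlier in this thread.

```rust
use std::collections::HashMap;

/// What the SDK asks the runtime to send back for a proposal (hypothetical).
#[derive(Debug, PartialEq)]
enum Reply {
    AckOnly,
    FullCompletion,
}

/// Hypothetical SDK-side cache of proposed run results, bounded by total bytes.
struct ProposalCache {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<u32, Vec<u8>>,
}

impl ProposalCache {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, entries: HashMap::new() }
    }

    /// Called when proposing a run completion. Never evicts existing entries.
    fn propose(&mut self, completion_id: u32, value: Vec<u8>) -> Reply {
        if self.used_bytes + value.len() > self.max_bytes {
            // Cache is full: do not store the value, request the whole completion.
            return Reply::FullCompletion;
        }
        self.used_bytes += value.len();
        // If the id was already present, release the bytes of the replaced value.
        if let Some(old) = self.entries.insert(completion_id, value) {
            self.used_bytes -= old.len();
        }
        Reply::AckOnly
    }

    /// Called when the ack arrives: the entry leaves the cache and is returned.
    fn on_ack(&mut self, completion_id: u32) -> Option<Vec<u8>> {
        let value = self.entries.remove(&completion_id)?;
        self.used_bytes -= value.len();
        Some(value)
    }
}
```

Because entries leave the cache only on ack, a value that was stored is always still there when its ack arrives, so no invocation has to fail over a dropped completion.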

@tillrohrmann
Contributor

Why is https://github.com/restatedev/restate/actions/runs/26038383767/job/76551467738?pr=4737#step:17:358 failing? Probably not related but something to follow up on.

@slinkydeveloper
Contributor Author

slinkydeveloper commented May 19, 2026

> Why is https://github.com/restatedev/restate/actions/runs/26038383767/job/76551467738?pr=4737#step:17:358 failing? Probably not related but something to follow up on.

I'm aware of this, we're fixing it, and yes, it's unrelated.

@slinkydeveloper slinkydeveloper added the release-blocker Blocker for the next release label May 19, 2026
@slinkydeveloper
Contributor Author

I think several comments on this PR revolve around the idea that the completion proposal and the completion proposal ack could be used "generically".

This is not the case: this mechanism is ad hoc for ctx.run and nothing else, and I think the service protocol contract makes that clear enough. Is there any other safeguard you think we should put in place, @tillrohrmann?

@tillrohrmann
Contributor

One idea could be to assert the notification type.
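
Such an assertion could be sketched as follows. This is only an illustration under stated assumptions: `NotificationKind` and `check_proposal` are hypothetical names, and the real protocol defines its own set of notification types.

```rust
/// Hypothetical notification kinds; the real protocol has its own set.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NotificationKind {
    Run,
    Sleep,
    Call,
}

/// Accepts a proposal only if it completes a run.
/// Any other kind is reported as a protocol violation.
fn check_proposal(kind: NotificationKind, completion_id: u32) -> Result<u32, String> {
    if kind != NotificationKind::Run {
        return Err(format!(
            "unexpected proposal of type {kind:?} for completion id {completion_id}"
        ));
    }
    Ok(completion_id)
}
```

Returning an error instead of panicking keeps the failure scoped to the offending invocation in this sketch; whether the actual change uses an assertion or an error path is not stated in this thread.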

Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for adding the completion type assertion and expanding the explanation of the new messages @slinkydeveloper. LGTM. +1 for merging :-)

Comment thread service-protocol/dev/restate/service/protocol.proto
Contributor

@tillrohrmann tillrohrmann left a comment

I like the idea of making the protocol flexible in the sense that the endpoint can control the behavior. I am just not sure whether we aren't shoehorning the allows_ack flag for that, as requires_ack == false means something specific for the ProposeRunCompletionMessage. We should probably document this special behavior.

Comment thread crates/invoker-impl/src/invocation_task/mod.rs Outdated
Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for bearing with me @slinkydeveloper. The changes look really nice. +1 for merging :-)

@slinkydeveloper slinkydeveloper force-pushed the issues/ack-completion-proposal branch from 8da2912 to 4ca572b Compare May 19, 2026 15:22
@slinkydeveloper slinkydeveloper merged commit c76a018 into restatedev:main May 19, 2026
6 checks passed
@slinkydeveloper slinkydeveloper deleted the issues/ack-completion-proposal branch May 19, 2026 15:23
@github-actions github-actions Bot locked and limited conversation to collaborators May 19, 2026

Labels

release-blocker Blocker for the next release

Development

Successfully merging this pull request may close these issues.

Optimize RunCompletionNotification: avoid echoing full payload back to SDK

2 participants