Introduce new behavior for run completion proposals#4737
Conversation
5de8f93 to
131a240
Compare
|
Tested against this restatedev/sdk-typescript#719 and this restatedev/sdk-shared-core#79 |
131a240 to
ab28369
Compare
…e now send back an ad-hoc message indicating an ack for the completion proposal, instead than sending back the full completion.
…2e repo detects them
ab28369 to
832f2ca
Compare
tillrohrmann
left a comment
There was a problem hiding this comment.
Thanks for creating this PR @slinkydeveloper. The changes look good to me. I had a few questions about some assumptions you are making (e.g. that the only notification an SDK proposes are run completion notifications). It would be great to codify these assumption in a way that things break if they are changed. It would also be great if you could expand a little bit more on why we are doing this change and why it is ok to do so (what is the SDK doing).
| NotificationId::CompletionId(c) | ||
| if run_completion_proposals_to_ack.remove(c) => | ||
| { | ||
| Notification::ProposeRunCompletionAck(*c) |
There was a problem hiding this comment.
What happens if the SDK needed to drop the run completion value in the mean time (e.g. it was evicted from a size bound cache)? Would it fail with a transient error so that a replay fixes the problem?
There was a problem hiding this comment.
What happens if the SDK needed to drop the run completion value in the mean time (e.g. it was evicted from a size bound cache)? Would it fail with a transient error so that a replay fixes the problem?
The SDK never drops it. I guess worst case the sdk OOMs and a retry happens
There was a problem hiding this comment.
Should we allow in the protocol for this to happen? Otherwise, I could see an endpoint quite easily ooming compared to before when we have a lot of concurrent ctx.run steps.
There was a problem hiding this comment.
Should we allow in the protocol for this to happen?
The SDK can decide to drop only when proposing the completion obviously, not later (otherwise needs another message, more sync runtime <-> sdk and co).
I guess this for you boils down to having a field requesting the ack or the whole completion in the proposal message?
There was a problem hiding this comment.
I think it could be as easy as adding this option to the example SDK implementation or in some form of description.
With the behavior you've added right now, we can fill up the cache and stop caching entries once the cache is full. Instead of evicting entries we don't store new ones. Entries are only evicted once we see it's ack message. I think this can have better runtime properties than evicting the oldest entries and having to fail an invocation whose completion was dropped.
|
Why is https://github.com/restatedev/restate/actions/runs/26038383767/job/76551467738?pr=4737#step:17:358 failing? Probably not related but something to follow up on. |
I'm aware of this we're fixing it, and yes it's unrelated |
|
I think several comments of this PR are around the fact that completion proposal and completion proposal ack could be used "generically". This is not the case, this system is something "ad hoc" for ctx.run and nothing else, and i think it's made clear enough in the service protocol contract. Is there any other safeguard you think we should put in place @tillrohrmann ? |
|
One idea could be to assert the notification type. |
tillrohrmann
left a comment
There was a problem hiding this comment.
Thanks for adding the completion type assertion and expanding the explanation of the new messages @slinkydeveloper. LGTM. +1 for merging :-)
| NotificationId::CompletionId(c) | ||
| if run_completion_proposals_to_ack.remove(c) => | ||
| { | ||
| Notification::ProposeRunCompletionAck(*c) |
There was a problem hiding this comment.
Should we allow in the protocol for this to happen? Otherwise, I could see an endpoint quite easily ooming compared to before when we have a lot of concurrent ctx.run steps.
There was a problem hiding this comment.
I like the idea of making the protocol flexible in the sense that the endpoint can control the behavior. I am just not sure whether we aren't shoe horning the allows_ack flag for that as a requires_ack == false means something specific for the ProposeRunCompletionMessage. We should probably document this special behavior.
tillrohrmann
left a comment
There was a problem hiding this comment.
Thanks for bearing with me @slinkydeveloper. The changes look really nice. +1 for merging :-)
We now do everything inside the invoker.
…ng. Update documentation on the field.
8da2912 to
4ca572b
Compare
in protocol v7 we now send back an ad-hoc message indicating an ack for the completion proposal, instead than sending back the full completion.
Fix #4440