optimistic sig share creation for managed keys#7831
Conversation
| return false | ||
| } | ||
|
|
||
| func (sr *subroundSignature) createSignaturesForManagedKeys(ctx context.Context) bool { |
There was a problem hiding this comment.
this pre-checks here are very similar with doSignatureJobForManagedKeys but i would keep it like this for simplicity
| sr.GetHeader().GetEpoch(), | ||
| pkBytes, | ||
| ) | ||
| signatureShare, err := sr.SigningHandler().SignatureShare(uint16(idx)) |
There was a problem hiding this comment.
in each start subround, there is a Reset for all consensus state, including SigningHandler, there should't be any index collision with the signatures from the previous round; the wait time is related to the proof arrival, not with the signatures
| } | ||
| } | ||
|
|
||
| // Wait once for the entire node if competing block detected |
There was a problem hiding this comment.
shouldn't L125-128 be inside the previous if? they basically do similar loops as waitForCompetingBlockEarlyChecks
There was a problem hiding this comment.
here there is also the check for single key, will check how to integrate them better
There was a problem hiding this comment.
refactored to include them in the same condition branch
| sr.signatureThrottler.StartProcessing() | ||
| wg.Add(1) | ||
|
|
||
| go func(ctx context.Context, idx int, pk string) { |
There was a problem hiding this comment.
this starts a goroutine, but does not handle the context, so goroutines may be hanging if the process stops
There was a problem hiding this comment.
added context check in goroutine
| if err != nil { | ||
| log.Debug("sendSignatureForManagedKey.CreateSignatureShareForPublicKey", "error", err.Error()) | ||
| return false | ||
| // signature share not found (optimistic signature share creation was not triggered) |
There was a problem hiding this comment.
what if failure reason was different?
There was a problem hiding this comment.
i think we should try to create the signature share (if not already created) whatever the reason; before we tried to create it directly each time, now there is additional check
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## feat/testnet-fixes #7831 +/- ##
====================================================
Coverage 77.55% 77.56%
====================================================
Files 884 885 +1
Lines 125242 125400 +158
====================================================
+ Hits 97132 97261 +129
- Misses 21655 21677 +22
- Partials 6455 6462 +7 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR introduces optimistic (early) creation of BLS signature shares for managed keys in SPoS v2, aiming to reduce latency when competing blocks force validators to delay sending signatures into the next round/subround. It also threads context.Context into signature-share creation to allow timeout/cancel behavior.
Changes:
- Add
context.ContexttoSigningHandler.CreateSignatureShareForPublicKeyand propagate it through production code, simulator code, and tests. - Add optimistic signature-share creation triggered during the Block subround (on header send/receive), plus synchronization/cancel hooks via
ConsensusState. - Update SPoS v2 Signature subround to wait for optimistic signature creation and reuse already-created shares when available.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| testscommon/consensus/signingHandlerStub.go | Update stub to accept context.Context in signature-share creation. |
| testscommon/consensus/consensusStateMock.go | Extend mock with optimistic-signature waitgroup/cancel hooks. |
| node/chainSimulator/process/processor.go | Adapt simulator to pass a context to signature-share creation. |
| node/chainSimulator/process/processor_test.go | Update simulator unit test stub to accept context. |
| factory/crypto/signingHandler.go | Add context-aware signature-share creation with timeout checks. |
| factory/crypto/signingHandler_test.go | Update signing-handler tests to pass context. |
| factory/crypto/errors.go | Add timeout error for signature handling. |
| consensus/spos/worker.go | Cancel optimistic signature context on Extend(). |
| consensus/spos/interface.go | Extend ConsensusStateHandler with waitgroup/cancel API for optimistic signatures. |
| consensus/spos/consensusState.go | Add optimistic-signature waitgroup and cancel-func storage. |
| consensus/spos/bls/v2/subroundSignatureCompetingBlock_test.go | Update stub signature-share creation callback signature. |
| consensus/spos/bls/v2/subroundSignature.go | Wait for optimistic signature creation; reuse pre-created shares; refactor throttler check. |
| consensus/spos/bls/v2/subroundSignature_test.go | Update tests and add coverage for “reuse existing sig share” behavior. |
| consensus/spos/bls/v2/subroundBlock.go | Trigger optimistic signature-share creation on header send/receive; require throttler in constructor. |
| consensus/spos/bls/v2/subroundBlock_test.go | Update constructor call sites and add tests for optimistic signature triggering. |
| consensus/spos/bls/v2/export_test.go | Update exported test helpers for new ctx-aware APIs. |
| consensus/spos/bls/v2/common.go | Introduce shared throttler-wait helper. |
| consensus/spos/bls/v2/blsSubroundsFactory.go | Pass signature throttler into v2 Block subround creation. |
| consensus/spos/bls/v2/benchmark_verify_signatures_test.go | Update benchmark to pass context. |
| consensus/spos/bls/v2/benchmark_send_proof_test.go | Update benchmark to pass context. |
| consensus/spos/bls/v1/subroundSignature.go | Thread context into v1 signature-share creation as well. |
| consensus/spos/bls/v1/subroundSignature_test.go | Update v1 tests for context-aware signature-share creation. |
| consensus/interface.go | Update SigningHandler interface signature for context-aware signature-share creation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if message == nil { | ||
| return nil, ErrNilMessage | ||
| } | ||
|
|
||
| select { | ||
| case <-ctx.Done(): | ||
| return nil, ErrTimeIsOut | ||
| default: | ||
| } |
| func (sr *subroundSignature) waitForSingatures() { | ||
| done := make(chan struct{}) | ||
| go func() { | ||
| sr.SignaturesWaitGroup().Wait() | ||
| close(done) | ||
| }() | ||
|
|
||
| timeLeft := sr.RoundHandler().RemainingTime(sr.RoundHandler().TimeStamp(), time.Duration(sr.EndTime())) | ||
|
|
||
| select { | ||
| case <-done: | ||
| sr.SignaturesCtxCancel() | ||
| return | ||
| case <-time.After(timeLeft): | ||
| log.Debug("timeout while waiting for signatures to be created") | ||
| return | ||
| } | ||
| } | ||
|
|
||
| func (sr *subroundSignature) doSignatureJobForManagedKeys(ctx context.Context) bool { | ||
| // wait for optimistic signatures creation to finish | ||
| sr.waitForSingatures() |
| sigCtx, cancel := context.WithTimeout(ctx, timeLeft) | ||
| sr.SetSignaturesCtxCancelFunc(cancel) | ||
|
|
||
| for idx, pk := range sr.ConsensusGroup() { | ||
| pkBytes := []byte(pk) | ||
| if !sr.IsKeyManagedBySelf(pkBytes) { | ||
| continue | ||
| } | ||
|
|
||
| err := checkGoRoutinesThrottler(ctx, sr.signatureThrottler) | ||
| if err != nil { | ||
| log.Debug("triggerCreateSignaturesForManagedKeys.checkGoRoutinesThrottler", "err", err) | ||
| return false | ||
| } | ||
| sr.signatureThrottler.StartProcessing() | ||
| sr.SignaturesWaitGroup().Add(1) | ||
|
|
| sr.triggerCreateSignaturesForManagedKeys(context.Background()) | ||
|
|
||
| ctx, cancel := context.WithTimeout(context.Background(), sr.RoundHandler().TimeDuration()) | ||
| defer cancel() | ||
|
|
There was a problem hiding this comment.
the other context is for process block and then it's cancelled, i think it's better to rely here only on the internal sig timeout context
| func (cns *ConsensusState) SignaturesWaitGroup() *sync.WaitGroup { | ||
| return cns.signaturesWaitGroup | ||
| } |
There was a problem hiding this comment.
this was done on another PR
| func (cns *ConsensusState) SetSignaturesCtxCancelFunc(cancelFunc context.CancelFunc) { | ||
| cns.mutState.Lock() | ||
| defer cns.mutState.Unlock() | ||
|
|
||
| cns.signaturesTimeoutCtxCancel = cancelFunc | ||
| } |
| select { | ||
| case <-ctx.Done(): | ||
| return nil, ErrTimeIsOut | ||
| default: | ||
| } | ||
|
|
||
| sh.data.sigShares[index] = sigShareBytes |
| return | ||
| } | ||
| sr.signatureThrottler.StartProcessing() | ||
| sr.SignaturesWaitGroup().Add(1) |
There was a problem hiding this comment.
you can read the wg before the loop, and then reuse that one (defensive in case there is a reset of the subround mid process, e.g if we promote the trigger to run on a goroutine or if the timeout crosses the round boundary)
same for the header hash and epoch and header nil check (if the round boundary is crossed, these may be different if read in the routine).
| continue | ||
| } | ||
|
|
||
| err := checkGoRoutinesThrottler(sigCtx, sr.signatureThrottler) |
There was a problem hiding this comment.
this may be blocking here when number of keys in consensus managed by the node is > than throttler allowed routines.
the check will wait for the delta of routines to finish before releasing the access for the validation.
There was a problem hiding this comment.
triggered on a goroutine
| // will try to create it | ||
| log.Debug("sendSignatureForManagedKey.SignatureShare: sig not already created, will try to create it", "error", err) | ||
|
|
||
| signatureShare, err = sr.SigningHandler().CreateSignatureShareForPublicKey( |
There was a problem hiding this comment.
here we create the sig share if it is missing, but do we still have time to create it?
| } | ||
|
|
||
| sh.data.sigShares[index] = sigShareBytes | ||
| // check again before setting signatures shares data |
There was a problem hiding this comment.
here we already have the signature, bypassing its storage will not save a lot of time, maybe store it.
| @@ -360,12 +360,21 @@ func (sr *subroundBlock) sendBlockHeader( | |||
| } | |||
|
|
|||
| func (sr *subroundBlock) triggerCreateSignaturesForManagedKeys(ctx context.Context) { | |||
There was a problem hiding this comment.
since we now trigger this on a goroutine, maybe give the headerHash and headerHandler as argument, so that it is not read inside the goroutine.
| currentHash := sr.GetData() | ||
| currentEpoch := sr.GetHeader().GetEpoch() | ||
|
|
||
| sigSubroundEndTime := time.Duration(float64(sr.RoundHandler().TimeDuration()) * srSignatureEndTime) |
There was a problem hiding this comment.
also if triggered on a goroutine, maybe check that the round on the header is still the round in the RoundHandler, otherwise exit without triggering signing.
| log.Debug("triggerCreateSignaturesForManagedKeys: context done", "timeLeft", timeLeft) | ||
| go func() { | ||
| for _, pk := range keys { | ||
| log.Info("aaaa") |
| select { | ||
| case <-sigCtx.Done(): | ||
| log.Debug("triggerCreateSignaturesForManagedKeys: context done", "timeLeft", timeLeft) | ||
| go func() { |
There was a problem hiding this comment.
yes, that was the intention, to trigger asynchronously, due to goroutines throttler which might block on limit reached
| return | ||
| default: | ||
| } | ||
| log.Info("bbb") |
| defer sr.signatureThrottler.EndProcessing() | ||
| defer wg.Done() | ||
|
|
||
| log.Info("ccc") |
|
|
||
| pkBytes := []byte(pk) | ||
| currentHash := sr.GetData() | ||
| log.Debug("step 1: multi keys signatures creation has been triggered", "num", len(keys)) |
There was a problem hiding this comment.
perhaps call cancel()?
There was a problem hiding this comment.
or call it on defer after creation
There was a problem hiding this comment.
the context was set to be cancelled on signature subround, after waiting with timeout, here we are just triggering the creation
| default: | ||
| } | ||
|
|
||
| err := checkGoRoutinesThrottler(ctx, sr.signatureThrottler) |
Reasoning behind the pull request
Testing procedure
Pre-requisites
Based on the Contributing Guidelines the PR author and the reviewers must check the following requirements are met:
featbranch created?featbranch merging, do all satellite projects have a proper tag insidego.mod?