Skip to content
This repository was archived by the owner on Mar 19, 2024. It is now read-only.

Add option to get execution data from AN/Observer#89

Open
koko1123 wants to merge 8 commits into
masterfrom
amlandeep/add-exec-api-option
Open

Add option to get execution data from AN/Observer#89
koko1123 wants to merge 8 commits into
masterfrom
amlandeep/add-exec-api-option

Conversation

@koko1123

@koko1123 koko1123 commented Apr 5, 2023

Copy link
Copy Markdown

Goal of this PR

introduces a second way to get exec data via the exec data API from a trusted node

Fixes #32

@koko1123 koko1123 changed the title Amlandeep/add exec api option Add option to get execution data from AN/Observer Apr 5, 2023
@koko1123 koko1123 marked this pull request as ready for review April 5, 2023 19:59
Comment thread cmd/flow-archive-live/main.go Outdated
pflag.StringVar(&flagSeedAddress, "seed-address", "", "host address of seed node to follow consensus")
pflag.StringVar(&flagSeedKey, "seed-key", "", "hex-encoded public network key of seed node to follow consensus")
pflag.BoolVarP(&flagTracing, "tracing", "t", false, "enable tracing for this instance")
pflag.BoolVarP(&flagDisableGCP, "disable-cloud-streaming", "g", false, "disable streaming exec data from GCP and use the Access Node instead")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather having a separate option to disable GCP, could we just enable it only if bucket is set?

Comment thread service/stream/exec_data_streamer.go Outdated
ctx := context.Background()

// initialize clients
opts := []grpc.DialOption{grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(grpcutils.DefaultMaxMsgSize)),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configurable. grpcutils.DefaultMaxMsgSize most likely won't be enough for large blocks.

// data is available at the moment.
func (e *ExecDataStreamer) Next() (*uploader.BlockData, error) {
// same implementation as GCPStreamer
go e.poll()

@peterargue peterargue Apr 5, 2023

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simplify this logic a bit by running a fixed size worker pool that continuously reads from e.queue fills e.buffer up to the configured size. Then there is no need for explicit polling and you can tune the number of worker needed to keep up with indexing

Comment on lines +149 to +152
b, err := e.accessApi.GetBlockByID(e.ctx, br)
if err != nil {
errs = multierror.Append(errs, fmt.Errorf("failed to get block data for blockID (%s): %w", blockID, err))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just get this from the local storage?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd only have the header in local

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only the header? the archive node is running a consensus follower which should be indexing full blocks

Comment on lines +156 to +159
tx, err := e.accessApi.GetTransactionResultsByBlockID(e.ctx, txr)
if err != nil {
errs = multierror.Append(errs, fmt.Errorf("failed to get transaction results for blockID (%s): %w", blockID, err))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to serve the transaction result data? if not, maybe we can just stop indexing it?

If we want to keep it, this approach is fine for now, but we'll have to handle the case where the response size is too large. In that case, we need to get the total tx count (from the list of collections in the exec data), and call GetTransactionResultByIndex for each index to fetch the results individually.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we now serve transactions from the access API implemented in Archive, but yes, I want to remove this call entirely, left as a todo.

streamer = cloud.NewGCPStreamer(log, bucket,
cloud.WithCatchupBlocks(blockIDs),
)
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to print log about which streamer end up being used

}()
bucket := client.Bucket(flagBucket)
streamer = cloud.NewGCPStreamer(log, bucket,
cloud.WithCatchupBlocks(blockIDs),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid the number of blockIDs might be too big especially the indexing speed is far behind.

Could we at least log the total number of blockIDs here?

Comment thread service/stream/exec_data_streamer.go Outdated
return fmt.Errorf("could not pull execution record (name: %s): %w", blockID, err)
}

e.log.Debug().

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log is worth to be INFO level

Suggested change
e.log.Debug().
e.log.Info().

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create new Polling function to poll AN GetExecutionDataByBlockID API instead of gcp streamer

3 participants