Summary query data panel freezes when the number of queries is high #44

@MonkeyCanCode

Describe the bug
When a Spark cluster has run a high number of queries (e.g. 800), the query data panel on the summary page freezes repeatedly. Looking at the network tab, the data appears to be fetched from http://localhost:4040/api/v1/applications/local-1767169191734/stages?withSummaries=false&quantiles=0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0, which can take a couple of seconds (but this is async, so it is okay). However, processing the large JSON payload from this API call via calculateStagesStore seems to be the problem: it can take 4-5 seconds, which appears to lock up the main thread.
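To separate the network cost from the payload cost outside the browser, a rough sketch like the one below may help. It only builds the same stages URL as the request above and times the fetch and the JSON parse in Python. The helper names `stages_url` and `measure` are made up for illustration, and the numbers say nothing about how long calculateStagesStore itself takes in the UI; they only show how large the payload is and how many stage entries it carries.

```python
import json
import time
from urllib.request import urlopen

# Quantiles copied from the request seen in the network tab.
QUANTILES = "0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0"


def stages_url(app_id, base="http://localhost:4040"):
    """Build the stages URL in the same shape as the request above (hypothetical helper)."""
    return (
        f"{base}/api/v1/applications/{app_id}/stages"
        f"?withSummaries=false&quantiles={QUANTILES}"
    )


def measure(url, fetch=None):
    """Return (fetch_seconds, parse_seconds, payload_bytes, stage_count).

    `fetch` is any callable that takes a URL and returns the response bytes;
    it defaults to a plain urllib GET. Hypothetical helper, not part of DataFlint.
    """
    if fetch is None:
        def fetch(u):
            with urlopen(u) as resp:
                return resp.read()

    t0 = time.perf_counter()
    body = fetch(url)            # network time
    t1 = time.perf_counter()
    stages = json.loads(body)    # the endpoint returns a JSON array of stages
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(body), len(stages)
```

With the Spark Connect server running, `measure(stages_url("local-1767169191734"))` would report the timings for this run; the application id differs per run and has to be replaced with your own.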

Environment
Spark version: 4.x
platform: standalone (local spark connect)

To Reproduce

  1. Start a local Spark Connect server
sbin/start-connect-server.sh
  2. Connect via a local client and run a sample query 800 times (the fetch covers up to 1000 queries, so this is a valid use case)
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg

spark = SparkSession.builder.remote("sc://localhost/").getOrCreate()

df = spark.createDataFrame(
    [
        ("sue", 32),
        ("li", 3),
        ("bob", 75),
        ("heo", 13),
    ],
    ["first_name", "age"],
)

for i in range(800):
    df.select(avg("age")).show()
  3. Open http://localhost:4040/dataflint/#/summary and scroll the queries data panel; it should freeze every couple of seconds

Expected behavior
It should not freeze during scrolling on the data panel.

Screenshots
N/A

Additional context
N/A
