Description
Describe the bug
When a Spark cluster has run a large number of queries (e.g. 800), the summary page freezes repeatedly while scrolling the queries data panel. In the network tab, this appears to coincide with data fetching from http://localhost:4040/api/v1/applications/local-1767169191734/stages?withSummaries=false&quantiles=0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0, which can take a couple of seconds (but this is async, so it is okay). However, processing the large JSON payload from this API call via calculateStagesStore seems to be causing the issue: it can take 4-5 seconds, which appears to lock up the main thread.
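To help confirm that the payload size, rather than the fetch itself, is the problem, the response can be measured outside the browser. The following is a minimal sketch, not part of DataFlint: `measure_payload` and `fetch_and_measure` are hypothetical helper names, and the application ID in the URL is the one from this report (yours will differ). It assumes the endpoint returns a JSON array of stages.

```python
import json
import time
from urllib.request import urlopen

# URL taken from the network tab in this report; the application ID
# (local-1767169191734) is specific to that run and must be replaced.
STAGES_URL = (
    "http://localhost:4040/api/v1/applications/local-1767169191734/stages"
    "?withSummaries=false"
    "&quantiles=0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0"
)


def measure_payload(raw: bytes) -> dict:
    """Report the size of a raw JSON response and how long parsing takes.

    Assumes the payload is a JSON array with one entry per stage.
    """
    start = time.perf_counter()
    stages = json.loads(raw)
    parse_seconds = time.perf_counter() - start
    return {
        "bytes": len(raw),
        "stages": len(stages),
        "parse_seconds": parse_seconds,
    }


def fetch_and_measure(url: str = STAGES_URL, timeout: float = 30.0) -> dict:
    """Fetch the stages endpoint and report fetch time plus payload stats."""
    start = time.perf_counter()
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
    fetch_seconds = time.perf_counter() - start

    report = measure_payload(raw)
    report["fetch_seconds"] = fetch_seconds
    return report
```

Calling `fetch_and_measure()` while the Spark UI is running returns the payload size in bytes, the number of stages, and the fetch and parse times. This only measures Python-side parsing, so it is a rough indicator of payload weight and does not reproduce the time spent in calculateStagesStore.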
Environment
Spark version: 4.x
Platform: standalone (local Spark Connect)
To Reproduce
- Start local Spark Connect
sbin/start-connect-server.sh
- Connect via a local client and run a sample query 800 times (the fetch covers up to 1000 queries, so this is a valid use case)
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg
spark = SparkSession.builder.remote("sc://localhost/").getOrCreate()
df = spark.createDataFrame(
    [
        ("sue", 32),
        ("li", 3),
        ("bob", 75),
        ("heo", 13),
    ],
    ["first_name", "age"],
)
# Run the same aggregation 800 times to populate the query list
for i in range(800):
    df.select(avg("age")).show()
- Open http://localhost:4040/dataflint/#/summary and scroll the queries data panel; it should freeze every couple of seconds
Expected behavior
It should not freeze during scrolling on the data panel.
Screenshots
N/A
Additional context
N/A