Summary
Currently, spark-web-proxy relies on the Spark History Server's "incomplete applications" feature to display running Spark applications. This approach has significant limitations when using S3-compatible object storage.
Problem Description
Current Behavior
The proxy successfully detects running Spark applications via the Kubernetes API (visible in logs):
The application 'spark-xxx' was updated: Running at [http://10.233.x.x:4040]
However, these applications do not appear in the UI because the History Server cannot read in-progress event logs from S3.
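The gap can be checked with a minimal sketch. It compares the applications the proxy detects through Kubernetes (a mapping from app ID to driver-UI URL) with the applications the History Server lists as running. It assumes the History Server's standard REST endpoint /api/v1/applications?status=running. The HISTORY_SERVER address, the function names and the sample IDs are hypothetical, and this is not spark-web-proxy's actual code.

```python
import json
import urllib.request

# Hypothetical History Server address; replace with your own service URL.
HISTORY_SERVER = "http://spark-history-server:18080"


def running_ids_from_listing(apps: list[dict]) -> set[str]:
    """Return IDs of applications that have at least one attempt not yet completed."""
    return {
        app["id"]
        for app in apps
        if any(not attempt.get("completed", True) for attempt in app.get("attempts", []))
    }


def fetch_running_app_ids(base_url: str = HISTORY_SERVER) -> set[str]:
    """Query the History Server REST API for the applications it lists as running."""
    url = f"{base_url}/api/v1/applications?status=running"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return running_ids_from_listing(json.load(resp))


def invisible_apps(k8s_running: dict[str, str], history_running: set[str]) -> dict[str, str]:
    """Running apps seen via Kubernetes (ID -> driver UI URL) that the History Server does not list."""
    return {app_id: url for app_id, url in k8s_running.items() if app_id not in history_running}


# Offline example using a canned listing instead of a live History Server.
k8s = {"spark-aaa": "http://10.0.0.1:4040", "spark-bbb": "http://10.0.0.2:4040"}
listing = [{"id": "spark-aaa", "attempts": [{"completed": False}]}]
print(invisible_apps(k8s, running_ids_from_listing(listing)))
# {'spark-bbb': 'http://10.0.0.2:4040'}
```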
Root Cause
- S3 doesn't support partial file writes: event log files (.inprogress) remain at 0 bytes until the job completes or the 10MB rolling threshold is reached.
- Spark enforces a minimum 10MB rolling size: spark.eventLog.rolling.maxFileSize cannot be set below 10MB.
- Small/short jobs never appear: applications that don't generate 10MB of event logs are invisible until completion.
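A simplified model shows how the points above combine. With rolling enabled, only event-log files that have already been rolled (closed) exist as readable objects on S3, and the current .inprogress file stays at 0 bytes. The sketch assumes every rolled file is exactly maxFileSize bytes, which ignores Spark's real roll boundaries. The function names are hypothetical, and the 10 MiB floor mirrors the minimum described above.

```python
MiB = 1024 * 1024

# Spark rejects spark.eventLog.rolling.maxFileSize values below 10 MiB.
MIN_ROLLING_SIZE = 10 * MiB


def validate_rolling_size(max_file_size: int) -> int:
    """Apply the same lower bound Spark enforces on the rolling file size (in bytes)."""
    if max_file_size < MIN_ROLLING_SIZE:
        raise ValueError("spark.eventLog.rolling.maxFileSize must be at least 10 MiB")
    return max_file_size


def visible_bytes_on_s3(bytes_written: int, max_file_size: int) -> int:
    """Simplified model: only fully rolled (closed) files are visible on S3;
    the current .inprogress object stays at 0 bytes until it is closed."""
    validate_rolling_size(max_file_size)
    rolled_files = bytes_written // max_file_size
    return rolled_files * max_file_size


print(visible_bytes_on_s3(3 * MiB, 10 * MiB))   # 0: a short job shows nothing
print(visible_bytes_on_s3(25 * MiB, 10 * MiB))  # 20971520: two rolled files
```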
Environment Details
- Spark version: 3.5.6
- Storage: S3-compatible (MinIO)
- Event log format: eventlog_v2 with rolling enabled
- History Server: configured with spark.history.fs.inProgressOptimization.enabled=true
Tested Configurations (all failed)
- ✅ Enabled inProgressOptimization
- ✅ Set spark.eventLog.rolling.enabled=true
- ✅ Tried reducing rolling size (blocked by 10MB minimum)
- ✅ Tried eventlog_v1 format
- ✅ Aligned Spark versions (History Server 3.5.6 = Applications 3.5.6)
- ❌ None of these solve the S3 partial-write limitation
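For reproduction, the settings listed above can be collected in one place. The sketch below renders them as spark-submit --conf flags for the application and as -D properties for SPARK_HISTORY_OPTS on the History Server. It makes several assumptions. The s3a://spark-logs/events path and app.py are hypothetical placeholders because the issue does not give the real bucket or job. spark.eventLog.enabled is assumed to be on. Kubernetes-specific submission flags (master URL, container image) are left out.

```python
import shlex

# Application-side settings. The s3a:// path is a hypothetical placeholder.
APP_CONF = {
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "s3a://spark-logs/events",
    "spark.eventLog.rolling.enabled": "true",
    "spark.eventLog.rolling.maxFileSize": "10m",  # lowest value Spark accepts
}

# History Server settings; logDirectory must point at the same location.
HISTORY_CONF = {
    "spark.history.fs.logDirectory": "s3a://spark-logs/events",
    "spark.history.fs.inProgressOptimization.enabled": "true",
}


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render a config dict as repeated `--conf key=value` spark-submit flags."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def to_history_opts(conf: dict[str, str]) -> str:
    """Render a config dict as -D system properties for SPARK_HISTORY_OPTS."""
    return " ".join(f"-D{key}={value}" for key, value in conf.items())


print(shlex.join(["spark-submit", *to_submit_args(APP_CONF), "app.py"]))
print("SPARK_HISTORY_OPTS=" + shlex.quote(to_history_opts(HISTORY_CONF)))
```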
Thank you for this great project! 🙏