
Feature Request: Direct Spark UI listing without relying on History Server for incomplete applications #11

@mbi-flh

Summary

Currently, spark-web-proxy relies on the Spark History Server's "incomplete applications" feature to display running Spark applications. This approach breaks down when event logs are stored on S3-compatible object storage: running applications stay hidden until their event logs become readable.

Problem Description

Current Behavior

The proxy successfully detects running Spark applications via the Kubernetes API (visible in logs):
The application 'spark-xxx' was updated: Running at [http://10.233.x.x:4040]

However, these applications do not appear in the UI because the History Server cannot read in-progress event logs from S3.
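
To illustrate what direct listing could look like, here is a minimal Python sketch, not the proxy's actual implementation. It assumes the driver URLs already detected via the Kubernetes API are available as a list, and it queries each driver's Spark monitoring REST endpoint (`/api/v1/applications`) instead of the History Server. The names `list_running_apps` and `http_get_json` are hypothetical.

```python
import json
from typing import Callable, Dict, List
from urllib.request import urlopen


def http_get_json(url: str, timeout: float = 2.0):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_running_apps(
    driver_urls: List[str],
    fetch: Callable[[str], object] = http_get_json,
) -> List[Dict[str, str]]:
    """Ask each live driver UI for its application, skipping unreachable ones.

    driver_urls is assumed to come from the Kubernetes-based detection,
    e.g. the "Running at [http://...:4040]" addresses seen in the logs.
    """
    apps = []
    for raw in driver_urls:
        base = raw.rstrip("/")
        try:
            payload = fetch(base + "/api/v1/applications")
        except (OSError, ValueError):
            # Driver not reachable, or the response was not valid JSON.
            continue
        for app in payload:
            apps.append({"id": app["id"], "name": app["name"], "ui": base})
    return apps
```

With this approach the list of running applications would no longer depend on event logs being readable from S3; the History Server would only be needed for completed applications.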

Root Cause

  1. S3 doesn't support partial file writes: an object only becomes readable once the writer closes it, so event log files (.inprogress) remain at 0 bytes until the job completes or the 10MB rolling threshold is reached
  2. Spark enforces a minimum 10MB rolling size: spark.eventLog.rolling.maxFileSize cannot be set below 10MB
  3. Small/short jobs never appear: Applications that generate less than 10MB of event logs are invisible until completion
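
The three points above can be summarized with a deliberately simplified model, written here as a Python sketch. It assumes that only fully rolled (closed) event log files are readable from S3 and that each rolled file is exactly the configured size, which is an approximation. The function name and the 10 MiB reading of "10MB" are assumptions for illustration.

```python
MIB = 1024 * 1024
MIN_ROLLING_BYTES = 10 * MIB  # assumed reading of the 10MB minimum


def readable_bytes_while_running(bytes_written: int, rolling_max_bytes: int) -> int:
    """Approximate how much event log data is readable before the job ends.

    Simplified model: data becomes readable only when a rolled file is
    closed, so anything in the current open file is invisible.
    """
    if rolling_max_bytes < MIN_ROLLING_BYTES:
        raise ValueError("rolling size below the 10 MiB minimum is rejected")
    closed_files = bytes_written // rolling_max_bytes
    return closed_files * rolling_max_bytes


# A short job that has written 3 MiB of events exposes nothing yet.
print(readable_bytes_while_running(3 * MIB, 10 * MIB))
```

Under this model, any application that writes less than one full rolled file has zero readable bytes for its entire lifetime, which matches the behavior described for small or short jobs.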

Environment Details

  • Spark version: 3.5.6
  • Storage: S3-compatible (MinIO)
  • Event log format: eventlog_v2 with rolling enabled
  • History Server: Configured with spark.history.fs.inProgressOptimization.enabled=true

Tested Configurations (all failed)

  • ✅ Enabled inProgressOptimization
  • ✅ Set spark.eventLog.rolling.enabled=true
  • ✅ Tried reducing rolling size (blocked by 10MB minimum)
  • ✅ Tried eventlog_v1 format
  • ✅ Aligned Spark versions (History Server 3.5.6 = Applications 3.5.6)
  • ❌ None of these solve the S3 partial-write limitation
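
For anyone trying to reproduce this, the settings above can be written out roughly as follows. This is a sketch in Python that only assembles `--conf` arguments; the bucket path `s3a://spark-logs/events` and the helper names are hypothetical, and the property keys are the standard Spark ones mentioned in this issue.

```python
from typing import Dict, List


def event_log_conf(log_dir: str, rolling_max: str = "10m") -> Dict[str, str]:
    """Application-side event log settings (smallest rolling size tried)."""
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
        "spark.eventLog.rolling.enabled": "true",
        "spark.eventLog.rolling.maxFileSize": rolling_max,
    }


def history_server_conf(log_dir: str) -> Dict[str, str]:
    """History Server settings; these apply to the server, not the jobs."""
    return {
        "spark.history.fs.logDirectory": log_dir,
        "spark.history.fs.inProgressOptimization.enabled": "true",
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a property map into spark-submit style --conf arguments."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Hypothetical bucket path, for illustration only.
print(" ".join(to_submit_args(event_log_conf("s3a://spark-logs/events"))))
```

Even with these settings, running applications stayed hidden in my tests, since none of them changes when S3 makes the log data readable.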

Thank you for this great project! 🙏
