Integrate Spline with PySpark

I have been trying to run this configuration in a Jupyter notebook with an EMR Serverless application attached:

```python
from pyspark.sql import SparkSession

# Load the Spline agent and send lineage to both the console and a file.
spark = SparkSession.builder \
    .config("spark.jars",
            "s3://bucket/spark-3.5-spline-agent-bundle_2.12-2.2.1.jar") \
    .config("spark.sql.queryExecutionListeners", "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener") \
    .config("spark.spline.lineageDispatcher", "console,file") \
    .config("spark.spline.lineageDispatcher.file.className", "za.co.absa.spline.harvester.dispatcher.FileLineageDispatcher") \
    .config("spark.spline.lineageDispatcher.file.fileName",
            "s3://bucket/spline_workbook/lineage.csv") \
    .getOrCreate()
```
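One way to keep these settings in one place, and reuse them outside the notebook (for example as `--conf` arguments in a `spark-submit` style job submission), is to hold them in a plain dict. This is a minimal sketch, not part of the Spline API: `SPLINE_CONF`, `apply_conf` and `to_submit_args` are hypothetical helper names, the `s3://bucket/...` paths are the placeholders from above, and `apply_conf` only assumes an object whose `.config(key, value)` returns the builder, as `SparkSession.builder` does.

```python
from typing import Dict, List

# Spline settings copied from the configuration above (placeholder S3 paths).
SPLINE_CONF: Dict[str, str] = {
    "spark.jars": "s3://bucket/spark-3.5-spline-agent-bundle_2.12-2.2.1.jar",
    "spark.sql.queryExecutionListeners":
        "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener",
    "spark.spline.lineageDispatcher": "console,file",
    "spark.spline.lineageDispatcher.file.className":
        "za.co.absa.spline.harvester.dispatcher.FileLineageDispatcher",
    "spark.spline.lineageDispatcher.file.fileName":
        "s3://bucket/spline_workbook/lineage.csv",
}


def apply_conf(builder, conf: Dict[str, str]):
    """Call builder.config(key, value) for every entry and return the builder.

    Hypothetical helper: works with SparkSession.builder or any object whose
    .config() returns the builder itself.
    """
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the settings as spark-submit style ["--conf", "key=value", ...]."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Usage in the notebook (requires PySpark):
#   from pyspark.sql import SparkSession
#   spark = apply_conf(SparkSession.builder, SPLINE_CONF).getOrCreate()
```

Because the notebook session and any batch submission are built from the same dict, the two sets of settings cannot drift apart.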


The script I am trying to run:
```python
from pyspark.sql import functions as sf

# input_file_1, input_file_2, output_file_1 and output_file_2 are S3 paths defined elsewhere in the notebook.
empsDF = spark.read \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .csv(input_file_1)
empsDF1 = empsDF.withColumnRenamed("name", "Name")
empsDF1.show()

deptsDF = spark.read \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .csv(input_file_2)

# Left outer join of employees to departments, written as CSV.
resultsDF = empsDF1.join(deptsDF, empsDF1.dept_id == deptsDF.dept_id1, "left_outer")
resultsDF.write.csv(output_file_1, header=True, mode="overwrite")

# Total salary per manager, written as a single CSV file.
xdf = empsDF.groupBy("manager_id")
ydf = xdf.agg(sf.sum("salary").alias("total_salary"))
ydf.show()
ydf.coalesce(1).write.csv(output_file_2, header=True, mode="overwrite")
```

However, even though the run succeeds, the lineage file is not created at the S3 location.

Integrate spline with pyspark #862

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Integrate spline with pyspark #862

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions