This is my Python code
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName("PySpark Spline Example").master("local[1]") \
.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
spark.range(10).write.mode("overwrite").parquet("./temp")
spark.stop()
This is my task submission script:
spark-submit --master local \
--packages za.co.absa.spline.agent.spark:spark-2.3-spline-agent-bundle_2.11:2.2.3 \
--conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
--conf "spark.spline.lineageDispatcher=console" \
lineage_demo.py
spark=2.3.2 python=3.7
However, it reports an error during execution:
py4j.protocol.Py4JJavaError: An error occurred while calling o32.sessionState.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1069)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:145)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:144)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Session is unexpectedly missing. Spline cannot be initialized.
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$2.apply(SplineQueryExecutionListener.scala:35)
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$2.apply(SplineQueryExecutionListener.scala:35)
at scala.Option.getOrElse(Option.scala:121)
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.<init>(SplineQueryExecutionListener.scala:35)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2747)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2736)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2736)
at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$$lessinit$greater$1.apply(QueryExecutionListener.scala:83)
at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$$lessinit$greater$1.apply(QueryExecutionListener.scala:82)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.sql.util.ExecutionListenerManager.<init>(QueryExecutionListener.scala:82)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$listenerManager$2.apply(BaseSessionStateBuilder.scala:270)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$listenerManager$2.apply(BaseSessionStateBuilder.scala:270)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.listenerManager(BaseSessionStateBuilder.scala:269)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:297)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1066)
... 16 more
I don't know how to solve it anymore. Although I switched to Spark 3.5 and it works fine, the company cluster has Spark=2.3, so I want to know the correct way to use Split with Spark 2.3
This is my Python code
This is my task submission script:
spark=2.3.2 python=3.7
However, it reports an error during execution:
I don't know how to solve it anymore. Although I switched to Spark 3.5 and it works fine, the company cluster has Spark=2.3, so I want to know the correct way to use Split with Spark 2.3