QuerySetApp is a Scala/Spark application developed to test performance and compatibility across different versions of source{d}'s engine by applying several well-known queries to repositories.
```
$SPARK_HOME/bin/spark-submit \
  --name "QuerySetApp" \
  --class "tech.sourced.queryset.Main" \
  --master $SPARK_MASTER \
  --num-executors $NUM_EXECUTORS \
  --executor-memory $EXECUTOR_MEM \
  --total-executor-cores $EXECUTOR_CORES \
  path/to/queryset-0.1.0.jar $REPOS_PATH $REPOS_FORMAT
```
- `SPARK_HOME` environment variable must point to the directory where Spark was downloaded (e.g. `/usr/local/spark`).
- `SPARK_MASTER` environment variable must point to the master URL for the cluster (e.g. `spark://p-spark-master:7077`).
- `NUM_EXECUTORS`, `EXECUTOR_MEM`, `EXECUTOR_CORES`: executor configuration parameters (e.g. `3`, `4G`, `64`).
- `REPOS_PATH` must point to the directory that contains the repositories.
- `REPOS_FORMAT` must specify the repositories' format (e.g. `siva`, `standard`).
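Putting the variables above together, a full submission could look like the sketch below. The paths and resource values are illustrative assumptions, not project defaults; adjust them to your environment.

```shell
# Illustrative values only — adapt to your cluster and data layout.
export SPARK_HOME="/usr/local/spark"              # directory where Spark was downloaded
export SPARK_MASTER="spark://p-spark-master:7077" # master URL for the cluster
export NUM_EXECUTORS=3                            # number of executors
export EXECUTOR_MEM=4G                            # memory per executor
export EXECUTOR_CORES=64                          # total executor cores
export REPOS_PATH="/data/repos"                   # directory containing the repositories (assumed path)
export REPOS_FORMAT="siva"                        # siva or standard

$SPARK_HOME/bin/spark-submit \
  --name "QuerySetApp" \
  --class "tech.sourced.queryset.Main" \
  --master $SPARK_MASTER \
  --num-executors $NUM_EXECUTORS \
  --executor-memory $EXECUTOR_MEM \
  --total-executor-cores $EXECUTOR_CORES \
  path/to/queryset-0.1.0.jar $REPOS_PATH $REPOS_FORMAT
```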
First, set the Spark directory:

```
export SPARK_HOME="path/to/spark"
```
Build the fatjar to submit to a Spark cluster:

```
make build
```

This leaves the fatjar under `target/scala-2.11/queryset-0.1.0.jar`.
Run the application locally (it uses the repositories under `src/main/resources/siva-files` by default):

```
make run
```
Submit the application to a cluster:

```
SPARK_MASTER="spark://p-spark-master:7077" \
REPOS_PATH="/path/to/repos" \
REPOS_FORMAT="siva" \
make run
```