I am trying to run my Spark job from Airflow. When I execute this command in a terminal, it works fine without any issue:

spark-submit --class dataload.dataload_daily /home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar

However, when I run the same thing through Airflow, I keep getting this error:
/tmp/airflowtmpKQMdzp/spark-submit-scalaWVer4Z: line 1: spark-submit: command not found
t1 = BashOperator(
    task_id='spark-submit-scala',
    bash_command='spark-submit --class dataload.dataload_daily '
                 '/home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar',
    dag=dag,
    retries=0,
    start_date=datetime(2018, 4, 14))
I have the Spark path set in my .bash_profile:
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7
export PATH="$SPARK_HOME/bin/:$PATH"
and I have sourced this file as well. I am not sure how to debug this; can anyone help me with it?
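Would something like the sketch below be the right way to fix it, i.e. calling spark-submit by its absolute path instead of relying on PATH? The SPARK_HOME value is just my install directory from above, and the import path is what I believe matches my Airflow version; I have not verified that this is the recommended approach.

from datetime import datetime
from airflow.operators.bash_operator import BashOperator

# Assumption: reuse the same Spark install that my .bash_profile points at.
SPARK_HOME = '/opt/spark-2.2.0-bin-hadoop2.7'

t1 = BashOperator(
    task_id='spark-submit-scala',
    # Call spark-submit via its absolute path so the task does not depend
    # on PATH being set in the shell that Airflow spawns.
    bash_command='{}/bin/spark-submit --class dataload.dataload_daily '
                 '/home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar'.format(SPARK_HOME),
    dag=dag,  # 'dag' is the DAG object defined earlier in my file
    retries=0,
    start_date=datetime(2018, 4, 14))

Or would it be better to pass the environment to the task explicitly, e.g. through the BashOperator's env parameter?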