I hope these instructions will help you get a Spark environment up and running. I have this working on Mac OS X 10.10.3 (Yosemite), but it should be applicable to most Linux distros as well.
First, the bad news... we need to install a few other things before we can get to Spark.
- Install Tasks
- My Java Environment
My Java Environment
I do not want to install much into the system partitions; instead I will create my own Java area. This way I can develop/play without affecting the integrity of the system.
In the following examples - my Java environment will all be based on ~/JAVA_ENV
So let's create this and get installing.
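The sandbox layout can be created up front (a minimal sketch; the bin and lib subdirectories are the ones the copy steps below expect):

```shell
# Create the self-contained Java sandbox used throughout these notes.
mkdir -p ~/JAVA_ENV/bin ~/JAVA_ENV/lib
ls ~/JAVA_ENV
```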
Ant is the make of the Java world. I downloaded apache-ant-1.9.5-bin.tar.gz from apache.org and then:
cd ~/Downloads
tar -zxvf apache-ant-1.9.5-bin.tar.gz
cp ~/Downloads/apache-ant-1.9.5/bin/* ~/JAVA_ENV/bin/
cp ~/Downloads/apache-ant-1.9.5/lib/* ~/JAVA_ENV/lib/
We will build a Java environment file as we go.
In the bash shell please type these commands
echo "ANT_HOME=~/JAVA_ENV/bin" >> ~/Java.env
echo "PATH=\$ANT_HOME:\$PATH" >> ~/Java.env
I went to the Oracle site and installed the latest JDK (jdk1.8.0_45). This was a typical Mac install, and, as is typical, I had no idea where to find the JDK root.
However, using the utility /usr/libexec/java_home I can confirm that it is installed and find exactly where it lives.
echo "JAVA_HOME=\$(/usr/libexec/java_home)" >> ~/Java.env
echo "export JAVA_HOME" >> ~/Java.env
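Note the escaped \$(...) above: the command substitution is written literally into ~/Java.env and only evaluated when the file is sourced, so the entry keeps working if you later change JDKs. A quick illustration of the difference, using a throwaway file (all names here are hypothetical):

```shell
demo=/tmp/env_escape_demo.sh
echo "AT_WRITE=$(echo fixed)"  >  "$demo"   # $(...) is expanded immediately
echo "AT_SOURCE=\$(echo late)" >> "$demo"   # \$(...) is written literally
source "$demo"                              # literal $(...) expands now
echo "$AT_WRITE $AT_SOURCE"                 # prints: fixed late
```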
I again downloaded from apache.org, this time taking the Maven binary, apache-maven-3.3.3-bin.tar.gz.
To install this I carried out these steps
cd ~/JAVA_ENV
cp ~/Downloads/apache-maven-3.3.3-bin.tar.gz .
tar -zxvf apache-maven-3.3.3-bin.tar.gz
ln -s apache-maven-3.3.3 maven
If you create a symbolic link, you can install new versions of the product and start using them just by changing the link. Should they not work (or have a version mismatch), you can quickly and seamlessly switch back to the previous version.
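The idea can be sketched with a scratch directory (hypothetical version numbers; in practice the directories live under ~/JAVA_ENV):

```shell
# Two side-by-side versions, one "current" symlink.
mkdir -p /tmp/symlink_demo && cd /tmp/symlink_demo
mkdir -p apache-maven-3.3.3 apache-maven-3.2.5
ln -s apache-maven-3.3.3 maven      # "maven" now resolves to 3.3.3
readlink maven                      # prints: apache-maven-3.3.3
ln -sfn apache-maven-3.2.5 maven    # one command switches to 3.2.5
readlink maven                      # prints: apache-maven-3.2.5
```

Anything configured against the maven path (PATH entries, MAVEN_HOME) is untouched by the switch.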
echo "MAVEN_HOME=~/JAVA_ENV/maven" >> ~/Java.env
echo "M2_HOME=\$MAVEN_HOME" >> ~/Java.env
echo "export M2_HOME" >> ~/Java.env
echo "export MAVEN_HOME" >> ~/Java.env
echo "PATH=\$MAVEN_HOME/bin:\$PATH" >> ~/Java.env
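For reference, here is roughly what ~/Java.env should contain by this point (a sketch of the intended end state; note that M2_HOME needs an assignment before it is exported, and PATH should be exported at the end):

```shell
ANT_HOME=~/JAVA_ENV/bin
PATH=$ANT_HOME:$PATH
JAVA_HOME=$(/usr/libexec/java_home)
export JAVA_HOME
MAVEN_HOME=~/JAVA_ENV/maven
M2_HOME=$MAVEN_HOME
export M2_HOME
export MAVEN_HOME
PATH=$MAVEN_HOME/bin:$PATH
export PATH
```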
We now (yes, you guessed it) download Spark from apache.org and carry out these steps. I chose the source code, as I do not have a Hadoop environment on my Mac.
cd ~/Downloads
tar -zxvf Spark1.4.tar.gz
mv Spark1.4 ~/JAVA_ENV/
cd ~/JAVA_ENV
ln -s Spark1.4 Spark
cd Spark
I need to get my Java environment sorted, so load the environment file we have been building:

source ~/Java.env
Now build using mvn:

mvn -DskipTests clean package
If this produces an error like
command not found
then there is a problem with your Java setup: mvn (Maven) cannot be found on your PATH.
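A quick way to diagnose this is to check whether mvn resolves on the current PATH (a sketch; the message text is my own):

```shell
# Report where mvn resolves from, or hint at the likely fix.
if command -v mvn >/dev/null 2>&1; then
    echo "mvn found at: $(command -v mvn)"
else
    echo "mvn not on PATH - did you source ~/Java.env?"
fi
```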
The installation at this point needs to access the Internet, as there are more packages and modules to be added. Just sit back and relax, go find a good book, and let all the packages download.
After the Maven build process has completed, you can quickly check if everything worked by
cd ~/JAVA_ENV/Spark
./bin/spark-shell --master local
You should see the scala> prompt
To get out, just type exit (or press Ctrl-D).
Coming Next - Using Spark with Python