Installing Spark on a Mac

Apache Spark

I hope these instructions will help you get a Spark environment up and running. I have this working on Mac OS X 10.10.3 Yosemite - it should be applicable to most Linux distros as well.

First the bad news: we need to install a few other things before we can get to Spark.

  • Install Tasks
    1. Java
    2. My Java Environment
    3. Ant
    4. Maven
    5. JDK
    6. Spark
My Java Environment

I do not want to install much into the system partitions; instead I will create my own JAVA area. This way I can develop/play without affecting the integrity of the system.

In the following examples - my Java environment will all be based on ~/JAVA_ENV

So let's create this and get installing

mkdir -p ~/JAVA_ENV/bin

Apache Ant

Ant is the make of the Java world. I downloaded apache-ant-1.9.5-bin.tar.gz from apache.org and then

cd ~/Downloads
tar -zxvf apache-ant-1.9.5-bin.tar.gz
mkdir -p ~/JAVA_ENV/bin ~/JAVA_ENV/lib
cp ~/Downloads/apache-ant-1.9.5/bin/* ~/JAVA_ENV/bin/
cp ~/Downloads/apache-ant-1.9.5/lib/* ~/JAVA_ENV/lib/

Ant looks for its jars in $ANT_HOME/lib, so the libraries go into ~/JAVA_ENV/lib rather than alongside the scripts in bin.

We will build a Java environment file as we go.

Java Env

In the bash shell please type these commands

echo "ANT_HOME=~/JAVA_ENV" >> ~/Java.env
echo "export ANT_HOME" >> ~/Java.env
echo "PATH=\$ANT_HOME/bin:\$PATH" >> ~/Java.env
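
The backslash in front of each $ in those echo commands matters: it stops the shell expanding the variable while the file is being written, so the expansion happens later, when the file is sourced. Here is a minimal sketch of the difference. It uses a scratch file and made-up names (DEMO_HOME, LATE, EARLY) rather than the real ~/Java.env.

```shell
# Sketch only: writes to a throwaway file, never to ~/Java.env.
demo_env="$(mktemp "${TMPDIR:-/tmp}/javaenv-demo.XXXXXX")"

DEMO_HOME=/tmp/first          # value at the moment the file is written

# Escaped: the literal text $DEMO_HOME is stored and expanded when sourced.
echo "LATE=\$DEMO_HOME/bin" >> "$demo_env"
# Unescaped: the current value is baked into the file right now.
echo "EARLY=$DEMO_HOME/bin" >> "$demo_env"

cat "$demo_env"               # show what actually landed in the file

DEMO_HOME=/tmp/second         # change the value, then load the file
. "$demo_env"                 # "." is the portable spelling of "source"
echo "LATE=$LATE  EARLY=$EARLY"
```

The file holds LATE=$DEMO_HOME/bin literally, while EARLY is already fixed at /tmp/first/bin. After loading, LATE follows the new value and EARLY does not.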

Java JDK

I went to the Oracle site and installed the latest JDK (jdk1.8.0_45) - this was a typical Mac install, and typically I had no idea where to find the JDK root.

However, the utility /usr/libexec/java_home confirms that a JDK is installed and prints the path to its root.

Java Env

echo "JAVA_HOME=\$(/usr/libexec/java_home)" >> ~/Java.env
echo "export JAVA_HOME" >> ~/Java.env
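
/usr/libexec/java_home only exists on a Mac, so on Linux the JAVA_HOME line above will not work as written. As a rough sketch, a small helper can try the Mac utility first and fall back to a JAVA_HOME you have already set by hand. The name detect_java_home is my own invention, and the fallback assumes the usual layout where the JDK root contains bin/java.

```shell
# Hypothetical helper: print a JDK root, or return non-zero if none is found.
detect_java_home() {
    if [ -x /usr/libexec/java_home ]; then
        # macOS: ask the system utility, as in the text above.
        /usr/libexec/java_home
    elif [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]; then
        # Elsewhere: accept a JAVA_HOME that is already set and looks valid.
        echo "$JAVA_HOME"
    else
        return 1
    fi
}

if jdk_root="$(detect_java_home)"; then
    echo "JDK root: $jdk_root"
else
    echo "No JDK found; install one or set JAVA_HOME by hand" >&2
fi
```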

Apache Maven

I again downloaded Maven from apache.org - this time taking the binary apache-maven-3.3.3-bin.tar.gz

To install this I carried out these steps

cd ~/JAVA_ENV
cp ~/Downloads/apache-maven-3.3.3-bin.tar.gz .
tar -zxvf apache-maven-3.3.3-bin.tar.gz
ln -s apache-maven-3.3.3 maven

If you create a symbolic link, you can install new versions of the product and start using them just by changing the link. Should they not work (or have a version mismatch), you can quickly and seamlessly switch back to the previous version.
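
To make the switching concrete, here is a small sketch that runs in a scratch directory. The tool-1.0 and tool-2.0 names are made up and stand in for two unpacked versions of any product.

```shell
# Sketch in a throwaway directory; nothing under ~/JAVA_ENV is touched.
demo_dir="$(mktemp -d "${TMPDIR:-/tmp}/symlink-demo.XXXXXX")"
cd "$demo_dir" || exit 1
mkdir tool-1.0 tool-2.0

ln -s tool-1.0 tool          # first install: "tool" points at 1.0
readlink tool                # prints tool-1.0

ln -sfn tool-2.0 tool        # upgrade: repoint the existing link
readlink tool                # prints tool-2.0

ln -sfn tool-1.0 tool        # roll back if 2.0 misbehaves
readlink tool                # prints tool-1.0
```

The -n flag is the important one. Without it, ln follows the existing link into the directory it points at and creates the new link inside that directory instead of replacing the old link.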

Java Env

echo "MAVEN_HOME=~/JAVA_ENV/maven" >> ~/Java.env
echo "M2_HOME=\$MAVEN_HOME" >> ~/Java.env
echo "export M2_HOME" >> ~/Java.env
echo "export MAVEN_HOME" >> ~/Java.env
echo "PATH=\$MAVEN_HOME/bin:\$PATH" >> ~/Java.env

Apache Spark

We now (yes, you guessed) download Spark from apache.org and carry out these steps. I chose the source code, as I do not have a Hadoop environment on my Mac.

cd ~/Downloads
tar -zxvf Spark1.4.tar.gz
mv Spark1.4 ~/JAVA_ENV/
cd ~/JAVA_ENV
ln -s Spark1.4 Spark
cd Spark

I need to get my Java environment sorted - so

source ~/Java.env

Now build using mvn

mvn -DskipTests clean package

If this produces an error like

command not found

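then there is a problem with your Java setup: mvn (Maven) cannot be found on your PATH. Check that ~/Java.env has been sourced in this shell and that ~/JAVA_ENV/maven/bin exists.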
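
A quick way to narrow this down is to ask the shell where each tool resolves from. The following is only a sketch. check_cmd is a helper name I made up, and the list of tools is simply the ones installed in this post.

```shell
# Hypothetical helper: report where a command resolves from, or that it is missing.
check_cmd() {
    if resolved="$(command -v "$1")"; then
        echo "$1 found at $resolved"
    else
        echo "$1 NOT found on PATH" >&2
        return 1
    fi
}

# Load the environment file if it exists, then check each tool in turn.
[ -f ~/Java.env ] && . ~/Java.env
for tool in java ant mvn; do
    check_cmd "$tool" || true    # keep going so every missing tool is listed
done

# Print one PATH entry per line, which makes a missing directory easy to spot.
echo "$PATH" | tr ':' '\n'
```

If mvn is reported missing but ~/JAVA_ENV/maven/bin is not in the printed list, the PATH line in ~/Java.env is the place to look.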

The installation at this point needs to access the Internet, as there are more packages and modules to be added. Just sit back and relax, go find a good book - and let all the packages download.

Quick Test

After the Maven build process has completed - you can quickly check if everything worked by

cd ~/JAVA_ENV/Spark
./bin/spark-shell --master local[2]

You should see the scala> prompt

To get out just type

exit

Coming Next - Using Spark with Python