PySpark – dev set up – Eclipse – Windows

October 4, 2017

For this example, we will set up Spark in: C:\Users\Public\Spark_Dev_set_up
Note: I am running Eclipse Neon with the PyDev plugin

Prerequisites

  1. Python 3.5
  2. JRE 8
  3. JDK 1.8
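
A quick way to confirm the prerequisites are in place before starting (a hypothetical check script, not part of the setup itself):

```python
import subprocess
import sys

# PySpark 2.1 requires Python 3.5+ (per the prerequisites above).
assert sys.version_info >= (3, 5), "Python 3.5 or newer is required"

# "java -version" prints to stderr; PIPE/universal_newlines keep this 3.5-compatible.
try:
    proc = subprocess.run(["java", "-version"],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    print(proc.stderr.strip())
except FileNotFoundError:
    print("Java not found on PATH - install JRE/JDK 8 first")
```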

Steps to set up:

  1. Download Spark from https://spark.apache.org/downloads.html
    1. Choose a Spark release: 2.1.0
    2. Choose a package type: Pre-built for Apache Hadoop 2.6
    3. Download spark-2.1.0-bin-hadoop2.6.tgz and extract it to C:\Users\Public\Spark_Dev_set_up

  2. Download winutils.exe
    Download from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe and copy to C:\Users\Public\Spark_Dev_set_up\spark-2.1.0-bin-hadoop2.6\winutils\bin
  3. In Eclipse, set environment variables:
    Window -> Preferences -> PyDev -> Interpreters -> Python Interpreter -> Environment
    Variable: SPARK_HOME
    Value: C:\Users\Public\Spark_Dev_set_up\spark-2.1.0-bin-hadoop2.6

    Variable: HADOOP_HOME
    Value: C:\Users\Public\Spark_Dev_set_up\spark-2.1.0-bin-hadoop2.6\winutils


  4. In Eclipse, add the Spark libraries to PYTHONPATH:

    Window -> Preferences -> PyDev -> Interpreters -> Python Interpreter -> Libraries -> New Egg/Zip(s) -> C:\Users\Public\Spark_Dev_set_up\spark-2.1.0-bin-hadoop2.6\python\lib\pyspark.zip

    Window -> Preferences -> PyDev -> Interpreters -> Python Interpreter -> Libraries -> New Egg/Zip(s) -> C:\Users\Public\Spark_Dev_set_up\spark-2.1.0-bin-hadoop2.6\python\lib\py4j-0.10.4-src.zip
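
If you prefer configuring the script itself rather than the Eclipse preferences, the settings from steps 3 and 4 can also be applied in Python before pyspark is imported (a sketch; the paths match the layout used above):

```python
import os
import sys

# Mirror the Eclipse environment settings in code (paths from the layout above).
spark_home = r"C:\Users\Public\Spark_Dev_set_up\spark-2.1.0-bin-hadoop2.6"
os.environ["SPARK_HOME"] = spark_home
os.environ["HADOOP_HOME"] = os.path.join(spark_home, "winutils")

# Equivalent of the New Egg/Zip(s) PYTHONPATH entries.
sys.path.insert(0, os.path.join(spark_home, "python", "lib", "pyspark.zip"))
sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.10.4-src.zip"))
```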

  5. In Eclipse, run a sample PySpark program:
