Introduction
This article demonstrates how to set up PySpark in IntelliJ IDEA.
Download and Set Up IntelliJ IDEA for PySpark
Step 1: Download IntelliJ IDEA
You can download and install IntelliJ IDEA from the location below.
URL: https://www.jetbrains.com/idea/download/#section=windows
Step 2: Prerequisites for PySpark
Make sure the software below is installed and configured properly:
1. Java
2. Python
3. Spark
4. Hadoop Home for winutils.exe
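The prerequisites above can be checked from Python before opening the IDE. The following is a minimal sketch, not part of the original setup: it assumes the installations are exposed through the conventional JAVA_HOME, SPARK_HOME and HADOOP_HOME environment variables and that winutils.exe sits under the bin folder of Hadoop Home. The function name missing_prerequisites is made up for illustration.

```python
import os
import shutil
from pathlib import Path


def missing_prerequisites(env, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Java: either JAVA_HOME is set or a `java` executable is on PATH.
    if not env.get("JAVA_HOME") and which("java") is None:
        problems.append("Java: JAVA_HOME is not set and java is not on PATH")

    # Spark: SPARK_HOME (assumed variable name) should point at an existing directory.
    spark_home = env.get("SPARK_HOME")
    if not spark_home or not Path(spark_home).is_dir():
        problems.append("Spark: SPARK_HOME is not set or is not a directory")

    # Hadoop Home: winutils.exe is assumed to live in <HADOOP_HOME>/bin.
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home or not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append("Hadoop: HADOOP_HOME/bin/winutils.exe was not found")

    return problems


if __name__ == "__main__":
    # Python itself is present if this script runs at all.
    for problem in missing_prerequisites(os.environ):
        print(problem)
```

If the script prints nothing, the three checks passed. It does not verify versions or compatibility between Java, Spark and Hadoop.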
Step 3: Install the Python plugin in IntelliJ IDEA
Step 4: Create a new Python project in IntelliJ IDEA
Step 5: Set up the required PySpark files in Project Structure
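For readers who want to see what this step amounts to, here is a rough sketch of the same idea done in code. It assumes the required files are the python folder of the Spark distribution and the py4j zip archive under python/lib, which is the usual layout of a Spark download. The helper names pyspark_source_paths and add_pyspark_to_sys_path are hypothetical, and this is an alternative to the Project Structure dialog, not a description of what the IDE does internally.

```python
import glob
import os
import sys


def pyspark_source_paths(spark_home):
    """Return the Spark Python directory plus any bundled py4j zip archives."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j version differs between Spark releases, so match it with a glob.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip")))
    return [python_dir] + py4j_zips


def add_pyspark_to_sys_path(spark_home):
    """Prepend the Spark Python sources to sys.path if they are not already there."""
    for path in reversed(pyspark_source_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)


if __name__ == "__main__":
    # SPARK_HOME is assumed to be set; nothing happens if it is missing.
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        add_pyspark_to_sys_path(spark_home)
```

Calling add_pyspark_to_sys_path before the first pyspark import makes the import resolve against the Spark distribution. Adding the same two locations in Project Structure has the advantage that the IDE can also offer code completion for them.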
Step 6: Create a sample PySpark program
from pyspark.sql import SparkSession

appName = "AddColumnUsingUDF"

# Spark session
spark = SparkSession.builder.appName(appName).getOrCreate()

filePath = 'C:/tools/data/sampleEmpSalaryData.csv'

# Read the CSV file, treating the first row as the header
sampleSalaryDF = spark.read.format('csv').options(header='true').load(filePath)

# The DataFrame has many columns, so restrict it to a few for better display.
# A Python list is used here to pass the column names; you may also pass them directly to select().
colList = ['Emp ID', 'First Name', 'Last Name', 'Date of Birth', 'Salary', 'Last % Hike']
sampleSalaryDF = sampleSalaryDF.select(*colList)

# Print the first 5 rows
sampleSalaryDF.show(5)
Step 7: Run the PySpark program
The program will run without any issues if you configured the environment properties and PySpark files as specified above.
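If the environment properties are not set in the run configuration, one possible workaround is to set them at the top of the program, before the Spark session is created. This is a sketch under assumptions: the variable names SPARK_HOME, HADOOP_HOME and PYSPARK_PYTHON are the conventional ones, the helper apply_spark_env is hypothetical, and the paths in the usage comment are placeholders that must be replaced with your own install locations.

```python
import os
import sys


def apply_spark_env(env, spark_home, hadoop_home, python_exe):
    """Fill in Spark-related variables without overwriting values that are already set."""
    env.setdefault("SPARK_HOME", spark_home)
    env.setdefault("HADOOP_HOME", hadoop_home)
    # PYSPARK_PYTHON tells Spark which interpreter to use for its Python workers.
    env.setdefault("PYSPARK_PYTHON", python_exe)
    return env


# Example call with placeholder paths (assumptions, not taken from the article):
# apply_spark_env(os.environ, r"C:\tools\spark", r"C:\tools\hadoop", sys.executable)
```

Because setdefault is used, values already defined in the run configuration or in the system environment take precedence over the ones passed in.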
Step 8: Results Window
If the program runs without any issues, the results window shows the first five rows of the selected columns.
This simple program reads a CSV file and was tested to work in IntelliJ IDEA.
Sample data: courtesy of eforexcel.com
Copyright - There is no copyright on the code. You can copy, change and distribute it freely. A mention of this site is all that is asked.
(C) November 2020, manivelcode