Introduction
This article demonstrates how to set up PySpark in IntelliJ IDEA.
Download and Set Up IntelliJ IDEA for PySpark
Step 1: Download IntelliJ IDEA
You can download and install IntelliJ IDEA from the location below.
URL: https://www.jetbrains.com/idea/download/#section=windows
Step 2: Prerequisites for PySpark
Make sure the following software is installed and configured properly:
1. Java
2. Python
3. Spark
4. Hadoop Home for winutils.exe
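The prerequisites above can be checked with a short script before opening the IDE. This is a minimal sketch, not part of the original setup: it assumes the installs are exposed through the JAVA_HOME, SPARK_HOME and HADOOP_HOME environment variables and that winutils.exe sits in the bin folder under HADOOP_HOME. The function name find_problems is made up for illustration.

```python
import os
from pathlib import Path

# Assumption: each prerequisite is exposed through one of these variables.
REQUIRED_DIRS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def find_problems(env):
    """Return a list of problems found in env; an empty list means all checks passed."""
    problems = []
    for name in REQUIRED_DIRS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing directory: {value}")

    # Assumption: winutils.exe is expected at <HADOOP_HOME>/bin/winutils.exe.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and Path(hadoop_home).is_dir():
        winutils = Path(hadoop_home) / "bin" / "winutils.exe"
        if not winutils.is_file():
            problems.append(f"winutils.exe not found at {winutils}")
    return problems


if __name__ == "__main__":
    for line in find_problems(os.environ) or ["All checks passed"]:
        print(line)
```

Python itself is not checked here, since the script can only run if a Python interpreter is already available.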
Step 3: Install the Python plugin in IntelliJ IDEA
Step 4: Create a new Python project in IntelliJ IDEA
Step 5: Set up the required PySpark files in Project Structure
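The article does not list the required files in text. As an assumption, they are taken here to be the python folder of the Spark install and the py4j source archive under python/lib, which is where a standard Spark distribution keeps them. The sketch below locates those paths so they can be added in Project Structure, or, as an alternative to the IDE setting, prepended to sys.path at runtime. Both function names are made up for illustration.

```python
import glob
import os
import sys


def pyspark_source_paths(spark_home):
    """Return the Spark python folder followed by any bundled py4j source archives."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j version differs between Spark releases, so match it with a wildcard.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_pyspark_to_sys_path(spark_home):
    """Prepend the PySpark paths that are not yet on sys.path; return the ones added."""
    added = [p for p in pyspark_source_paths(spark_home) if p not in sys.path]
    sys.path[:0] = added
    return added


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME")
    if home:
        print(add_pyspark_to_sys_path(home))
    else:
        print("SPARK_HOME is not set")
```

The printed paths are the ones to register with the project if the IDE route is preferred.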
Step 6: Create a sample PySpark program
from pyspark.sql import SparkSession

appName = "AddColumnUsingUDF"

# Spark session
spark = SparkSession.builder.appName(appName).getOrCreate()

filePath = 'C:/tools/data/sampleEmpSalaryData.csv'

# Read the CSV file, treating the first row as the header
sampleSalaryDF = spark.read.format('csv').options(header='true').load(filePath)

# The DataFrame has many columns, so restrict it to a few for better display.
# A Python list is used to pass the column names here; you may also pass them directly to select().
colList = ['Emp ID', 'First Name', 'Last Name', 'Date of Birth', 'Salary', 'Last % Hike']
sampleSalaryDF = sampleSalaryDF.select(*colList)
sampleSalaryDF.show(5)
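If the sample data file is not at hand, a small stand-in CSV with the same six column names lets the program run end to end. This is only a sketch: the rows below are invented for illustration and are not taken from the original data set, and write_sample_csv is a hypothetical helper name.

```python
import csv

# Column names used by the sample program's select() call.
COLUMNS = ["Emp ID", "First Name", "Last Name", "Date of Birth", "Salary", "Last % Hike"]

# Invented rows for illustration only; they do not come from the original data set.
ROWS = [
    ["1001", "Asha", "Rao", "1/15/1985", "52000", "5%"],
    ["1002", "Ben", "Lee", "7/3/1990", "61000", "8%"],
    ["1003", "Carla", "Diaz", "11/22/1978", "74500", "3%"],
]


def write_sample_csv(path):
    """Write the header and the invented rows to path; return the number of data rows."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(ROWS)
    return len(ROWS)
```

Calling write_sample_csv with the path stored in filePath creates a file the program can read. Because this stand-in has only three rows, show(5) would display three.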
Step 7: Run the PySpark program
The program runs without any issues if you configured the environment properties and the PySpark files as specified above.
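The environment properties can also be supplied from the script itself instead of the run configuration. The sketch below is one possible approach under stated assumptions: the variables are SPARK_HOME, HADOOP_HOME and PYSPARK_PYTHON, and the paths are hypothetical placeholders to replace with your own install locations. It only fills in variables that are not already set.

```python
import os

# Hypothetical install locations; replace them with the folders on your machine.
DEFAULT_ENV = {
    "SPARK_HOME": "C:/tools/spark",
    "HADOOP_HOME": "C:/tools/hadoop",
    "PYSPARK_PYTHON": "python",
}


def apply_defaults(env, defaults=DEFAULT_ENV):
    """Fill in variables missing from env, keep existing values, return what was added."""
    applied = {}
    for name, value in defaults.items():
        if name not in env:
            env[name] = value
            applied[name] = value
    return applied


if __name__ == "__main__":
    # Run this before the SparkSession is created so the settings are picked up.
    print(apply_defaults(os.environ))
```

Values already defined in the run configuration take precedence, so the two approaches can coexist.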
Step 8: Results window
If the program runs without any issues, the results window shows the first five rows of the selected columns.
This simple program reads a CSV file and was tested in IntelliJ IDEA.
Sample data: courtesy of eforexcel.com
Copyright - There is no copyright on the code. You can copy, change and distribute it freely. Just mentioning this site should be fair.
(C) November 2020, manivelcode