Programming Funda
02/06/2024
Find the Nth Highest Salary Using PySpark ✅❤️
================================
Without Partition (Nth highest salary across all employees):
from pyspark.sql import SparkSession
from pyspark.sql.window import Window
from pyspark.sql.functions import desc, row_number
# creating spark session
spark = (
    SparkSession.builder.master("local[*]")
    .appName("www.programmingfunda.com")
    .getOrCreate()
)
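# creating DataFrame from csv file
# (inferSchema makes "salary" numeric; without it every column is read as a
# string and salaries are sorted alphabetically, e.g. "9000" above "10000")
df = (
    spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv("./sample_data.csv")
)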
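# creating a window specification: all rows ordered by salary, highest first
# (with no partitionBy, Spark moves all rows into a single partition)
windowFunction = Window.orderBy(desc("salary"))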
# applying window function
df = df.withColumn("rank", row_number().over(windowFunction))
# keeping only the row ranked N (N = 2 gives the 2nd highest salaried employee)
n = 2
df = df.filter(df["rank"] == n)
df.show()
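A caveat on the approach above: `row_number()` gives every row a distinct position, so if two employees tie for the top salary, rank 2 returns the second of the tied pair rather than the second-highest distinct salary. PySpark's `pyspark.sql.functions` also provides `dense_rank()`, which gives tied rows the same rank. The plain-Python sketch below (no Spark required) mimics both behaviours so the difference is easy to see. The `employees` data and the `nth_highest` helper are hypothetical and only illustrate the ranking logic; they are not part of the original article.

```python
from typing import List, Tuple

# Hypothetical sample rows: (name, salary). Salaries are ints, not strings.
employees: List[Tuple[str, int]] = [
    ("Alice", 9000),
    ("Bob", 9000),
    ("Carol", 7500),
    ("Dave", 5000),
]


def nth_highest(rows, n, dense=False):
    """Return the rows ranked n when sorted by salary, highest first.

    dense=False mimics row_number(): every row gets a unique position.
    dense=True mimics dense_rank(): tied salaries share the same rank.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    result = []
    rank = 0
    previous = None
    for position, row in enumerate(ordered, start=1):
        if dense:
            # Only move to the next rank when the salary changes.
            if row[1] != previous:
                rank += 1
                previous = row[1]
        else:
            rank = position
        if rank == n:
            result.append(row)
    return result


print(nth_highest(employees, 2))              # row_number-style: [('Bob', 9000)]
print(nth_highest(employees, 2, dense=True))  # dense_rank-style: [('Carol', 7500)]

# Why salary must be numeric: strings compare character by character.
print(sorted(["9000", "10000"], reverse=True))  # ['9000', '10000']
```

Python's sort is stable, so Alice stays ahead of Bob here; in Spark the order of tied rows under `row_number()` is not guaranteed, which is another reason to prefer `dense_rank()` when ties matter.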
---------------------------------
With Partition (Nth highest salary within each department):
from pyspark.sql import SparkSession
from pyspark.sql.window import Window
from pyspark.sql.functions import desc, row_number
# creating spark session
spark = (
    SparkSession.builder.master("local[*]")
    .appName("www.programmingfunda.com")
    .getOrCreate()
)
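# creating DataFrame from csv file
# (inferSchema makes "salary" numeric so it sorts by value, not alphabetically)
df = (
    spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv("./sample_data.csv")
)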
# creating a window specification
windowFunction = Window.partitionBy("department").orderBy(desc("salary"))
# applying window function
df = df.withColumn("rank", row_number().over(windowFunction))
# keeping only the row ranked N in each department
# (N = 2 gives the 2nd highest salaried employee per department)
n = 2
df = df.filter(df["rank"] == n)
df.show()
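`partitionBy("department")` restarts the ranking for every department, so the filter keeps one row per department (and nothing for a department with fewer than N employees). The plain-Python sketch below mirrors that behaviour without Spark. The `employees` rows and the `nth_highest_per_department` helper are hypothetical illustrations, and, like `row_number()`, this version does not treat tied salaries specially.

```python
from collections import defaultdict

# Hypothetical sample rows: (name, department, salary).
employees = [
    ("Alice", "IT", 9000),
    ("Bob", "IT", 7000),
    ("Carol", "HR", 6000),
    ("Dave", "HR", 8000),
    ("Eve", "HR", 5000),
]


def nth_highest_per_department(rows, n):
    """Mimic partitionBy("department").orderBy(desc("salary")) + row_number() == n."""
    # Step 1: split the rows into one group per department (the "partition").
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)

    # Step 2: sort each group by salary, highest first, and keep position n.
    result = []
    for department in sorted(groups):
        ordered = sorted(groups[department], key=lambda r: r[2], reverse=True)
        if len(ordered) >= n:  # departments with fewer than n rows yield nothing
            result.append(ordered[n - 1])
    return result


print(nth_highest_per_department(employees, 2))
# [('Carol', 'HR', 6000), ('Bob', 'IT', 7000)]
```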
--------------------------------------
Let's wrap up this article here.
👉 Visit here to read the complete article: https://www.programmingfunda.com/how-to-find-the-nth-highest-salary-using-pyspark/
💯 Follow Programming Funda for more free data analysis content.