Sunday 9 July 2023

How to write data from a DataFrame to a CSV file using PySpark

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("Write to CSV") \
    .getOrCreate()

# Assume you have a DataFrame named "df" containing the data

# Write the DataFrame to a CSV file
df.write.csv("path/to/your/file.csv", header=True, mode="overwrite")

# Stop the SparkSession
spark.stop()


In the code above, after creating a SparkSession, we assume you have a DataFrame named df that contains the data you want to write to a CSV file.

To write the DataFrame to a CSV file, use the write.csv() method and specify the desired path to the output file. Set header=True to include the column headers in the CSV file.

You can also specify the mode parameter to determine how the file is written. In the example, we use mode="overwrite" to overwrite the file if it already exists. Other options for mode include "append", "ignore", and "error" (the default, which raises an error if the output path already exists).

After the write completes, Spark creates a directory at the specified path ("path/to/your/file.csv" in the example) containing one or more part-*.csv files, one per partition of the DataFrame, rather than a single CSV file with that name.

Finally, stop the SparkSession using spark.stop().

Make sure to replace "path/to/your/file.csv" with the actual path where you want the CSV output to be written.