from pyspark.sql import SparkSession

# Create a SparkSession with Hive support
spark = SparkSession.builder \
    .appName("Hive Read") \
    .enableHiveSupport() \
    .getOrCreate()

# Read data from a Hive table
df = spark.sql("SELECT * FROM your_hive_table")

# Display the DataFrame
df.show()

# Perform further transformations or analysis on the DataFrame as needed

# Stop the SparkSession
spark.stop()
In the code above, we create a SparkSession with Hive support by calling .enableHiveSupport() on the builder before the session is created. This tells Spark to connect to the Hive metastore so that Hive tables are visible to Spark SQL.
To read data from a Hive table, you can use the spark.sql() method and pass a SQL query to select the desired data from the table. Replace "your_hive_table" in the SQL query with the actual name of your Hive table.
After reading the data, you can perform further transformations or analysis on the resulting DataFrame object df. Finally, stop the SparkSession using spark.stop().
Ensure that your Spark cluster is configured to work with Hive and that the metastore connection is set up correctly, typically by placing hive-site.xml in Spark's conf directory or by supplying the equivalent settings when building the session.
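If you prefer to supply the metastore settings in code rather than through hive-site.xml, they can be passed on the builder before getOrCreate(). The snippet below is a sketch only: the warehouse path and metastore host are placeholder values that must be replaced with those of your environment.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Hive Read")
    # Placeholder values -- adjust for your environment.
    .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
    .config("hive.metastore.uris", "thrift://metastore-host:9083")
    .enableHiveSupport()
    .getOrCreate()
)
```

Note that these settings take effect only for a newly created session; if getOrCreate() returns an existing session, configuration passed on the builder is ignored.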