Tuesday 25 April 2023

SUBSTRING_INDEX function in Databricks Spark

SUBSTRING_INDEX is a string manipulation function available in Spark SQL on Databricks, a cloud-based big data processing platform built on Apache Spark.

The SUBSTRING_INDEX function returns the part of a string that lies before or after a given number of occurrences of a delimiter, counting from the left or the right of the string. The syntax for the function is as follows:

SUBSTRING_INDEX(string, delimiter, count)

Where:
string: the input string to extract the substring from.
delimiter: the delimiter to search for in the input string. The match is case-sensitive.
count: the number of occurrences of the delimiter to consider. If count is positive, the function returns everything to the left of the final delimiter counted, counting from the left of the string. If count is negative, it returns everything to the right of the final delimiter counted, counting from the right. If the string contains fewer delimiters than count asks for, the whole string is returned.

Here is an example usage of SUBSTRING_INDEX in Databricks:

%sql
SELECT SUBSTRING_INDEX('www.example.com', '.', 2)

This returns www.example. The function counts two occurrences of the delimiter "." from the left of the input string "www.example.com" and returns everything before the second one; the delimiters themselves are not what is extracted.
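
To make the effect of positive and negative counts easier to see, here is a minimal pure-Python sketch that models the behaviour described above. The helper name substring_index is hypothetical and written only for illustration; it is not the Spark implementation and does not need a cluster to run. It assumes that a count of 0 or an empty delimiter yields an empty string, and it does not model NULL inputs or delimiters that overlap themselves (such as 'aa' inside 'aaa'), where the real function may behave differently.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Illustrative model of SUBSTRING_INDEX semantics (not Spark's code)."""
    # Assumption: count = 0 or an empty delimiter gives an empty string.
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)  # case-sensitive split on the delimiter
    if count > 0:
        # Keep everything to the left of the count-th delimiter from the left.
        return delim.join(parts[:count])
    # Negative count: keep everything to the right of the
    # |count|-th delimiter, counting from the right.
    return delim.join(parts[count:])


print(substring_index("www.example.com", ".", 2))    # www.example
print(substring_index("www.example.com", ".", -2))   # example.com
print(substring_index("www.example.com", ".", 10))   # www.example.com
```

The last call shows that when the string has fewer delimiters than the count, the whole string comes back unchanged. In a Databricks notebook, the matching query for the second call would be SELECT SUBSTRING_INDEX('www.example.com', '.', -2).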