Writing each column into unpivot clause is so bothering for me…
BigQuery(BQ) is very useful for data analysis or processing.
It is good for handling huge data. It returns query result in short time.
During data handling, sometimes we would like to change columns to rows.
But in order to use UNPIVOT
, we have to write each column name in unpivot clause.
It is bothering.
It is better to set columns dynamically.
Are there any easier way for unpivoting ?
So today I introduce about "Dynamic unpivot SQL for multiple columns in BigQuery".
UNPIVOT in BigQuery
We can use BigQuery UNPIVOT
clause like following.
First, prepare data with multiple columns.
WITH Produce AS (
SELECT 'Kale' as product, 51 as Q1, 23 as Q2, 45 as Q3, 3 as Q4 UNION ALL
SELECT 'Apple', 77, 0, 25, 2)
SELECT * FROM Produce
+---------+----+----+----+----+
| product | Q1 | Q2 | Q3 | Q4 |
+---------+----+----+----+----+
| Kale | 51 | 23 | 45 | 3 |
| Apple | 77 | 0 | 25 | 2 |
+---------+----+----+----+----+
Then we set columns into UNPIVOT
operator like IN (Q1, Q2, Q3, Q4)
.
SELECT * FROM Produce
UNPIVOT(sales FOR quarter IN (Q1, Q2, Q3, Q4))
+---------+-------+---------+
| product | sales | quarter |
+---------+-------+---------+
| Kale | 51 | Q1 |
| Kale | 23 | Q2 |
| Kale | 45 | Q3 |
| Kale | 3 | Q4 |
| Apple | 77 | Q1 |
| Apple | 0 | Q2 |
| Apple | 25 | Q3 |
| Apple | 2 | Q4 |
+---------+-------+---------+
But in this way, we have to set each column name in SQL.
It takes time.
I need easier way.
Dynamic unpivot SQL for multiple columns in BigQuery
In order to unpivot for multiple columns, we can use function.
The function is fhoffa.x.unpivot()
.
We can use it like below.
SELECT product, unpivotted
FROM Produce a
, UNNEST(fhoffa.x.unpivot(a, 'Q'))
unpivotted
Then we get the result below.
product | unpivotted.key | unpivotted.value |
---|---|---|
Kale | Q1 | 51 |
Kale | Q2 | 23 |
Kale | Q3 | 45 |
Kale | Q4 | 3 |
Apple | Q1 | 77 |
Apple | Q2 | 0 |
Apple | Q3 | 25 |
Apple | Q4 | 2 |
fhoffa.x.unpivot()
The second parameter of fhoffa.x.unpivot(a, 'Q')
decides unpivot target columns.
In this case, columns that have Q
in their name are target columns.
We get same result with Q[1-9]
.
This function was introduced in 2020.
It is user defined function (UDF).
The source code is opened.
So you can define it in your own project.
Now we can use same SQL with a lot of columns.
Conclusion
Today I explained about "Dynamic unpivot SQL for multiple columns in BigQuery".
In order to unpivot multiple columns, we can use the UDF fhoffa.x.unpivot()
.
With using regular expressions, we can pick a lot of columns.
Now we don't have to set each column name for unpivot.