top of page
  • Writer's pictureZainab

Transform Data with SQL Pivot...

A pivot table is a data analysis and summarisation tool commonly used in spreadsheets.

It enables users to organise and summarise large datasets into manageable and meaningful data. Pivoting transposes rows into columns, aggregating data in the process of transposing. As a result, the output of a pivot operation returns more columns and fewer rows.

It provides a flexible way to aggregate and calculate a subset of data based on different dimensions and measures. A chosen aggregation function, such as sum, average, or another statistic, is used to accomplish this.


In SQL, pivots can be used for analysis, with different syntaxes depending on the vendor.

This demonstration focuses on Oracle.


A simple PIVOT syntax could be:

SELECT columns
FROM tables
PIVOT( aggregate_column
       FOR column_to_be_pivoted
       IN (values_of_column_to_be_pivoted)
        )
 

The dataset used in this demo is an e-commerce sales data.


Our initial objective is to efficiently obtain the data in a pivot format encompassing all columns. We must aggregate the price for each purchase and group it by all other columns.


  • The SELECT CLAUSE retrieves all columns from the table. Columns not included in the PIVOT CLAUSE are used for grouping.

  • The PIVOT CLAUSE is used to perform the actual pivot operation.

  • `SUM(order_value)` aggregate function is used to sum the sales amounts for each purchase for different purchase_id

  • `FOR purchase_id IN (360,361,362,363)` specifies which purchase_ids we want to pivot as columns.

NB: You cannot select individual columns in the SELECT clause. It throws an error.



To include specific columns in a query, add them as a subquery in the FROM clause.

The query result is grouped by the columns, not in the PIVOT CLAUSE.

You may also choose to exclude some column values.

In this query, only purchase_id 360 and 362 are preserved.


NB: Columns used in the PIVOT CLAUSE must be included in the subquery as part of the FROM CLAUSE to avoid errors.



Pivoting values from more than one column is possible, as shown below:



Pivoting with Filters

Properly position your WHERE clause to avoid errors.

If you want to retrieve rows with customer_id 41 or 43, place the WHERE clause at the end of your query:

SELECT  *
FROM (SELECT  product_category, brand, order_value,purchase_id,customer_id
      FROM customer_purchase_history)
                        
PIVOT (SUM(order_value)
        FOR purchase_id
        IN (360,362))
        WHERE customer_id IN (41,43)

Alternatively, it can also be part of the subquery in the FROM clause:

SELECT  *
FROM (SELECT  product_category, brand, order_value,purchase_id,customer_id
      FROM customer_purchase_history
      WHERE customer_id IN (41,43) )
PIVOT (SUM(order_value)
       FOR purchase_id
       IN (360,362)) 
                 

An example query is given below:



Pivoting and Multiple Aggregations

Multiple aggregations can be used together. We can count the number of sales for purchase_id 338 and 339 on the most recent date. The result is grouped by product_category and brand.



Pivoting with JOINS

The query above can be modified to retrieve the names of customers who made purchases by joining the customer table to the purchase history table.


In conclusion, SQL pivot is an essential technique for transforming normalised data into a format that can be readily read in real-world business situations. Pivot tables summarise and aggregate data, making it easier to analyse and draw insights. The syntax for creating pivot tables can vary in different database systems, but the basic principles remain the same.

With practice, you can use SQL pivoting to create informative and insightful reports from your data.



See you on another blog...

0 comments
bottom of page