Wednesday, December 05, 2012

Business Intelligence - Part 1 - Date/Time Dimensions, Table Design for periodic aggregate reports

Introduction


Often we end in scratching our head for writing SQLs for a report, where we finally end up writing few SQL in the loop and make a final report. There are age old techniques to achieve that, while we miss the view in doing that.

Sometimes single query could solve our issue in much faster approach than the queries in loop.

Going in search of knowledge of such options I have ended up in learning Data Warehouse and Business Intelligence. The primary approach would be to take baby steps one by one and to reach the destination.

Problem:

What we have:

It is a small store with sales data, what we have is just products, sales invoice and invoice items.


What we need:

Simple intelligence reports

  1. Sales per day of the provided month
  2. Sales per day of the provided week
  3. Sales per quarter of the provided year
  4. Sales per year overall
  5. and more if possible

Solution

Introduce date and time dimensions


Add new dimension tables as above, these tables help to give more business related information like weekday name like Monday, Tuesday, or the quarter of the year Q1, Q2,Q3,Q4 etc.,

The date dimension should have the date_key as long value like 20091125 to map a 2009, November 25th.
Having this as a numeric field like long will help the joins to be faster. Each other table columns are expected to repeat the values in detail. year as 2009, month as 11, day_of_month as 25 etc. We could add more columns as much as needed to provide the business reports.

The time would have a fixed 24 x 60 entries of 1440. If in case we need a second based match, we may need to have 86400 records, but better to avoid second level reporting as it is not required for the store management.

Introduce dimension mapping columns

Add the dimension mapping columns in the invoice for the invoice date, which will have invoice_date_key and invoice_date_time_key.

Both would have numeric values like 20091125 and 1429.

Write the queries.

1. Sales per day of the provided month

SELECT dd.day_of_month, SUM(invoice.total_amount) FROM invoice
RIGHT OUTER JOIN dim_date  dd ON dd.date_key = invoice.invoice_date_key
AND invoice.invoice_date BETWEEN 'x' AND 'y'
GROUP BY dd.day_of_month
ORDER BY dd.day_of_month

The above would result something like
1 $100
2 $90
3 $2000
4 $1200
5 $600
etc...

1.1 Sales per day of the per month, for provided year



SELECT dd.month_short_name_en, dd.day_of_month, SUM(invoice.total_amount) FROM invoice
RIGHT OUTER JOIN dim_date  dd ON dd.date_key = invoice.invoice_date_key
AND invoice.invoice_date BETWEEN 'x' AND 'y'
GROUP BY dd.month, dd.day_of_month
ORDER BY dd.month, dd.day_of_month



The above would result something like
Jan 1 $100
Jan 2 $90
Jan 3 $2000
...

Feb 1 $110
Feb 2 $20
Feb 3 $1000

Feb 4 $1200
Feb 5 $600
etc...


The join with the dimension can be varied and more grouping and aggregation can be done to form variety of reports in a single query, which could impress the business.


    No comments:

    Post a Comment