Enabling execution of Cartesian Joins in MicroStrategy

http://insights.iolap.com/author/ajacob/

Anil Jacob is a Senior BI Architect at iOLAP, Inc. having extensive experience in Business Intelligence Implementations with a unique blend of data integration and BI skills . He is highly motivated and keeps himself updated with the latest trends in this field including Big Data and Data Science. His areas of specializations include MicroStrategy, Informatica and Netezza.

View all author posts →

Recent years have seen the emergence of self-service and data visualization vendors such as Tableau, Power BI and Qlik Sense that has forced enterprise BI players such as MicroStrategy to improvise their tools to compete with these vendors. I have been a solutions architect in the field of BI been using MicroStrategy for more than 10 years. From my perspective as an architect, I appreciate the power of the underlying SQL engine in MicroStrategy and do not foresee self-service vendors replacing MicroStrategy as an enterprise BI tool in the coming years. What thrills me most as a Solutions Architect in Microstrategy is implementing workaround solutions to meet complex reporting requirements. In other words, I am building additional logic to change the default behavior of the MicroStrategy SQL engine. One example that comes to mind is the effective use of Cartesian joins of dimension tables in certain advanced reporting scenarios. As per conventional best practices, it is not recommended to use Cartesian joins in MicroStrategy. However, there are certain business cases where the execution of Cartesian joins is required. In this blog, I have illustrated the use of Cartesian joins of dimension tables using two scenarios.

Scenario 1- Accurate analysis across date dimension without the need of fictitious/dummy data

In this scenario, I used a public data set of construction spending data across educational and religious sectors for the year 2016. I massaged this data set further by deleting construction spending data of religious sector for the month May 2016. Shown below is a simple bar chart visualization displaying this scenario.

As seen in this graph, since there is no data in religious sector for May 2016, the bars are more spread apart in comparison with the educational sector data. Moreover, reference line calculations such mean, median ignore the fact that there was no construction spending data in the religious sector for May 2016, thereby misleading the audience. This issue becomes evident if the bar chart is converted to a line graph.

Using this graph, we may wrongly conclude that construction spending was consistently high during the middle of the year and slows down during the start and end of year.

This issue can be resolved by entering dummy data in the fact table to ensure all months appear across any other dimension during time series analysis. Nevertheless, not all organizations provide flexibility to manually enter fictitious data in the data warehouse to satisfy certain reporting scenarios. In such cases we may have to implement workaround solutions within the reporting layer to handle these complex issues.

In order to resolve this issue using MicroStrategy, I enabled the VLDB property- ‘Cartesian Join Warning’ to ‘Execute’. I also created an additional dummy metric that serves the same purpose as introduction of dummy data within the data warehouse. The sql statement for this metric is shown below.

CREATE TEMPORARY TABLE ZZMD00 as
select	a12.month_key  month_key,
  a11.sector_key  sector_key,
  sum(a11.spend_amt)  WJXBFS1
from	f_construction_spending	a11
  join	d_date	a12
    on 	(a11.calendar_date = a12.calendar_date)
group by	a12.month_key,
  a11.sector_key

CREATE TEMPORARY TABLE ZZMD01 as
select	a11.month_key  month_key,
  a12.sector_key  sector_key,
  max(a11.month_key)  WJXBFS1
from	d_month	a11
  cross join	 d_sector	a12
where	a11.month_key between 201601 and 201612
group by	a11.month_key,
  a12.sector_key 

select	coalesce(pa11.month_key, pa12.month_key)  month_key,
  max(a14.month_short_desc)  month_short_desc,
  coalesce(pa11.sector_key, pa12.sector_key)  sector_key,
  max(a13.category)  category,
  max(pa11.WJXBFS1)  WJXBFS1,
  max(pa12.WJXBFS1)  WJXBFS2
from	ZZMD00	pa11
  full outer join	ZZMD01	pa12
    on 	(pa11.sector_key = pa12.sector_key and 
  pa11.month_key = pa12.month_key)
  join	d_sector	a13
    on 	(coalesce(pa11.sector_key, pa12.sector_key) = a13.sector_key)
  join	d_month	a14
    on 	(coalesce(pa11.month_key, pa12.month_key) = a14.month_key)
group by	coalesce(pa11.month_key, pa12.month_key),
  coalesce(pa11.sector_key, pa12.sector_key)

drop table ZZMD00
drop table ZZMD01

This highlighted sql pass is basically a Cartesian join between the dimensions- Month and Sector with an additional filter restricting data to 2016. This sql pass thereby ensures there is a row for all months of 2016 for all sectors irrespective of the actual construction spending data.

The revised bar chart and line chart with the introduction of this metric is shown below.

As seen in the graphs, the revised average line considers the month of May 2016, dropping the average from 311 to 285.

Scenario 2- Month End Reporting

For this scenario, let us consider a hypothetical example of ‘Sales Commissions data’. A sales man ‘TOM’ belongs to the ‘CENTRAL’ region. On Jan 15, 2017, he moved from ‘CENTRAL’ region to ‘SOUTH EAST’ region. The dimension data corresponding to ‘Tom’ is shown below.

Sales Rep Key	Sales Rep ID	Sales Rep	Region	Effective Start Date	Effective End Date
100	1	TOM	CENTRAL	1/1/2016	1/15/2017
200	1	TOM	SOUTH EAST	1/16/2017	12/31/9999
100	1	TOM	CENTRAL	1/1/2016	1/15/2017
200	1	TOM	SOUTH EAST	1/16/2017	12/31/9999

As seen above, Type 2 strategy has been used to maintain history for the ‘Sales Rep’ dimension.

The Commissions fact data for this sales rep is shown below

Sales Rep Key	Month	Commission Amt
100	Jan-17	2000
200	Jan-17	1000

A simple report to show distribution of commissions across regions has been shown below.

Let us make this scenario a bit more complicated. What if the requirement for the month-end report is to allocate all commissions for each month to the region where the sales rep was assigned at the end of the month? To satisfy this requirement, I had to implement a workaround solution in which execution of Cartesian Joins was enabled. In this solution, the month-end region had to be created as a metric. An alias table for the Type 2 dimension was also created. The sql is shown below.

CREATE TEMPORARY TABLE ZZMD00 as
select	a11.sls_rep_id  sls_rep_id,
  a12.month_key  month_key,
  sum(a11.comm_amt)  WJXBFS1
from	f_commissions	a11
  join	d_date	a12
    on 	(a11.earnings_date = a12.calendar_date)
where	a12.month_key = 201701
group by	a11.sls_rep_id,
  a12.month_key 

CREATE TEMPORARY TABLE ZZMD01 as
select	a11.sls_rep_id  sls_rep_id,
  a12.month_key  month_key,
  max(a11.sls_rep_nm)  WJXBFS1,
  max(a11.sls_rep_region)  WJXBFS2
from	d_salesrep	a11
  cross join	 d_month	 a12
where	(((a11.sls_rep_key)
 in	(select	r12.sls_rep_key
  from	d_month	r11
    cross join	 d_salesrep	r12
  where	r11.month_end_date between r12.effective_start_dttm and r12.effective_end_dttm
  group by	r12.sls_rep_key))
 and a12.month_key = 201701)
group by	a11.sls_rep_id,
  a12.month_key 

select	coalesce(pa11.sls_rep_id, pa12.sls_rep_id)  sls_rep_id,
  coalesce(pa11.month_key, pa12.month_key)  month_key,
  a14.month_short_desc  month_short_desc,
  pa11.WJXBFS1  WJXBFS1,
  pa12.WJXBFS1  WJXBFS2,
  pa12.WJXBFS2  WJXBFS3
from	ZZMD00	pa11
  full outer join	ZZMD01	pa12
    on 	(pa11.sls_rep_id = pa12.sls_rep_id and 
  pa11.month_key = pa12.month_key)
  join	d_month	a14
    on 	(coalesce(pa11.month_key, pa12.month_key) = a14.month_key)


drop table ZZMD00
drop table ZZMD01

The highlighted sql pass is basically a Cartesian join between the dimensions- Month and Sales Rep identifying all sales rep versions that are active for the date- ‘201701’.

The revised report results is shown below.

As seen in both the scenarios, execution of Cartesian joins can prove to be very useful to meet complex reporting scenarios. What was used in both these scenarios was essentially a non-equi join with additional where clauses. It is recommended though that enabling execution of Cartesian joins be enabled only for certain complex reports. Enabling this setting at an overall project level can result in incorrect results and performance degradation especially when the Cartesian joins are implemented without any filters.

Cartesian Join MicroStrategy