This post is going to ignore Project Online as I am still a little underwhelmed by the reporting capabilities of the Excel + OData Power-shenanigans. Mostly I spend my BI days in SQL Reporting Services for our on-premises customers, and even our larger Project Online customers are moving back to local BI storage and SQL RS.
Anyway, rant over.
So SSRS has been my friend for some years now and I continue to find new and interesting ways to make it do the things I want. However, the real fun is down in the TSQL area. Here is a little selection of things I've found that may help you optimise or improve your report outputs.
Large datasets and the AssignmentByDay data
We all know that the MSP_EPMAssignmentByDay table is a beast and is the main limiting factor when we look at time-by-period reporting. Yes, it would be great if MS did an automatic aggregation to weekly periods as part of the publish job, but that is never gonna happen. However, it is possible to get some performance out of this monster table.
First things first: get your indexes in order!
Second things second. NEVER USE THE VIEW.
The MSP_EPMAssignmentByDay_UserView is where madness lives. The view loops through a join to the MSP_EPMAssignmentBaselineByDay table and aggregates the Baseline0-10 values by day - even if you don't use more than one baseline. So you burn a huge amount of SQL processing time just to get the values out.
Suggestion: if you need both PLAN and BASELINE0 values, just use the tables in your join, ensuring you select the correct BaselineNumber when joining on AssignmentUID:
SELECT foo
FROM MSP_EPMAssignment A
LEFT JOIN MSP_EPMAssignmentByDay AD
ON A.AssignmentUID = AD.AssignmentUID
LEFT JOIN MSP_EPMAssignmentBaselineByDay BD
ON A.AssignmentUID = BD.AssignmentUID AND BD.BaselineNumber = 0
Then you can extract your Baseline data and Planned data by day from the source tables without the overhead of the other baseline queries that are held in the view.
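If you then want the daily Plan and Baseline0 values side by side, it is also worth lining the two ByDay tables up on TimeByDay, otherwise the join fans out across days. A minimal sketch along those lines is below - note that the baseline work column name (AssignmentBaselineWork here) is an assumption, so check it against your schema:
SELECT A.AssignmentUID
,AD.TimeByDay
,AD.AssignmentWork
,BD.AssignmentBaselineWork -- assumed column name, check your schema
FROM MSP_EPMAssignment A
LEFT JOIN MSP_EPMAssignmentByDay AD
ON AD.AssignmentUID = A.AssignmentUID
LEFT JOIN MSP_EPMAssignmentBaselineByDay BD
ON BD.AssignmentUID = A.AssignmentUID
AND BD.BaselineNumber = 0
AND BD.TimeByDay = AD.TimeByDay -- line the day buckets up to avoid a fan-out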
Aggregation
Let's face it, most people don't need by-day information except when looking at near-term operational reports such as who's doing what next week. These reports tend to have a limited date range and therefore the data set is reduced anyway, normally by adding a WHERE clause inside the JOIN:
INNER JOIN (
SELECT foo
FROM MSP_EPMAssignmentByDay
WHERE TimeByDay >= @DateFrom
AND TimeByDay <= @DateTo
) AD on ..................
So summing up to Week or Month is far and away the best method. Doing this early can sometimes reap rewards too (see Optimisation below).
So just to make sure we know how to turn days into months, here's a sample:
SELECT SUM(ad.AssignmentWork) as WorkByMonth
,DATEADD(Month,DATEDIFF(Month,0,ad.TimeByDay),0) as MonthPeriodStart
FROM foo ad
GROUP BY DATEADD(Month,DATEDIFF(Month,0,ad.TimeByDay),0)
This MonthPeriodStart will return values like YYYY-MM-01 00:00:00, which work great for reporting outputs as you can then fiddle with the date format to display JAN-00 or similar.
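If you want weekly buckets instead, the same DATEADD/DATEDIFF trick works with WEEK. Here is a minimal sketch following the same pattern (this idiom rounds to the Monday of each week, with Sundays rolling into the following week):
SELECT SUM(ad.AssignmentWork) as WorkByWeek
,DATEADD(Week,DATEDIFF(Week,0,ad.TimeByDay),0) as WeekPeriodStart -- Monday of each week
FROM foo ad
GROUP BY DATEADD(Week,DATEDIFF(Week,0,ad.TimeByDay),0)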
Note: it is always useful to aggregate early using a pre-select and join later (see below).
Optimisation
Pre-selecting from the AssignmentByDay table
Sticking the ByDay table in a big long nasty TSQL JOIN can really drag down your performance. Therefore it can be useful to grab this data early, stick it in memory and then reference it. Normal approaches are #temp tables, @table variables or Common Table Expressions (CTEs). I have a fondness for the CTE just because it is so useful in so many ways, especially its recursive nature (more to follow).
So to use the CTE as the pre-select you could do something like this:
DECLARE @DateFrom Date = 'datevalue'
DECLARE @DateTo Date = 'datevalue'
; WITH Assn AS
( SELECT foo
FROM MSP_EPMAssignmentByDay
WHERE TimeByDay >= @DateFrom
AND TimeByDay <= @DateTo
)
SELECT Assn.foo
FROM Assn -- the CTE name
INNER JOIN MSP_EPMAssignment A
ON A.AssignmentUID = Assn.AssignmentUID -- then join on more tables as per normal
If we compare this approach with two standard SELECT models, we see the difference in the estimated execution plan costs (on a small dataset):
DECLARE @DateFrom Date = '2015-01-01'
DECLARE @DateTo Date = '2016-01-01'
--Option 1 CTE pre-select
; WITH Assn AS
( SELECT *
FROM MSP_EPMAssignmentByDay
WHERE TimeByDay >= @DateFrom
AND TimeByDay <= @DateTo
)
SELECT Assn.*
FROM MSP_EpmAssignment A
inner join Assn
on assn.assignmentuid = a.assignmentuid
--Option 2 Subquery for assignment by day data
SELECT *
FROM ( SELECT *
FROM MSP_EPMAssignmentByDay ad
WHERE TimeByDay >= @DateFrom
AND TimeByDay <= @DateTo
) ad
inner join MSP_EpmAssignment A
on ad.assignmentuid = a.assignmentuid
--Option 3 Standard inline query
SELECT *
FROM MSP_EPMAssignmentByDay ad
inner join MSP_EpmAssignment A
on ad.assignmentuid = a.assignmentuid
WHERE TimeByDay >= @DateFrom
AND TimeByDay <= @DateTo
(Also - spot the missing index!)
So we see that each of the in-line SQL executions costs around 36% of the batch whereas the CTE pre-select costs just 29%, which is a small but not insignificant win!
Interestingly, moving the WHERE clause into a subquery didn't really buy us anything.
Just for completeness, the temp table approach cost more than all the other query methods (with this small dataset), although I would expect some gains when larger datasets are involved. I have never seen it match CTE performance though, but mileage may vary of course!
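For reference, the #temp table flavour of the same pre-select looks something like this (a minimal sketch using the same date variables as above; the temp table name is just illustrative):
SELECT *
INTO #AssnByDay -- hypothetical temp table name
FROM MSP_EPMAssignmentByDay
WHERE TimeByDay >= @DateFrom
AND TimeByDay <= @DateTo

SELECT Assn.*
FROM MSP_EpmAssignment A
INNER JOIN #AssnByDay Assn
ON Assn.AssignmentUID = A.AssignmentUID

DROP TABLE #AssnByDay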
What happened to my months - filling in the gaps
When using [for example] a Matrix in SSRS or an Excel Pivot against the AssignmentByDay data, if there are periods with no data then those months simply disappear from the output. This is especially problematic for Missing Timesheet reports or Resource Capacity reporting, where the fact that work/actual work may not exist in a period is precisely the point of the report, so dropping it is not valid.
So how to fix it? This is something I have been playing with recently and I am definitely not saying this is a perfect method, but it's working right now.
So what we need is a fixed set of periods between @DateFrom and @DateTo.
Firstly I use a recursive CTE to build these weekly periods:
DECLARE @ViewFrom DATE = '2015-10-26'
DECLARE @MinDate date =@ViewFrom
DECLARE @MaxDate date =DATEADD(WEEK,27,@ViewFrom)
;WITH ctedaterange
AS (
SELECT [Dates]=@MinDate
UNION ALL
-- recursion
SELECT DateAdd(WEEK,1,[dates])
FROM ctedaterange
WHERE [dates] < @MaxDate
)
SELECT * FROM ctedaterange
This builds us a set of weekly period start dates running from 2015-10-26 out to 27 weeks ahead.
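One thing to watch with recursive CTEs: SQL Server caps recursion at 100 levels by default, so if you ever stretch this out beyond 100 periods you will need a MAXRECURSION hint on the statement that uses the CTE:
SELECT * FROM ctedaterange OPTION (MAXRECURSION 0) -- 0 removes the default 100-level limit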
Next we want to use this CTE to create a set of rows by Resource and [Dates] so that each resource has every date period associated with it. We do this with a horrible JOIN (a cross join) as follows:
SELECT foo FROM MSP_EPMRESOURCE, ctedaterange
This gives us the same set of weekly periods from 2015-10-26 onwards, repeated for each resource!
Once you have this, you have a number of options for joining the data you've just created to your ByDay planned data. I find joining on a combination of ResourceUID and time period fields works well - a rough sketch of the shape follows below. Alternatively you can UNION this data onto your by-period query. Either way you end up with a full set of periods even when no work is planned in them.
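As a rough illustration only (not a full solution), and assuming the weekly period starts are Mondays as in the example above, the shape of that join might look something like this:
DECLARE @ViewFrom DATE = '2015-10-26'
;WITH ctedaterange AS
( SELECT [Dates] = @ViewFrom
UNION ALL
SELECT DATEADD(WEEK,1,[Dates]) FROM ctedaterange WHERE [Dates] < DATEADD(WEEK,27,@ViewFrom)
)
SELECT R.ResourceName
,D.[Dates] as WeekStart
,ISNULL(W.WorkByWeek,0) as WorkByWeek -- zero where nothing is planned in that week
FROM MSP_EPMResource R
CROSS JOIN ctedaterange D -- every resource gets every period
LEFT JOIN ( SELECT A.ResourceUID
,DATEADD(WEEK,DATEDIFF(WEEK,0,AD.TimeByDay),0) as WeekStart
,SUM(AD.AssignmentWork) as WorkByWeek
FROM MSP_EPMAssignmentByDay AD
INNER JOIN MSP_EPMAssignment A
ON A.AssignmentUID = AD.AssignmentUID
GROUP BY A.ResourceUID, DATEADD(WEEK,DATEDIFF(WEEK,0,AD.TimeByDay),0)
) W
ON W.ResourceUID = R.ResourceUID
AND W.WeekStart = D.[Dates]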
I am not going to go any deeper than that here as it gets way too involved for this blog post. Sorry ;o)
Parent Child relationships in Task structures
A question I've come across quite often over the last year is how you would join a task with AssignmentWork on it to another task further up the task tree (but not directly the parent summary task). This is a specific need seen by those performing integration to external systems such as SAP PS, specifically aggregating to a SAP WBS code entered on a summary task somewhere up the task tree (as SAP PS sits at a higher level than detailed operational planning).
This is easy when the summary task is just one level up (via the TaskParentUID field). However, what happens when the aggregation point is "somewhere above the Task Assignment" level?
To do this we need to JOIN the task carrying the assignment to a summary-level task somewhere above it. So in this example we are going to JOIN the task to a LEVEL 3 task.
SELECT L3.foo, tv.foo
FROM MSP_EPMTask tv
LEFT JOIN
( SELECT TaskUID,TaskName, TaskOutlineNumber, ProjectUID,TaskWBS
FROM MSP_EPMTask T
WHERE TaskOutlineLevel = '3' -- this gets the L3 task parents
) L3
ON CONVERT(NVARCHAR(100), L3.TaskOutlineNumber)+'.'
= LEFT(tv.TaskOutlineNumber,LEN(CONVERT(NVARCHAR(100),L3.TaskOutlineNumber)+'.'))
AND L3.projectuid = tv.ProjectUID
So what we are doing here is taking the TaskOutlineNumber at the assignment-level task (e.g. 1.3.1.10) and at the L3 summary (1.3.1), and joining where the summary's outline number plus a trailing dot ('1.3.1.') is a prefix of the task's outline number.
This works for a fixed summary level, but you can use a similar join approach to find the first, second or lowest summary above the task that matches certain criteria, for example: find the first summary task above the task where a SAP WBS code is entered.
To do this you need to use RANK with a PARTITION in the summary JOIN.
RANK attaches a numeric ranking to the rows in your result set, and the PARTITION BY element lets you rank within a subset of those rows. In this scenario the partition is the task UID (tv.TaskUID), as we want to find all the parents of that task and then select the one we want based on its rank.
Now again, I am not going to give you the complete picture as it really depends on your data/query/requirements, but here's a sample.
SELECT data.*
FROM
(
SELECT RANK()
OVER (
PARTITION BY tv.TASKUID
ORDER BY L3.TaskIndex DESC -- nearest summary above the task gets rank 1
) r
, l3.taskname as SummaryName
, tv.TaskName
FROM ( SELECT *
FROM msp_EPMTASK
WHERE TaskIsSummary = 1
and SAPWBS IS NOT NULL
) L3 -- summary tasks that have a SAP WBS code entered
LEFT JOIN ( SELECT *
FROM msp_EPMTASK
WHERE TaskIsSummary = 0
) tv -- the detail (non-summary) tasks
ON CONVERT(NVARCHAR(100), L3.TaskOutlineNumber)+'.'
= LEFT(tv.TaskOutlineNumber,
LEN(CONVERT(NVARCHAR(100),L3.TaskOutlineNumber)+'.'))
AND L3.projectuid = tv.ProjectUID
) data
This returns a record for every summary above the task where SAPWBS IS NOT NULL. What we actually want is the first one (i.e. the lowest-level summary) where the SAP WBS is set, so one record per bottom-level task. To do this we add a filter on the RANK alias r:
WHERE data.r = 1
and ta-daa, we have a table that shows the bottom-level task and the lowest summary task where the WBS code is set. Now if you include the TaskUIDs etc. in the subselects you can join out to wherever you need from here.