” Huston we got a problem!” when I run a availability report the data isn’t complete. I’m missing a huge number of days. The graph shows UP (Monitoring unavailable) But I am really sure the server was up and monitored !!

Analyze:
Don’t panic, we are going to solve this. (I hope..) First we are going to look up the days we are missing. Simply click on the white bar. And the detail report will be rendered.

Okay looks like we are missing the most data from of 4-3-2012. And we see strange gaps of data that is present.
Okay that’s what the report says, but I am a core stuff guy I check it this way:
Open an SQL session and connect to the DWH db. Run this query. The last aggregated data will be on the first row. So you know what the last data date is you have. We change the DateTime to the same datetime we used in the report.
SELECT ManagedEntity.FullName, vStateHourly.* FROM ManagedEntityMonitor INNER JOIN ManagedEntity ON ManagedEntityMonitor.ManagedEntityRowId = ManagedEntity.ManagedEntityRowId INNER JOIN State.vStateHourly ON ManagedEntityMonitor.ManagedEntityMonitorRowId = State.vStateHourly.ManagedEntityMonitorRowId WHERE (ManagedEntity.FullName LIKE ‘Microsoft.Windows.Computer:opsrms01%’) AND vStateHourly.DateTime between ‘20120301’ and ‘20120401’ order by vStateHourly.DateTime desc
|
The output will be:

So the last successful hourly aggregation was 02-03-2012 (dd-mm-yyyy). Hmmmm but when I look at the rendered report I see periods of data after this date ??? I must confess I really don’t have a idea now why 
Now we have to find the root cause and fix this missing aggregations. Luckily we can enable debug information for the aggregation process so we can see more what going wrong.
Open SQL and run the query below to enable debugging for the State aggregation .
UPDATE [OperationsManagerDW].[dbo].[StandardDataset] SET [DebugLevel] = 5
WHERE [SchemaName] = ‘State’ GO
|
Now we can see the debug date with this query:
SELECT TOP (100) DATEADD(hh, 1, DebugMessage.MessageDateTime) AS CET_datetime, StandardDataset.SchemaName , DebugMessage.MessageLevel, DebugMessage.MessageText FROM DebugMessage WITH (nolock) INNER JOIN StandardDataset ON DebugMessage.DatasetId = StandardDataset.DatasetId WHERE (StandardDataset.SchemaName = N’State’) order by messagedatetime desc
|
The output will be as below:

It looks like my aggregation process is way behind !
Since the maintenance for the DWH has a sequence run it means when some procedure before fails (lets say the event staging) the other won’t be hit. So I look in the debug table for other messages with ‘failed’ in the message.
Notice that we are now going a little of track , we main problem was the State report incomplete , but now we are looking at the Events. Just follow me.
SELECT TOP (100) DATEADD(hh, 1, DebugMessage.MessageDateTime) AS CET_datetime, StandardDataset.SchemaName , DebugMessage.MessageLevel, DebugMessage.MessageText FROM DebugMessage WITH (nolock) INNER JOIN StandardDataset ON DebugMessage.DatasetId = StandardDataset.DatasetId where messagetext like ‘%Failed%’ order by messagedatetime desc
|
O no this is not good:

It looks like the event staging is broken. The error is:
Failed to process stagingarea for data set. Error 777971002, Procedure EventProcessStaging, Line 398, Message: Sql execution failed. Error 515, Level 16, State 2, Procedure DebugMessageInsert, Line 15, Message: Cannot insert the value NULL into column ‘MessageText’, table ‘OperationsManagerDW.dbo.DebugMessage’; column does not allow nulls. INSERT fails.
Mmmm When I look at the error I see the debug procedure that writes to the debug log has a problem writing a debug message. Strange this error… So we have to find the real error. So I open the stored procedure “EventProcessStaging”. And there I found a BUG .. brrrr. The variable @InsertTableName is not set to a value before it is used as part of the debug message variable @MessageText. Because you can’t concat NULL to a string variable an exception is raised. I fixed this by moving the sql where this variable @InsertTableName is assigned to above the first use of the @InsertTableName variable. (this is for SCOM 2007 and 2012!) I raised already a bug request @Microsoft throughout the TAP program. This only occurs when you set the debuglevel > 3.
For you it simple means don’t set it above 3 or fix this stp own your own risk. (as I have done ;-0 ) In our case the debug level was already above 3 for the state dataset for the last month. So because the event processing was braking the total maintenance (bad architecture , sorry) all my state staging was stopped. And caused my empty reports.
So now we know it we can go back to the real issue. The missing states fix.
Fix:
This is now very simple. And if you Bing you can find plenty of info on it. One I found out in the beginning very helpful was this one from the Microsoft SCOM team ( http://blogs.technet.com/b/operationsmgr/archive/2011/09/06/standard-dataset-maintenance-troubleshooter-for-system-center-operations-manager-2007.aspx )
What I do is most of the time :
Set a enable = false override on the rule “Standard Data Warehouse Data Set maintenance rule” for all instances of “Standard Data Set”.

Now I am really sure no maintenance process is running.
And I run my own maintenance process every 1 min. Because I know catching up the state data aggregation will take some time and I don’t want to create problem’s for the other datasets (performance , events ..) I will also run the important ones in the same script.
Open a query to the DWH and run:
USE [OperationsManagerDW] DECLARE @DataSet uniqueidentifier
Print ‘starting loop of StandardDatasetMaintenance jobs’ Print ‘Processing dataset:’ Print @DataSet
while (1=1) begin PRINT getdate() Print ‘Start StandardDatasetMaintenance’ SET @DataSet = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = ‘Perf’) EXEC StandardDatasetMaintenance @DataSet
SET @DataSet = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = ‘Exchange2010’) EXEC StandardDatasetMaintenance @DataSet
SET @DataSet = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = ‘State’) EXEC StandardDatasetMaintenance @DataSet
SET @DataSet = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = ‘Event’) EXEC StandardDatasetMaintenance @DataSet –EXEC StandardDatasetProcessStaging @Dataset PRINT getdate() PRINT ‘END StandardDatasetMaintenance’ WAITFOR DELAY ’00:01′ end
|
now you check the debug log on regularly base to see if the state aggregation is completed.
You can also use the query below:
———————————————————————– — check first and last aggregation time from still to be processed data — first and last date must be equal ———————————————————————– Declare @DataSet as uniqueidentifier Set @DataSet = (Select DataSetId From StandardDataSet Where SchemaName = ‘State’) Select AggregationTypeId, COUNT(*) as ‘Count’, MIN(AggregationDateTime) as ‘First’, MAX(AggregationDateTime) as ‘Last’ From StandardDataSetAggregationHistory Where DataSetId = @DataSet AND LastAggregationDurationSeconds IS NULL group by AggregationTypeId
|
So lets check if the process is running okay. Simply rerun the report. the output will be:

looks like its all going to be alright. Just be patient.
DO NOT FORGET:
after the states are complete to remove the overrides , other wise you will have for sure the same and more , problem again.
Not to be continued:
I really hope not. Because in my case we have a DWH size almost against 1TB and because of this size it can be very complex and tricky to solve this sort of problems. So if mr. Murphy is reading this , skip my place please …
Happy SCOMMING,
Michel Kamp
Tags: DataWareHouse, Reporting, SCOM, scom2012