Failed to store data in the Data Warehouse. The operation will be retried.

Over recent months a client (with a SCOM 2007 environment) had noticed a lot of DataWarehouse errors. Problems inserting data into the warehouse, problems with performance data and maintenance mode data etc. Problems with data not being included in reports and so forth. So we were called in to take a look…let’s take a look at what we found.

I noticed in the event log on the Management Server (RMS) there were a lot of 31551 errors. These errors were occurring daily and were affecting the ability to store data in the data warehouse. An example of the error is as follows:

Failed to store data in the Data Warehouse. The operation will be retried.
Exception ‘SqlException’: Sql execution failed. Error 8115, Level 16, State 1, Procedure MaintenanceModeChange, Line 157, Message: Arithmetic overflow error converting IDENTITY to data type int.

One or more workflows were affected by this.

Workflow name: Microsoft.SystemCenter.DataWarehouse.Synchronization.MaintenanceMode
Instance name: testserver@thescomlab.com
Instance ID: {8941F949-7940-3A4F-2625-E8EE6DD8BFA9}
Management group: THESCOMLAB

If we take a closer look at this error it tells us that the “MaintenanceMode” table is the problem…and it also says Arithmetic overflow error converting IDENTITY to data type int. So let’s take a look at the identity value of this table by issuing the following SQL Query.

DBCC CHECKIDENT (‘MaintenanceMode’)

Checking identity information: current identity value ‘2147483647’, current column value ‘2147483647’.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

This will tell us what the identity value of our MaintenceMode is: in this case it was 2147483647. As a comparison I ran it on my test lab…my value is only 120.

DBCC CHECKIDENT

 

 

 

 

 

 

 

I then ran the following query:

Select max(MaintenanceModeRowId) from MaintenanceMode

This shows us what the current row value is in the MaintenanceMode table. The result this client got was 8742350. The issue is that these 2 values need to be the same because SCOM will need to increment the value the next time it inserts a new row. So we’ll need to fix this and make sure that these 2 commands return the same value.

We’ll run the following SQL (passing in the value we received from the previous query):

DBCC CHECKIDENT(‘dbo.MaintenanceMode’,RESEED, 8742350)

 

Then run the following command again to ensure the value has changed.

DBCC CHECKIDENT (‘MaintenanceMode’)
Checking identity information: current identity value ‘8742350’, current column value ‘8742350’.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

 

Once this change was made and these values aligned we then saw a lot of errors appearing in the event log every minute.

Failed to store data in the Data Warehouse. The operation will be retried.
Exception ‘SqlException’: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

One or more workflows were affected by this.

Workflow name: Microsoft.SystemCenter.DataWarehouse.Synchronization.MaintenanceMode
Instance name: testserver@thescomlab.com
Instance ID: {8941F949-7940-3A4F-2625-E8EE6DD8BFA9}
Management group: THESCOMLAB

This is because the data now can be sent through to the data warehouse but the MaintenanceModeStage table has a huge amount of data and as a result the stored procedure MaintenanceModeChange cannot be finished in one minute which is the timeout value allocated for this task.

So, to fix this we performed the following steps:

1. We turned the “Data Warehouse maintenance mode information synchronization rule” off to prevent it from running.

Data Warehouse

 

 

 

 

 

 

 

 

 

2. On the Management Server we “Restarted the HealthService” so the disabled rule would take effect.

3. On the DataWarehouse Server we ran the following query against the OperationsManagerDW. Note: this query must be run as the DBWriter account or it will fail.

exec MaintenanceModeChange @ManagementGroupGuid=’2d503e85-8e51-3912-efff-f2a7b4f0fea4′,@ChangeXml=N'<Root />’,@SynchronizationProcessId=’578F4E34-A0B0-F158-8F21-70691F06C38F’,@TargetOperationalState=default

This query took just under 50 minutes to complete, which explains why they were getting errors all this time. Once we had completed the full run, subsequent runs took just seconds.

4. Re-enabled the rule and create an override for the “Batch Generation Frequency Seconds” and set the value from 30 to 300 so it’s less aggressive.

Data Warehouse Overrides Batch

 

 
Comments

Hi

Thank you a lot for sharing, we just had the same problem and this article solved it for us!

You’re very welcome. Glad I could help you.

Leave a Reply