Alert generation was temporarily suspended due to too many alerts

Every now and then you’ll come across a warning alert in SCOM that tells you “Alert generation was temporarily suspended due to too many alerts”. This is a built-in action that SCOM takes to prevent floods of alerts. In most cases, if an agent produces 50 alerts in a minute, it’s a pretty good guess that something is wrong, rather than there being 50 genuine alerts.

[Image: Alert Generation Suspended]

The rule that prevents these floods is configured to suspend alert generation for 10 minutes once a single rule has generated 50 alerts within 60 seconds. Sometimes you’ll hear people call this the 50/60/10 rule.

50 Alerts / 60 Seconds / 10 Minutes.
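
To make the 50/60/10 arithmetic concrete, here is a toy model of the behaviour in PowerShell. It is only a sketch of the logic as described in this post, not SCOM's actual implementation, and the function name Test-AlertSuppression and its parameter names are made up for illustration.

```powershell
# Illustrative model only. This is NOT how the SCOM agent is implemented.
function Test-AlertSuppression {
    param(
        [datetime[]] $AlertTimes,           # timestamps of alerts raised by one rule
        [int] $AlertCount = 50,             # alerts accepted per window
        [int] $AlertCountInterval = 60,     # window length in seconds
        [int] $AlertSuspendInterval = 600   # suspension length in seconds
    )

    $window = [System.Collections.Generic.Queue[datetime]]::new()
    $suspendedUntil = [datetime]::MinValue
    $accepted = 0

    foreach ($t in ($AlertTimes | Sort-Object)) {
        # Anything arriving during a suspension is dropped.
        if ($t -lt $suspendedUntil) { continue }

        # Forget alerts that have fallen out of the counting window.
        while ($window.Count -gt 0 -and
               ($t - $window.Peek()).TotalSeconds -ge $AlertCountInterval) {
            [void] $window.Dequeue()
        }

        # Limit reached: start the suspension instead of accepting the alert.
        if ($window.Count -ge $AlertCount) {
            $suspendedUntil = $t.AddSeconds($AlertSuspendInterval)
            $window.Clear()
            continue
        }

        $window.Enqueue($t)
        $accepted++
    }

    [pscustomobject]@{ Accepted = $accepted; SuspendedUntil = $suspendedUntil }
}

# 75 alerts, half a second apart, with the default 50/60/10 limits.
$start = [datetime] '2020-01-01'
$times = 0..74 | ForEach-Object { $start.AddSeconds($_ * 0.5) }
(Test-AlertSuppression -AlertTimes $times).Accepted   # 50
```

With the defaults, the first 50 alerts get through and the rest are dropped for the next 10 minutes.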

The alert itself looks like this. Take note of the parts I’ve highlighted: the Alert Description tells us that a rule has generated 50 alerts in the last 60 seconds, and the Created Time is 10 minutes before the suspension ends.

[Image: 50-60-10]

That said, there are times when you will genuinely expect a burst of alerts that exceeds this limit. Just the other day I was out at a client’s site where, whenever their remote stations were started, the server monitoring them would write around 100 events to the Windows Application Event Log, and the client expected to see all of them. So this particular agent needed to be configured to allow more than the standard 50 alerts in 60 seconds.

It’s a pretty simple process, but it requires a few registry edits. First, though, let’s test the theory to make sure the rule works the way I’m describing it.

I’ve created a rule to alert on an error in the Windows Event Log, configured to look for an Error with the ID of 56. I’ll use Windows PowerShell to generate enough events for over 50 alerts. 75, in fact.
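
A script along these lines would do the job. Treat it as a minimal sketch rather than the exact script from the screenshot: it assumes Windows PowerShell 5.1 in an elevated session, that the rule watches the Application log, and the source name SCOMAlertTest is a made-up placeholder.

```powershell
# Assumptions: Windows PowerShell 5.1 (Write-EventLog is not in PowerShell 7),
# run as administrator, and a rule that watches the Application log.
$source  = 'SCOMAlertTest'   # hypothetical event source name
$eventId = 56                # the event ID the test rule looks for
$count   = 75                # more than the default limit of 50

# An event source has to be registered once before it can be written to.
if (-not [System.Diagnostics.EventLog]::SourceExists($source)) {
    New-EventLog -LogName Application -Source $source
}

1..$count | ForEach-Object {
    $params = @{
        LogName   = 'Application'
        Source    = $source
        EventId   = $eventId
        EntryType = 'Error'
        Message   = "Alert suppression test event $_ of $count"
    }
    Write-EventLog @params
}
```

Changing $count is all it takes to push a different number of events through later on.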

[Image: PowerShell Events]

What we’d expect to happen is that SCOM will suspend alerting on this agent, once we hit the limit of 50 alerts within 60 seconds…

[Image: 51 Alerts]

And as expected, the above image shows 50 Critical alerts (not 75) plus our Warning telling us that alert generation has now been suspended.

So we know that everything works as planned, but for this agent we want to increase this value.

On the agent in question, I opened up Regedit and navigated to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\ followed by the name of the Management Group you wish to configure. Bear in mind you might have more than one Management Group listed here if the agent is multihomed.

So for me, it was HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\SCOMCulham as my Lab Management Group Name is SCOMCulham.
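
If you aren't sure which Management Groups an agent reports to, a quick way to check is to list the subkeys under that path. This is just a sketch and assumes you run it locally on the agent:

```powershell
# List the Management Group names this agent is a member of.
# Each one is a subkey under "Management Groups".
$mgRoot = 'HKLM:\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups'
Get-ChildItem -Path $mgRoot | Select-Object -ExpandProperty PSChildName
```

On a multihomed agent this returns one name per Management Group, and the values described below would need setting under each one you want to change.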

[Image: Registry ManagementGroup]

In here we’ll need to create three DWORD (32-bit) values: Alert Count, Alert Count Interval and Alert Suspend Interval.

Alert Count = How many alerts we will accept within the interval specified in “Alert Count Interval”
Alert Count Interval = The length, in seconds, of the window over which alerts are counted
Alert Suspend Interval = If the Alert Count is reached during the Alert Count Interval, we’ll suspend alert generation for this many seconds
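
Before changing anything, it can be handy to see what an agent currently has. The sketch below reads the three values and falls back to the 50/60/600 defaults described in this post when a value hasn't been created. It assumes that a missing value means the default applies, and SCOMCulham is my lab's Management Group name, so swap in your own.

```powershell
$mgName = 'SCOMCulham'   # replace with your Management Group name
$key = "HKLM:\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\$mgName"

# Defaults as described in this post (assumed to apply when a value is absent).
$defaults = [ordered]@{
    'Alert Count'            = 50
    'Alert Count Interval'   = 60
    'Alert Suspend Interval' = 600
}

$current = Get-ItemProperty -Path $key -ErrorAction SilentlyContinue

foreach ($name in $defaults.Keys) {
    if ($null -ne $current -and $null -ne $current.$name) {
        '{0} = {1}' -f $name, $current.$name
    } else {
        '{0} = {1} (not set, default assumed)' -f $name, $defaults[$name]
    }
}
```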

[Image: Registry DWORD Values]

Remember that these are DWORD (32-bit) values, and switch the Base to Decimal before entering the numbers.

[Image: Registry Decimal]

You could of course do this in Windows PowerShell too:

New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\SCOMCulham" -Name "Alert Count" -Value 150 -PropertyType DWord
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\SCOMCulham" -Name "Alert Count Interval" -Value 60 -PropertyType DWord
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\SCOMCulham" -Name "Alert Suspend Interval" -Value 600 -PropertyType DWord
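
One thing to watch: New-ItemProperty throws an error if the value already exists, which you'll hit the second time you run it. A re-runnable variant might look like the sketch below. It uses -Force to overwrite existing values, and again assumes my lab Management Group name, SCOMCulham.

```powershell
$mgName = 'SCOMCulham'   # replace with your Management Group name
$key = "HKLM:\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\$mgName"

$settings = [ordered]@{
    'Alert Count'            = 150   # alerts allowed per interval
    'Alert Count Interval'   = 60    # seconds
    'Alert Suspend Interval' = 600   # seconds
}

foreach ($name in $settings.Keys) {
    # -Force overwrites the value if it is already there.
    New-ItemProperty -Path $key -Name $name -Value $settings[$name] -PropertyType DWord -Force |
        Out-Null
}

# Read the values back to confirm what was written.
Get-ItemProperty -Path $key -Name 'Alert Count', 'Alert Count Interval', 'Alert Suspend Interval'
```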

OK, now you will need to restart the HealthService on the agent for these changes to take effect. Personally, I use PowerShell again for this:

Restart-Service HealthService

Now I’ll fire off my PowerShell script again. This time I’ll set the number of Event Log entries to something that’ll exceed the new value of 150. Let’s try 250.

[Image: PowerShell Events 250]

OK, SCOM has stopped at 150 alerts as we expected, with the 151st alert being our suspension Warning.

[Image: 150-60-10]

And in the description we can see that a Rule has generated 150 Alerts in 60 Seconds and if you note the time stamps, it’ll be suspended for 10 minutes.

So there you have it. It’s pretty easy to set different values for the agents that require it, but it’s also good to see the rule working by default, as it prevents alert floods.

Happy Alerting 🙂

Comments

Hey thank you been trying to understand how this in-built Alert Storm prevention could be disabled/modified due to legitimate scenarios like load testing.

Glad to have helped!
