I was called this morning to find out why monitoring wasn’t working at one client’s site. Thankfully it was a short-lived troubleshooting exercise, but it holds an important lesson for people who are new to SCOM.
The first thing I noticed was that every single server in the “Windows Computers” view appeared to be in Maintenance Mode. Just to be sure, I located all of the Management Servers and yes, they too were in Maintenance Mode.
So there was the problem. With SCOM 2012, when a SCOM Management Server is placed into maintenance, the Configuration Service will simply move the workflows that target the “All Management Servers” Resource Pool to another management server in the pool (assuming you have more than one server). This is good, because another server can take over the job of removing objects from Maintenance Mode when their scheduled time is up.
But when all Management Servers are in maintenance at the same time, there’s no available server to fail over to, nothing is left to end the maintenance windows, and you’re stuck in an endless loop of no monitoring.
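To make that failure mode concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not SCOM’s actual logic: the `ManagementServer` class, the `pick_workflow_host` function and the server names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ManagementServer:
    # Hypothetical stand-in for a management server in the resource pool.
    name: str
    in_maintenance: bool = False


def pick_workflow_host(pool):
    """Return the first pool member that is not in maintenance, or None."""
    for server in pool:
        if not server.in_maintenance:
            return server
    return None


pool = [ManagementServer("MS01"), ManagementServer("MS02")]

# One server in maintenance: the other one can host the pool workflows.
pool[0].in_maintenance = True
host_one_down = pick_workflow_host(pool)

# Every server in maintenance: nobody is left to end the maintenance windows.
pool[1].in_maintenance = True
host_all_down = pick_workflow_host(pool)
```

With one server in maintenance the function still finds a host; with both in maintenance it returns `None`, which is the stuck state I walked into.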
So the moral of the story: don’t just perform a blanket maintenance on everything. Leave at least one of your management servers out of the maintenance, or be prepared to be without monitoring until someone manually removes the maintenance.
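If you build your maintenance list with a script, a small guard like the one below can enforce that rule before anything is scheduled. This is a minimal sketch in Python under my own assumptions: `plan_maintenance` and the server names are hypothetical, and it only decides which servers to target. It does not talk to SCOM at all.

```python
def plan_maintenance(all_servers, management_servers):
    """Return (targets, keep_out), always leaving one management server out.

    all_servers: names of every server you were about to put into maintenance.
    management_servers: names of the SCOM management servers among them.
    """
    if not management_servers:
        raise ValueError("no management servers supplied")
    # Keep the alphabetically first management server out of maintenance.
    keep_out = sorted(management_servers)[0]
    targets = [name for name in all_servers if name != keep_out]
    return targets, keep_out


# Example with made-up server names.
servers = ["MS01", "MS02", "SQL01", "WEB01"]
targets, keep_out = plan_maintenance(servers, {"MS01", "MS02"})
```

Which management server stays out is an arbitrary choice here; the only thing that matters is that at least one of them never ends up in the target list.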