Nagios ran out of memory
Description

There was a flood of Nagios notifications this morning that looked like false positives. While investigating, I stumbled upon numerous OOM conditions in the logs, so the memory allocated to the VM should be increased.
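For reference, a minimal sketch of how such OOM conditions can be confirmed from the VM's logs, assuming systemd-journald and the usual kernel oom-killer phrasing; the date range below is only an example, not the actual incident window.

#!/usr/bin/env python3
# Sketch: count OOM-related kernel messages in the system journal.
# Assumptions: journalctl is available on the monitoring VM and the kernel
# logs OOM events with the usual "Out of memory" / "oom-killer" wording.
import re
import subprocess

out = subprocess.run(
    ["journalctl", "-k", "--since", "2018-08-07 00:00:00", "--no-pager"],
    capture_output=True, text=True, check=True,
).stdout

oom_re = re.compile(r"oom-killer|Out of memory", re.IGNORECASE)
hits = [line for line in out.splitlines() if oom_re.search(line)]

print(f"{len(hits)} OOM-related kernel messages found")
for line in hits[:10]:  # show a sample only
    print(line)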
Activity

Marc Dequènes (Duck) August 10, 2018 at 5:11 AM
First, I increased the RAM (doubled it, in this case), so it should be fine now.
As for the network condition, there was a planned outage scheduled by the link provider, and I think it was not expected to have such an impact. I also don't know why the link redundancy did not help, so we are discussing the issue and more detailed information will follow on the Community Cage ML (to which you should already be subscribed).
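A hedged sketch of one way such a memory bump could be applied, assuming the monitoring VM is a plain libvirt/KVM guest; the domain name and new size are placeholders, and on an oVirt-managed VM the change would instead be made through the engine.

#!/usr/bin/env python3
# Sketch: double a guest's memory in its persistent libvirt configuration.
# Assumptions: libvirt-python is installed, the domain is named "monitoring"
# (hypothetical), and a guest reboot is acceptable for the change to apply.
import libvirt

new_mem_kib = 4 * 1024 * 1024  # 4 GiB expressed in KiB (placeholder size)

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("monitoring")  # hypothetical domain name

# Raise the maximum allocation first, then the current allocation, both in
# the persistent config; the change takes effect on the next guest (re)boot.
dom.setMemoryFlags(new_mem_kib,
                   libvirt.VIR_DOMAIN_AFFECT_CONFIG | libvirt.VIR_DOMAIN_MEM_MAXIMUM)
dom.setMemoryFlags(new_mem_kib, libvirt.VIR_DOMAIN_AFFECT_CONFIG)
conn.close()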

Former user August 7, 2018 at 7:35 AM
As for the reason for the flood of false positives, there is no direct indication in the logs, but it seems the VM lost networking completely, as everything, including the MailMan VM (located in the same OSAS cage), was reported as down:
Aug 07 04:10:23 monitoring.ovirt.org nagios[13103]: HOST ALERT: alterway02.ovirt.org;DOWN;SOFT;1;CRITICAL - Network Unreachable (89.31.150.216)
Aug 07 04:11:09 monitoring.ovirt.org nagios[13103]: HOST ALERT: engine-phx.ovirt.org;DOWN;SOFT;1;(Host check timed out after 30.01 seconds)
Aug 07 04:11:21 monitoring.ovirt.org nagios[13103]: HOST ALERT: lists.ovirt.org;DOWN;SOFT;1;check_ping: Invalid hostname/address - lists.ovirt.org
Aug 07 04:12:20 monitoring.ovirt.org nagios[13103]: HOST ALERT: gerrit.ovirt.org;DOWN;SOFT;1;(Host check timed out after 30.01 seconds)
Was there any recorded outage at OSAS that could have caused that?
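As a rough illustration of how the scale of such an alert flood can be measured from the excerpt above, the sketch below parses HOST ALERT lines in the journal-style format shown and counts DOWN events per host; the input file path is a placeholder, not the actual log location on monitoring.ovirt.org.

#!/usr/bin/env python3
# Sketch: count Nagios HOST ALERT DOWN events per host from a log export.
import re
from collections import Counter

ALERT_RE = re.compile(
    r"nagios\[\d+\]: HOST ALERT: (?P<host>[^;]+);(?P<state>[^;]+);"
    r"(?P<type>[^;]+);(?P<attempt>\d+);(?P<info>.*)"
)

down_counts = Counter()
with open("nagios-journal.log") as fh:  # placeholder export of the journal
    for line in fh:
        m = ALERT_RE.search(line)
        if m and m.group("state") == "DOWN":
            down_counts[m.group("host")] += 1

for host, count in down_counts.most_common():
    print(f"{count:4d}  {host}")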
Details

Assignee: Marc Dequènes (Duck)
Reporter: Former user (Deactivated)
Priority: Medium