Add monitoring for the production servers, including any service that affects the environment.
Make sure the firstname.lastname@example.org list is notified on each failure.
1. Create a new VM on the production DC: monitoring.phx.ovirt.org
2. Install EL7.
3. Choose a monitoring system (we might end up running a few on it); we want to start with Nagios/Icinga.
4. Puppetize the server.
5. Start adding monitoring for all hypervisors, VMs, disk space, etc.
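For step 5, most disk space checks come down to the standard Nagios plugin contract: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below illustrates that contract for a single filesystem. It is not an existing plugin; the 80%/90% thresholds and the function names are hypothetical, and in practice the stock `check_disk` plugin would normally be used instead.

```python
"""Minimal Nagios-style disk space check (a sketch, not an existing plugin).

Exit codes follow the standard Nagios plugin convention:
0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
"""
import shutil
from typing import List, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_pct: float, warn_pct: float, crit_pct: float) -> Tuple[int, str]:
    """Map a used-space percentage to a Nagios state and a status line."""
    if used_pct >= crit_pct:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn_pct:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def check_path(path: str, warn_pct: float = 80.0,
               crit_pct: float = 90.0) -> Tuple[int, str]:
    """Check the filesystem holding `path`; UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: filesystem reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    state, text = evaluate(used_pct, warn_pct, crit_pct)
    return state, f"{text} on {path}"


def main(argv: List[str]) -> int:
    """Plugin entry point: print one status line and return the exit code."""
    target = argv[0] if argv else "/"
    state, message = check_path(target)
    print(message)
    return state
```

To run it as a plugin, the script would end with `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))` (plus `import sys`), so that Nagios sees the exit code.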
It is best to open a Google Sheet and document which servers/services we are monitoring.
But let's start with installing/creating the VM.
Updated list of all PHX services that require monitoring:
http,nagios,disk (not PROD yet)
squid,disk (not all monitored)
http,disk (not PROD yet)
There are also the physical servers, which need to be monitored. The hosts need to be added to monitoring, as well as the hosts in the internal VLAN, by setting up passive checks and forwarding them from the PHX Nagios.
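Passive checks work by handing Nagios a finished result instead of letting it poll the host. Whatever relays the results (for example NSCA, or a script running on the PHX Nagios host) ultimately writes lines in Nagios's external command format, `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output` or `[timestamp] PROCESS_HOST_CHECK_RESULT;host;code;output`, to the command file. The sketch below only builds and submits those lines; the command file path is an assumption (check `command_file` in `nagios.cfg`), and the host and service names in the usage note are hypothetical. The receiving Nagios must define the host/service with passive checks enabled.

```python
import time
from typing import Optional

# Assumed path: the real location is set by `command_file` in nagios.cfg
# and differs between packagings.
COMMAND_FILE = "/var/spool/nagios/cmd/nagios.cmd"


def _clean(text: str) -> str:
    # Each external command must be a single line, so flatten newlines.
    return " ".join(text.splitlines())


def service_result(host: str, service: str, code: int, output: str,
                   timestamp: Optional[int] = None) -> str:
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{_clean(output)}\n"


def host_result(host: str, code: int, output: str,
                timestamp: Optional[int] = None) -> str:
    """Build a PROCESS_HOST_CHECK_RESULT external command line."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"[{ts}] PROCESS_HOST_CHECK_RESULT;{host};{code};{_clean(output)}\n"


def submit(line: str, command_file: str = COMMAND_FILE) -> None:
    """Write one command to the Nagios command pipe (run on the Nagios host)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

For instance, `submit(host_result("internal-host01", 0, "PING OK"))` would mark the hypothetical internal-VLAN host `internal-host01` as UP.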
Basic health monitoring for storage was added: it verifies that both hosts, as well as the cluster IP, respond to ping.
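The storage health check combines three ping results (two nodes plus the cluster IP) into one state. The sketch below shows one way to aggregate them. The severity policy is a hypothetical choice, not what is deployed: the cluster IP down or every node down is CRITICAL, a single node down is WARNING. It shells out to the Linux iputils `ping` (`-c 1` for one echo, `-W` for the reply timeout in seconds), and the probe is injectable so the logic can be exercised without a network.

```python
import subprocess
from typing import Callable, List, Tuple

OK, WARNING, CRITICAL = 0, 1, 2


def ping(address: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with Linux iputils `ping`; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_storage(nodes: List[str], cluster_ip: str,
                  probe: Callable[[str], bool] = ping) -> Tuple[int, str]:
    """Aggregate ping results for the storage nodes and the cluster IP.

    Hypothetical policy: cluster IP down or all nodes down -> CRITICAL,
    some nodes down -> WARNING, everything answering -> OK.
    """
    down = [node for node in nodes if not probe(node)]
    if not probe(cluster_ip):
        return CRITICAL, f"STORAGE CRITICAL - cluster IP {cluster_ip} not responding"
    if nodes and len(down) == len(nodes):
        return CRITICAL, "STORAGE CRITICAL - no storage node responds"
    if down:
        return WARNING, f"STORAGE WARNING - no ping reply from {', '.join(down)}"
    return OK, f"STORAGE OK - {len(nodes)} nodes and cluster IP respond"
```

Keeping the probe separate from the policy means the same aggregation could later be fed from passive results instead of local pings.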
Unblocking: let's add what we can to the existing Nagios, since reinstalling the alterway server for it might take more time.
This should probably be moved to an Epic that includes smaller tasks for the specific service monitoring we are still lacking.
Please use this Epic for attaching tickets about monitoring the remaining critical services. Do we need to update the list with what we are monitoring today?