Document oVirt infra SLA + List of services

Description

We need to document how we make changes:

  • What services need notices out to the world?

  • What are the timeframes for notices?

    - 1 hour for critical

    - 12 to 24 hours for non-critical

  • Who needs to be notified for what?

    - e.g. leave announce out of administrivia

    - arch, users, and infra are the defaults so far

    - perhaps create an infra-announce list that just includes the others?

  • When do we not make changes?

    - change freeze around releases

    - other change freeze reasons

  • What is the process to make a change during a freeze?

    - When is the freeze slushy?

    - When is the freeze solid and should not be broken because of the potential risk at that moment? (We need something to weigh risk against risk.)
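A minimal sketch of how the proposed notice timeframes could be checked, assuming the values floated above (1 hour for critical, 12 to 24 hours for non-critical) were adopted as-is. The names `NOTICE_LEAD_TIME`, `DEFAULT_LISTS`, `latest_notice_time`, and `notice_is_on_time` are hypothetical and only illustrate the idea; none of this is agreed policy.

```python
from datetime import datetime, timedelta

# Assumed lead times, taken from the proposal in this ticket (not agreed policy).
NOTICE_LEAD_TIME = {
    "critical": timedelta(hours=1),
    "non-critical": timedelta(hours=12),  # lower bound of the 12 to 24 hour range
}

# Default lists named in the description; real addresses are intentionally omitted.
DEFAULT_LISTS = ("arch", "users", "infra")


def latest_notice_time(change_start: datetime, severity: str) -> datetime:
    """Return the latest moment a notice may be sent for a change starting at change_start."""
    return change_start - NOTICE_LEAD_TIME[severity]


def notice_is_on_time(sent_at: datetime, change_start: datetime, severity: str) -> bool:
    """Return True if the notice was sent at or before the required lead time."""
    return sent_at <= latest_notice_time(change_start, severity)
```

For example, a critical change starting at 12:00 would need its notice sent by 11:00 under these assumed values.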


Activity


Eyal Edri December 15, 2018 at 2:38 PM

The current infra SLA is 'best effort' during working hours (mostly in the UTC+2/UTC+3 time zones).
If a request comes up at some point, we'll revisit and see what the options are, given the people available on the team.

Former user February 20, 2017 at 9:48 AM

Some points from my side on our SLAs:
1) We do not have 24/7 or weekend coverage, so we cannot guarantee a notification SLA below 48h.
2) We have no spare/standby hardware outside of the PHX rack, so I would not set any SLA on disaster recovery situations.

We can and should document and periodically test recovery of critical services like Gerrit / Jenkins / resources to ensure we have enough backed up.

Eyal Edri February 19, 2017 at 4:39 PM

Important, but not critical.
ATM, planned outages are communicated to the infra & devel lists and usually involve a few minutes of service downtime.

We should find the time to formalize the process, but it is not the highest priority considering the other open tasks.

Eyal Edri December 22, 2016 at 10:12 AM

http://ovirt-infra-docs.readthedocs.io/en/latest/General/Communication.html is a good start.
I think we should also add a section on maintenance windows and service upgrades.

Eyal Edri February 7, 2016 at 1:55 PM

This should be done in readthedocs.org:
http://ovirt-infra-docs.readthedocs.org/en/latest/

Fixed

Details

Created December 3, 2012 at 11:02 PM
Updated August 29, 2019 at 2:12 PM
Resolved December 15, 2018 at 2:38 PM
