The certificate securing shift.ovirt.org expires on 19.05.2020; we need to replace it before then.
Thanks for providing the certificates. I am now testing the playbook on Staging to see how disruptive re-enrollment is before doing it on Prod
Looked through the docs and related ansible playbooks:
https://docs.openshift.com/container-platform/3.9/install_config/certificate_customization.html
The recommended certificate redeployment flow regenerates all cluster certificates (master, node, etcd, registry) and effectively restarts the entire cluster, which I’d like to avoid.
The router part is quite straightforward: it is mostly limited to changing the “router-certs” secret in the “default” namespace, which can be done manually, including backing up the existing secret contents.
oc get secret/router-certs -n default -o yaml
The Web Console is similar (UPD: the CN on its cert is webconsole.openshift-web-console.svc, so it likely doesn’t need to be replaced; only the API cert below does)
oc get secret/webconsole-serving-cert -n openshift-web-console -o yaml
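The backup step mentioned above could be scripted roughly as follows. This is a minimal sketch, not the procedure that was actually run: the function name `backup_secret`, the timestamped file naming, and the optional backup directory are all illustrative assumptions. Only the `oc get secret/... -o yaml` call comes from the commands above.

```shell
# Save a secret's full YAML to a timestamped file before changing it.
# Usage: backup_secret <secret-name> <namespace> [backup-dir]
# (function name and file naming are hypothetical, for illustration only)
backup_secret() {
    secret_name=$1
    secret_ns=$2
    backup_dir=${3:-.}
    out="${backup_dir}/${secret_name}.${secret_ns}.$(date +%Y%m%d-%H%M%S).yaml"
    # Write to a temporary file first so that a failed export does not
    # leave an empty "backup" behind.
    if oc get "secret/${secret_name}" -n "${secret_ns}" -o yaml > "${out}.tmp"; then
        mv "${out}.tmp" "${out}"
        echo "${out}"
    else
        rm -f "${out}.tmp"
        return 1
    fi
}
```

For example, `backup_secret router-certs default` and `backup_secret webconsole-serving-cert openshift-web-console` would each print the path of the saved file.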
To update the API cert we’ll first have to edit the “namedCertificates” section of /etc/origin/master/master-config.yaml, most probably along with something on the UI side, and then restart the masters one by one.
Since we already have the new API certificate, I’ve installed it across the masters and prepared the changes in the master configs. Unfortunately, I can’t redirect traffic at the load balancer due to its settings, so I will just restart the masters one by one in the evening to minimize the effect of disconnects on CI.
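Before editing, it may help to keep a copy of the config and look at the current section. In OpenShift 3.x each entry under namedCertificates normally carries a certFile, a keyFile and a list of names. The helper below is only a sketch: the name `review_named_certs`, the `.bak` suffix and the default of 8 context lines are assumptions.

```shell
# Back up a master config and print its namedCertificates section.
# Usage: review_named_certs [config-path] [context-lines]
# (helper name, .bak suffix and context size are hypothetical)
review_named_certs() {
    config=${1:-/etc/origin/master/master-config.yaml}
    context=${2:-8}
    # Keep a copy next to the original so the edit can be rolled back.
    cp -p "${config}" "${config}.bak" || return 1
    # Show the section header plus the lines that follow it, with
    # line numbers; returns non-zero if the section is absent.
    grep -n -A "${context}" 'namedCertificates:' "${config}"
}
```

Run it once before making changes, since a second run would overwrite the `.bak` copy with the edited file.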
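A one-by-one restart like the one planned above could look roughly like this. It is a sketch under assumptions, not the exact commands used: SSH access to each master, the API port 8443, the `/healthz` health check, and the retry counts are all assumed. Only the origin-master-api service name and the master hostnames appear in this ticket.

```shell
# Restart origin-master-api on each master in turn, waiting for the API
# to answer again before moving on to the next one.
# MASTER_PORT, /healthz and the retry settings are assumptions.
MASTER_PORT=${MASTER_PORT:-8443}

wait_for_api() {
    api_host=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # -k: skip CA verification, -s: quiet, -f: fail on HTTP errors
        if curl -ksf "https://${api_host}:${MASTER_PORT}/healthz" >/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${WAIT_INTERVAL:-10}"
    done
    return 1
}

rolling_restart() {
    for master in "$@"; do
        echo "restarting ${master}"
        ssh "${master}" systemctl restart origin-master-api || return 1
        # Stop at the first master that does not come back, so the
        # remaining masters keep serving the API.
        wait_for_api "${master}" || {
            echo "${master} did not come back" >&2
            return 1
        }
    done
}
```

It would be invoked as `rolling_restart shift-m01 shift-m02 shift-m03`. Stopping at the first failure is a deliberate choice so that a bad config never takes down more than one master.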
Currently all masters have different service uptimes:
server | API up since |
---|---|
shift-m01 | 2019-12-04 |
shift-m02 | 2019-12-29 |
shift-m03 | 2020-02-28 |
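One way such a table could be gathered is to ask systemd when the service last became active. This is a sketch only: it assumes SSH access to the masters, and the helper name `api_up_since` is hypothetical. It is not necessarily how the dates above were obtained.

```shell
# Print one table row per master with the time origin-master-api
# last entered the active state.
# Usage: api_up_since <master>...   (helper name is hypothetical)
api_up_since() {
    for master in "$@"; do
        ts=$(ssh "${master}" systemctl show origin-master-api \
            --property=ActiveEnterTimestamp) || return 1
        # Strip the "ActiveEnterTimestamp=" prefix from systemctl's output.
        echo "${master} | ${ts#ActiveEnterTimestamp=} |"
    done
}
```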
The main API consumers are jenkins.ovirt.org and prow.apps.ovirt.org.
The origin-master-api service restart completed on all three masters. Accessing https://shift.ovirt.org shows the new certificate in place.
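The same check can be done from the command line rather than a browser. A minimal sketch, assuming the API is reachable on port 443 through the load balancer (the helper name `served_cert_info` and the default port are assumptions):

```shell
# Print subject and validity dates of the certificate served by a host.
# Usage: served_cert_info <host> [port]   (port 443 is an assumed default)
served_cert_info() {
    cert_host=$1
    cert_port=${2:-443}
    # s_client fetches the served certificate; x509 prints its subject
    # plus the notBefore/notAfter dates.
    openssl s_client -connect "${cert_host}:${cert_port}" \
        -servername "${cert_host}" </dev/null 2>/dev/null \
        | openssl x509 -noout -subject -dates
}
```

Running `served_cert_info shift.ovirt.org` before and after the restart should show the notAfter date moving past 19.05.2020.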
I was also able to confirm that jenkins is able to create new pods properly through the API so CI has not been disrupted.
Waiting for the wildcard certificate so that it can be installed on the application routers.
I've split the router task into a separate ticket since it is in fact unrelated. Closing the API cert request as complete.